Analyzing Email Headers for Phishing Investigation

When to Use

  • When investigating a suspected phishing email to determine its true origin
  • For verifying sender authenticity and detecting email spoofing
  • During incident response when a user has clicked a phishing link
  • When tracing the delivery path and relay servers of a suspicious email
  • For validating SPF, DKIM, and DMARC alignment to identify forgery

Prerequisites

  • Raw email headers from the suspicious message (EML or MSG format)
  • Understanding of SMTP protocol and email header fields
  • Access to DNS lookup tools (dig, nslookup) for SPF/DKIM/DMARC verification
  • Familiarity with email header analysis tools (e.g., Microsoft Message Header Analyzer (MHA), emailheaders.net)
  • Python with email parsing libraries for automated analysis
  • Access to threat intelligence platforms for IP/domain reputation

Workflow

Step 1: Extract Raw Email Headers

```bash
# Export from Outlook: Open email > File > Properties > Internet Headers
# Export from Gmail: Open email > Three dots > Show original
# Export from Thunderbird: View > Message Source

# If working with EML file from forensic image
cp /mnt/evidence/Users/suspect/AppData/Local/Microsoft/Outlook/phishing_email.eml \
   /cases/case-2024-001/email/

# If working with PST file, extract individual messages
pip install libpff-python   # PyPI package that provides the pypff module
python3 << 'PYEOF'
import re
import pypff

pst = pypff.file()
pst.open("/cases/case-2024-001/email/outlook.pst")
root = pst.get_root_folder()

def extract_messages(folder, path=""):
    for i in range(folder.get_number_of_sub_messages()):
        msg = folder.get_sub_message(i)
        headers = msg.get_transport_headers()
        subject = msg.get_subject() or "no_subject"
        if headers:
            # Keep only filename-safe characters from the subject
            safe_subject = re.sub(r'[^\w.-]', '_', subject[:30])
            # Prefix with the folder path so messages in different folders do not overwrite each other
            filename = f"/cases/case-2024-001/email/msg_{path}{i}_{safe_subject}.txt"
            with open(filename, 'w', encoding='utf-8') as f:
                f.write(headers)
    for i in range(folder.get_number_of_sub_folders()):
        extract_messages(folder.get_sub_folder(i), f"{path}{i}-")

extract_messages(root)
pst.close()
PYEOF
```
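
The PST extraction above writes header-only text files rather than full EML messages. As a minimal sketch, such a header block can be loaded with the standard library's `email.parser.HeaderParser`; the `raw_headers` string here is a made-up sample standing in for one of those files, not real case data.

```python
from email import policy
from email.parser import HeaderParser

# Hypothetical sample standing in for the contents of one extracted header file
raw_headers = (
    "Received: from mail-server.xyz ([203.0.113.45]) by relay1.isp.com;\n"
    " Mon, 15 Jan 2024 09:23:45 +0000\n"
    "From: accounting@examp1e-corp.com\n"
    "Reply-To: payments.urgent@gmail.com\n"
    "Subject: Urgent: Invoice Payment Required\n"
    "\n"
)

# HeaderParser stops at the blank line and never tries to parse a body
headers = HeaderParser(policy=policy.default).parsestr(raw_headers)

from_domain = headers["From"].addresses[0].domain
reply_domain = headers["Reply-To"].addresses[0].domain
received = headers.get_all("Received", [])

print(f"From domain:     {from_domain}")
print(f"Reply-To domain: {reply_domain}")
print(f"Received hops:   {len(received)}")
```

With `policy.default`, address headers expose parsed `addresses`, which avoids hand-written splitting on `@`.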

Step 2: Parse the Email Header Chain

```bash
# Parse headers using Python email library
python3 << 'PYEOF'
import email
from email import policy

# Read in binary mode so the parser handles the message's own encodings
with open('/cases/case-2024-001/email/phishing_email.eml', 'rb') as f:
    msg = email.message_from_binary_file(f, policy=policy.default)

print("=== KEY HEADER FIELDS ===")
print(f"From: {msg['From']}")
print(f"To: {msg['To']}")
print(f"Subject: {msg['Subject']}")
print(f"Date: {msg['Date']}")
print(f"Message-ID: {msg['Message-ID']}")
print(f"Reply-To: {msg['Reply-To']}")
print(f"Return-Path: {msg['Return-Path']}")
print(f"X-Mailer: {msg['X-Mailer']}")
print(f"X-Originating-IP: {msg['X-Originating-IP']}")

print("\n=== RECEIVED HEADERS (bottom-up = chronological) ===")
received_headers = msg.get_all('Received')
if received_headers:
    # Each server prepends its header, so reverse to get chronological order
    for i, header in enumerate(reversed(received_headers)):
        print(f"\nHop {i+1}: {header.strip()}")

print("\n=== AUTHENTICATION RESULTS ===")
auth_results = msg.get_all('Authentication-Results')
if auth_results:
    for result in auth_results:
        print(result)

print(f"\nARC-Authentication-Results: {msg.get('ARC-Authentication-Results', 'Not present')}")
print(f"Received-SPF: {msg.get('Received-SPF', 'Not present')}")
print(f"DKIM-Signature: {msg.get('DKIM-Signature', 'Not present')}")
PYEOF
```
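
The Received chain is easier to reason about once each hop is split into fields. The following is a minimal sketch, assuming the common `from ... by ... ; date` layout (real headers vary widely between mail servers); `parse_received` and the sample header are hypothetical names and data introduced for illustration.

```python
import re
from email.utils import parsedate_to_datetime

def parse_received(header):
    """Split one Received header into from/by hosts, first IPv4 literal, and timestamp."""
    flat = " ".join(header.split())  # unfold continuation lines
    from_match = re.search(r'\bfrom\s+(\S+)', flat)
    by_match = re.search(r'\bby\s+(\S+)', flat)
    ip_match = re.search(r'\[(\d{1,3}(?:\.\d{1,3}){3})\]', flat)

    # The timestamp is whatever follows the last semicolon
    _, sep, date_part = flat.rpartition(';')
    timestamp = None
    if sep:
        try:
            timestamp = parsedate_to_datetime(date_part.strip())
        except (TypeError, ValueError):
            timestamp = None  # unparseable date, leave it empty

    return {
        "from": from_match.group(1) if from_match else None,
        "by": by_match.group(1).rstrip(";") if by_match else None,
        "ip": ip_match.group(1) if ip_match else None,
        "time": timestamp,
    }

sample = ("from mail-server.xyz (mail-server.xyz [203.0.113.45])\n"
          "\tby relay1.isp.com with ESMTP id abc123;\n"
          "\tMon, 15 Jan 2024 09:23:45 +0000")
hop = parse_received(sample)
print(hop["from"], "->", hop["by"], hop["ip"], hop["time"])
```

Subtracting the `time` values of adjacent hops gives per-hop delays; large or negative gaps can point to clock skew or inserted headers.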

Step 3: Validate SPF, DKIM, and DMARC Records

```bash
# Extract the envelope sender domain
# (SPF is evaluated against the envelope sender / Return-Path domain,
#  DMARC against the From header domain; they may differ)
SENDER_DOMAIN="example-corp.com"

# Check SPF record
dig TXT "$SENDER_DOMAIN" +short | grep "v=spf1"
# Example: "v=spf1 include:_spf.google.com include:sendgrid.net ~all"

# Check DKIM record (selector from DKIM-Signature header, e.g., "s=selector1";
# the signing domain is the d= tag of the same header)
DKIM_SELECTOR="selector1"
dig TXT "${DKIM_SELECTOR}._domainkey.${SENDER_DOMAIN}" +short

# Check DMARC record
dig TXT "_dmarc.${SENDER_DOMAIN}" +short
# Example: "v=DMARC1; p=reject; rua=mailto:dmarc@example-corp.com; pct=100"

# Verify the sending IP against SPF
# Extract the IP from the earliest Received header added by a server you trust
SENDING_IP="203.0.113.45"

# Manual SPF check using python
python3 << 'PYEOF'
import spf  # pip install pyspf

result, explanation = spf.check2(
    i='203.0.113.45',
    s='sender@example-corp.com',
    h='mail.example-corp.com'
)
print(f"SPF Result: {result}")
print(f"Explanation: {explanation}")
# Results: pass, fail, softfail, neutral, none, temperror, permerror
PYEOF

# Check if sending IP is in known malicious IP lists
# Query AbuseIPDB or VirusTotal
curl -s "https://api.abuseipdb.com/api/v2/check?ipAddress=${SENDING_IP}" \
    -H "Key: YOUR_API_KEY" -H "Accept: application/json" | python3 -m json.tool
```
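
The DNS lookups above show what a domain publishes; the receiving server's own verdicts are recorded in the Authentication-Results header. The sketch below pulls out those verdicts and checks DMARC relaxed alignment. It rests on two stated assumptions: the header follows the usual `method=result` layout, and the organizational domain is approximated by the last two labels (a real check should consult the Public Suffix List, since this rule is wrong for names like `example.co.uk`). All function names and the sample header are hypothetical.

```python
import re

def parse_auth_results(value):
    """Extract the first spf/dkim/dmarc verdict from an Authentication-Results value."""
    verdicts = {}
    pattern = r'\b(spf|dkim|dmarc)\s*=\s*([a-z]+)'
    for method, result in re.findall(pattern, value, re.IGNORECASE):
        verdicts.setdefault(method.lower(), result.lower())
    return verdicts

def org_domain(domain):
    # ASSUMPTION: naive "last two labels" rule, not a Public Suffix List lookup
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])

def relaxed_aligned(from_domain, authenticated_domain):
    """DMARC relaxed alignment: both names share one organizational domain."""
    return org_domain(from_domain) == org_domain(authenticated_domain)

# Hypothetical header value for illustration
sample = ("mx.target-company.com; spf=fail smtp.mailfrom=bounce@mail-server.xyz; "
          "dkim=none; dmarc=fail header.from=examp1e-corp.com")

verdicts = parse_auth_results(sample)
print(verdicts)
print(relaxed_aligned("examp1e-corp.com", "mail-server.xyz"))
print(relaxed_aligned("example-corp.com", "bounce.example-corp.com"))
```

A message can pass SPF for the envelope domain and still fail DMARC when that domain is not aligned with the From header domain, which is why both checks are shown together.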

Step 4: Analyze Sender Domain and Infrastructure

```bash
# WHOIS lookup on sender domain
whois "$SENDER_DOMAIN" | grep -iE '(registrar|creation|expiration|registrant|nameserver)'
# Check domain age (recently registered domains are suspicious)

# DNS record investigation
dig A "$SENDER_DOMAIN" +short
dig MX "$SENDER_DOMAIN" +short
dig NS "$SENDER_DOMAIN" +short

# Reverse DNS on sending IP
dig -x "$SENDING_IP" +short

# Check for lookalike/typosquatting domains
# Compare with legitimate domain using visual similarity
python3 << 'PYEOF'
import Levenshtein  # pip install python-Levenshtein

legitimate = "microsoft.com"
suspicious = "micr0soft.com"

distance = Levenshtein.distance(legitimate, suspicious)
ratio = Levenshtein.ratio(legitimate, suspicious)
print(f"Edit distance: {distance}")
print(f"Similarity ratio: {ratio:.2%}")
# An exact match is the real domain, not a lookalike
if suspicious != legitimate and ratio > 0.8:
    print("WARNING: Likely typosquatting/lookalike domain!")
PYEOF

# Check domain reputation on VirusTotal
curl -s "https://www.virustotal.com/api/v3/domains/${SENDER_DOMAIN}" \
    -H "x-apikey: YOUR_VT_API_KEY" | python3 -m json.tool

# Check if the Reply-To differs from From (common phishing indicator)
python3 << 'PYEOF'
import email
import email.utils

with open('/cases/case-2024-001/email/phishing_email.eml') as f:
    msg = email.message_from_file(f)

# Compare case-insensitively; Reply-To falls back to From when absent
from_addr = email.utils.parseaddr(msg['From'])[1].lower()
reply_to = email.utils.parseaddr(msg.get('Reply-To', msg['From']))[1].lower()
if from_addr != reply_to:
    print(f'WARNING: From ({from_addr}) != Reply-To ({reply_to})')
else:
    print('From and Reply-To match')
PYEOF
```
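
Edit distance alone does not distinguish a deliberate character swap (`0` for `o`) from an unrelated but similar name. The sketch below adds a homoglyph normalization pass before the similarity comparison. It swaps in the standard library's `difflib.SequenceMatcher` for the Levenshtein package so that it runs without third-party installs; the `HOMOGLYPHS` table is a small illustrative assumption, not an exhaustive list, and the function names are hypothetical.

```python
from difflib import SequenceMatcher

# ASSUMPTION: small illustrative substitution table, far from exhaustive
HOMOGLYPHS = {"0": "o", "1": "l", "3": "e", "5": "s", "rn": "m", "vv": "w"}

def normalize(domain):
    """Map common look-alike characters to the letters they imitate."""
    result = domain.lower()
    for fake, real in HOMOGLYPHS.items():
        result = result.replace(fake, real)
    return result

def lookalike_check(suspicious, legitimate, threshold=0.8):
    """Classify a domain as identical, homoglyph, similar, or distinct."""
    if suspicious.lower() == legitimate.lower():
        return "identical"
    if normalize(suspicious) == normalize(legitimate):
        return "homoglyph"
    ratio = SequenceMatcher(None, suspicious.lower(), legitimate.lower()).ratio()
    return "similar" if ratio > threshold else "distinct"

for candidate, reference in [
    ("micr0soft.com", "microsoft.com"),
    ("ceo-company.com", "company.com"),
    ("google.com", "microsoft.com"),
]:
    print(f"{candidate} vs {reference}: {lookalike_check(candidate, reference)}")
```

Both sides are normalized with the same table, so a legitimate domain that happens to contain digits is still compared consistently.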

Step 5: Examine Email Body and Attachments

```bash
# Extract URLs from email body
python3 << 'PYEOF'
import email
import hashlib
import os
import re
from email import policy

with open('/cases/case-2024-001/email/phishing_email.eml', 'rb') as f:
    msg = email.message_from_binary_file(f, policy=policy.default)

body = msg.get_body(preferencelist=('html', 'plain'))
content = body.get_content() if body else ""

urls = re.findall(r'https?://[^\s<>"\']+', content)
print("=== URLs FOUND IN EMAIL BODY ===")
for url in sorted(set(urls)):
    print(f"  {url}")

# Check for URL obfuscation (display text != href)
href_pattern = re.findall(r'<a[^>]*href=["\']([^"\']+)["\'][^>]*>(.*?)</a>', content, re.DOTALL)
print("\n=== HYPERLINK ANALYSIS ===")
for href, text in href_pattern:
    display_url = re.findall(r'https?://[^\s<]+', text)
    if display_url and display_url[0] != href:
        print(f"  MISMATCH: Display='{display_url[0]}' -> Actual='{href}'")

# Extract and hash attachments
attachment_dir = '/cases/case-2024-001/email/attachments'
os.makedirs(attachment_dir, exist_ok=True)
print("\n=== ATTACHMENTS ===")
for part in msg.walk():
    if part.get_content_disposition() == 'attachment':
        # basename() guards against path traversal in attacker-supplied names
        filename = os.path.basename(part.get_filename() or 'unnamed_attachment')
        data = part.get_payload(decode=True) or b''
        sha256 = hashlib.sha256(data).hexdigest()
        print(f"  File: {filename}, Size: {len(data)}, SHA-256: {sha256}")
        with open(os.path.join(attachment_dir, filename), 'wb') as af:
            af.write(data)
PYEOF

# Submit attachment hashes to VirusTotal
# Submit URLs to URLhaus or PhishTank for reputation check
```
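
Before extracted URLs go into a ticket or report, analysts commonly "defang" them so nobody clicks a live phishing link by accident, and reduce them to unique hostnames for reputation lookups. This is a minimal sketch of both steps using only the standard library; `defang`, `unique_hosts`, and the sample URLs are hypothetical.

```python
from urllib.parse import urlsplit

def defang(url):
    """Make a URL non-clickable using the common hxxp / [.] convention."""
    return url.replace("http", "hxxp", 1).replace(".", "[.]")

def unique_hosts(urls):
    """Return the sorted set of hostnames found in a list of URLs."""
    hosts = set()
    for url in urls:
        host = urlsplit(url).hostname
        if host:
            hosts.add(host.lower())
    return sorted(hosts)

# Hypothetical URLs as they might come out of the body extraction step
found = [
    "https://login.micr0soft.com/auth",
    "https://login.micr0soft.com/reset?id=1",
    "http://cdn.example.net/logo.png",
]

for url in found:
    print(defang(url))
print(unique_hosts(found))
```

Querying reputation services per hostname rather than per URL keeps the number of lookups small when a message repeats the same domain with different paths.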

Key Concepts

| Concept | Description |
|---|---|
| SPF (Sender Policy Framework) | DNS record specifying which mail servers are authorized to send for a domain |
| DKIM (DomainKeys Identified Mail) | Cryptographic signature over selected headers and the body, verified with a public key published in the signing domain's DNS |
| DMARC | Policy framework that requires SPF or DKIM to pass and align with the From domain, and tells receivers how to handle failures |
| Received headers | Server-added headers showing each hop in the delivery chain (read bottom to top) |
| Return-Path | Envelope sender address used for bounce messages; may differ from From |
| Message-ID | Unique identifier assigned by the originating mail server |
| X-Originating-IP | Original sender IP address (non-standard; added by some mail services) |
| Header forgery | Attackers can forge From, Reply-To, and other headers, and can insert fake Received headers below the genuine ones; only Received headers added by servers you trust are reliable |
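
Because a sender can fabricate the lower part of the Received chain, it helps to mark where trustworthy headers end. The sketch below walks the chain from the newest header downward and stops at the first hop not stamped by a server in your own organization. `split_trusted`, the sample headers, and the `target-company.com` suffix are hypothetical.

```python
import re

def split_trusted(received_headers, trusted_suffixes):
    """Split Received headers (newest first) into (trusted, untrusted) parts.

    A header is trusted while its 'by' host belongs to one of trusted_suffixes.
    """
    trusted = []
    for idx, header in enumerate(received_headers):
        match = re.search(r'\bby\s+(\S+)', header)
        by_host = match.group(1).lower().rstrip(";") if match else ""
        if any(by_host == s or by_host.endswith("." + s) for s in trusted_suffixes):
            trusted.append(header)
        else:
            return trusted, received_headers[idx:]
    return trusted, []

# Hypothetical chain in on-the-wire order (newest header first)
received = [
    "from mx.target-company.com by internal-filter.target-company.com; Mon, 15 Jan 2024 09:23:50 +0000",
    "from relay1.isp.com by mx.target-company.com; Mon, 15 Jan 2024 09:23:48 +0000",
    "from mail-server.xyz ([203.0.113.45]) by relay1.isp.com; Mon, 15 Jan 2024 09:23:45 +0000",
    "from internal.bank.example by mail-server.xyz; Mon, 15 Jan 2024 09:20:00 +0000",
]

trusted, untrusted = split_trusted(received, ("target-company.com",))
print(f"Trusted hops: {len(trusted)}, unverified hops: {len(untrusted)}")
print("Boundary header:", trusted[-1] if trusted else "none")
```

The `from` clause of the last trusted header is the most reliable statement of which external host handed the message to your infrastructure.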

Tools & Systems

| Tool | Purpose |
|---|---|
| MXToolbox | Online email header analyzer and DNS lookup |
| dig/nslookup | DNS record queries for SPF, DKIM, DMARC verification |
| pyspf | Python SPF record validation library |
| dkimpy | Python DKIM signature verification library |
| PhishTool | Specialized phishing email analysis platform |
| VirusTotal | URL and file reputation checking service |
| AbuseIPDB | IP address reputation database |
| whois | Domain registration information lookup |

Common Scenarios

Scenario 1: CEO Fraud / Business Email Compromise
The email claims to be from the CEO, but Reply-To points to a Gmail address. SPF fails because the sending IP is not authorized for the spoofed domain, DKIM is missing, and the From domain is a lookalike (ceo-company.com vs company.com).

Scenario 2: Credential Harvesting Phishing
The email contains a link that displays "login.microsoft.com" while the href points to a lookalike domain. The attachment is an HTML file containing a fake login page with credential-exfiltration JavaScript, and the sending domain was registered 3 days ago.

Scenario 3: Malware Delivery via Attachment
The email carries an Office document attachment containing macros. SPF passes and the DKIM signature is valid because the message was sent from a compromised account on legitimate infrastructure, so header authentication alone does not flag it. The attachment's SHA-256 matches known malware on VirusTotal.

Scenario 4: Spear Phishing with Legitimate Service
The attacker uses a legitimate email marketing service to send the phishing message. SPF and DKIM pass because the service is authorized; the phishing is in the content, not the infrastructure, so the investigation requires URL and content analysis rather than header authentication checks.
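
The scenarios show that no single indicator decides the outcome, so findings are usually combined into an overall rating. The sketch below is one hedged way to do that. The indicator names, weights, and thresholds are illustrative assumptions, not calibrated values or part of any standard.

```python
# ASSUMPTION: weights and thresholds are illustrative only, not calibrated
WEIGHTS = {
    "spf_fail": 2,
    "dkim_missing": 1,
    "dmarc_fail": 2,
    "reply_to_mismatch": 2,
    "lookalike_domain": 3,
    "new_domain": 2,
    "link_mismatch": 3,
    "suspicious_url": 3,
    "known_malware": 6,
}

def risk_level(indicators):
    """Sum the weights of the observed indicators and map the score to a level."""
    score = sum(WEIGHTS.get(name, 0) for name in set(indicators))
    if score >= 6:
        return "HIGH", score
    if score >= 3:
        return "MEDIUM", score
    return "LOW", score

scenarios = {
    "CEO fraud": ["reply_to_mismatch", "spf_fail", "dkim_missing", "lookalike_domain"],
    "Credential harvesting": ["link_mismatch", "lookalike_domain", "new_domain"],
    "Malware attachment": ["known_malware"],
    "Legitimate service abuse": ["suspicious_url"],
}

for name, indicators in scenarios.items():
    level, score = risk_level(indicators)
    print(f"{name}: {level} (score {score})")
```

Content-based indicators carry their own weight here because, as in Scenarios 3 and 4, a message can pass every authentication check and still be malicious.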

Output Format

Email Header Analysis Report:
  Subject:     "Urgent: Invoice Payment Required"
  From:        accounting@examp1e-corp.com (SPOOFED)
  Reply-To:    payments.urgent@gmail.com (MISMATCH)
  Return-Path: <bounce@mail-server.xyz>
  Date:        2024-01-15 09:23:45 UTC

  Delivery Path (4 hops):
    Hop 1: mail-server.xyz [203.0.113.45] -> relay1.isp.com
    Hop 2: relay1.isp.com -> mx.target-company.com
    Hop 3: mx.target-company.com -> internal-filter.target.com
    Hop 4: internal-filter.target.com -> mailbox

  Authentication:
    SPF:    FAIL (203.0.113.45 not authorized for examp1e-corp.com)
    DKIM:   NONE (no signature present)
    DMARC:  FAIL (p=none, no enforcement)

  Indicators of Phishing:
    - Lookalike domain (examp1e-corp.com vs example-corp.com, 94% similar)
    - From/Reply-To mismatch
    - Domain registered 2 days before email sent
    - URL in body points to credential harvesting page
    - Attachment: invoice.xlsm (SHA-256: a3f2...) - Known malware on VT

  Risk Level: HIGH
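
A report in this layout can be assembled from collected findings with a small formatting helper. This is a minimal sketch, assuming the findings are held in a plain dictionary; `render_report` and the dictionary keys are hypothetical choices made for illustration.

```python
def render_report(findings):
    """Format a findings dict into the plain-text report layout."""
    lines = ["Email Header Analysis Report:"]
    for label in ("Subject", "From", "Reply-To", "Return-Path", "Date"):
        lines.append(f"  {label + ':':<12} {findings.get(label, 'n/a')}")

    hops = findings.get("hops", [])
    lines += ["", f"  Delivery Path ({len(hops)} hops):"]
    for number, hop in enumerate(hops, start=1):
        lines.append(f"    Hop {number}: {hop}")

    lines += ["", "  Authentication:"]
    for method in ("SPF", "DKIM", "DMARC"):
        lines.append(f"    {method + ':':<7} {findings.get(method, 'NONE')}")

    lines += ["", "  Indicators of Phishing:"]
    for indicator in findings.get("indicators", []):
        lines.append(f"    - {indicator}")

    lines += ["", f"  Risk Level: {findings.get('risk', 'UNKNOWN')}"]
    return "\n".join(lines)

# Hypothetical findings mirroring the sample report
sample_findings = {
    "Subject": '"Urgent: Invoice Payment Required"',
    "From": "accounting@examp1e-corp.com (SPOOFED)",
    "Reply-To": "payments.urgent@gmail.com (MISMATCH)",
    "Return-Path": "<bounce@mail-server.xyz>",
    "Date": "2024-01-15 09:23:45 UTC",
    "hops": ["mail-server.xyz [203.0.113.45] -> relay1.isp.com",
             "relay1.isp.com -> mx.target-company.com"],
    "SPF": "FAIL",
    "DKIM": "NONE",
    "DMARC": "FAIL",
    "indicators": ["From/Reply-To mismatch"],
    "risk": "HIGH",
}

report = render_report(sample_findings)
print(report)
```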