Analyzing Email Headers for Phishing Investigation

When to Use

When investigating a suspected phishing email to determine its true origin
For verifying sender authenticity and detecting email spoofing
During incident response when a user has clicked a phishing link
When tracing the delivery path and relay servers of a suspicious email
For validating SPF, DKIM, and DMARC alignment to identify forgery

Prerequisites

Raw email headers from the suspicious message (EML or MSG format)
Understanding of SMTP protocol and email header fields
Access to DNS lookup tools (dig, nslookup) for SPF/DKIM/DMARC verification
Email header analysis tools (e.g., Message Header Analyzer (MHA), emailheaders.net)
Python with email parsing libraries for automated analysis
Access to threat intelligence platforms for IP/domain reputation

Workflow

Step 1: Extract Raw Email Headers

```bash
# Export from Outlook: Open email > File > Properties > Internet Headers
# Export from Gmail: Open email > Three dots > Show original
# Export from Thunderbird: View > Message Source

# If working with an EML file from a forensic image
cp /mnt/evidence/Users/suspect/AppData/Local/Microsoft/Outlook/phishing_email.eml \
   /cases/case-2024-001/email/

# If working with a PST file, extract the transport headers of each message
pip install libpff-python   # PyPI distribution that provides the pypff module
python3 << 'PYEOF'
import re
import pypff

OUT_DIR = "/cases/case-2024-001/email"

pst = pypff.file()
pst.open(f"{OUT_DIR}/outlook.pst")
root = pst.get_root_folder()

def extract_messages(folder, path="root"):
    for i in range(folder.get_number_of_sub_messages()):
        msg = folder.get_sub_message(i)
        headers = msg.get_transport_headers()
        if not headers:
            continue
        # Subjects can be missing or contain characters that are unsafe in file names
        subject = re.sub(r"[^\w.-]+", "_", msg.get_subject() or "no_subject")[:30]
        # Include the folder path so messages from different folders do not collide
        filename = f"{OUT_DIR}/msg_{path}_{i}_{subject}.txt"
        with open(filename, "w", encoding="utf-8") as f:
            f.write(headers)
    for j in range(folder.get_number_of_sub_folders()):
        extract_messages(folder.get_sub_folder(j), f"{path}-{j}")

extract_messages(root)
pst.close()
PYEOF
```
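
Before analysis starts, it is usually worth recording a hash of the untouched evidence file and isolating the header section. The following is a minimal sketch using only the standard library; `SAMPLE`, `sha256_of`, and `header_block` are illustrative names that do not come from the original workflow, and the sample bytes stand in for a real EML file.

```python
import hashlib
import re
from email import policy
from email.parser import BytesHeaderParser


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest, useful for documenting evidence integrity."""
    return hashlib.sha256(data).hexdigest()


def header_block(raw: bytes) -> bytes:
    """Return everything before the first blank line (the message header section)."""
    return re.split(rb"\r?\n\r?\n", raw, maxsplit=1)[0]


# Hypothetical sample message standing in for the bytes of a real .eml file
SAMPLE = (
    b"Received: from mail-server.xyz by mx.target-company.com; "
    b"Mon, 15 Jan 2024 09:23:45 +0000\r\n"
    b"From: accounting@examp1e-corp.com\r\n"
    b"Subject: Urgent: Invoice Payment Required\r\n"
    b"\r\n"
    b"Please see the attached invoice.\r\n"
)

# Parse only the headers; the body is never interpreted
headers = BytesHeaderParser(policy=policy.default).parsebytes(header_block(SAMPLE))
print("SHA-256 of original file:", sha256_of(SAMPLE))
print("Subject:", headers["Subject"])
```

Hashing the raw bytes before any parsing keeps the recorded digest independent of how later tools normalize line endings or encodings.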

Step 2: Parse the Email Header Chain

```bash
# Parse headers using the Python email library
python3 << 'PYEOF'
import email
from email import policy

# Read in binary mode so unusual encodings do not break parsing
with open('/cases/case-2024-001/email/phishing_email.eml', 'rb') as f:
    msg = email.message_from_binary_file(f, policy=policy.default)

print("=== KEY HEADER FIELDS ===")
for name in ('From', 'To', 'Subject', 'Date', 'Message-ID', 'Reply-To',
             'Return-Path', 'X-Mailer', 'X-Originating-IP'):
    print(f"{name}: {msg[name]}")

# Each server prepends its Received header, so reversing gives chronological order
print("\n=== RECEIVED HEADERS (bottom-up = chronological) ===")
received_headers = msg.get_all('Received')
if received_headers:
    for i, header in enumerate(reversed(received_headers)):
        print(f"\nHop {i+1}: {header.strip()}")

print("\n=== AUTHENTICATION RESULTS ===")
auth_results = msg.get_all('Authentication-Results')
if auth_results:
    for result in auth_results:
        print(result)

print(f"\nARC-Authentication-Results: {msg.get('ARC-Authentication-Results', 'Not present')}")
print(f"Received-SPF: {msg.get('Received-SPF', 'Not present')}")
print(f"DKIM-Signature: {msg.get('DKIM-Signature', 'Not present')}")
PYEOF
```
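
Printing the Received chain is a start, but comparing hop timestamps often makes anomalies easier to spot. The sketch below assumes each Received header ends with `; <date>` and that the relay IP appears in square brackets, which is common but not guaranteed. `parse_hops` and the sample headers are hypothetical and only illustrate the idea.

```python
import re
from datetime import timezone
from email import message_from_string, policy
from email.utils import parsedate_to_datetime

# Hypothetical headers, newest hop first, as they appear in a delivered message
SAMPLE = """\
Received: from relay1.isp.com (relay1.isp.com [198.51.100.7])
 by mx.target-company.com; Mon, 15 Jan 2024 09:23:50 +0000
Received: from mail-server.xyz (unknown [203.0.113.45])
 by relay1.isp.com; Mon, 15 Jan 2024 09:23:45 +0000
From: accounting@examp1e-corp.com
Subject: test

body
"""

IP_RE = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def parse_hops(msg):
    """Return hops in chronological order with relay IP, timestamp, and delay in seconds."""
    hops = []
    previous = None
    for raw in reversed(msg.get_all("Received") or []):
        value = " ".join(str(raw).split())        # collapse folded lines
        _, _, date_part = value.rpartition(";")   # timestamp follows the last ';'
        try:
            stamp = parsedate_to_datetime(date_part.strip())
            if stamp.tzinfo is None:              # treat zone-less dates as UTC
                stamp = stamp.replace(tzinfo=timezone.utc)
        except (TypeError, ValueError):
            stamp = None
        ip = IP_RE.search(value)
        delay = (stamp - previous).total_seconds() if stamp and previous else None
        hops.append({"ip": ip.group(1) if ip else None, "time": stamp, "delay": delay})
        previous = stamp or previous
    return hops


hops = parse_hops(message_from_string(SAMPLE, policy=policy.default))
for number, hop in enumerate(hops, 1):
    print(number, hop["ip"], hop["time"], hop["delay"])
```

A negative or very large delay between consecutive hops can point to a forged Received header or a badly skewed clock, so it is a prompt for closer inspection rather than proof of forgery.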

Step 3: Validate SPF, DKIM, and DMARC Records

```bash
# Envelope sender domain (from Return-Path); SPF is evaluated against this domain
SENDER_DOMAIN="example-corp.com"

# Check SPF record
dig TXT "$SENDER_DOMAIN" +short | grep "v=spf1"
# Example: "v=spf1 include:_spf.google.com include:sendgrid.net ~all"

# Check DKIM record (selector from DKIM-Signature header, e.g., "s=selector1")
DKIM_SELECTOR="selector1"
dig TXT "${DKIM_SELECTOR}._domainkey.${SENDER_DOMAIN}" +short

# Check DMARC record (published under the From header domain; use that domain if it differs)
dig TXT "_dmarc.${SENDER_DOMAIN}" +short
# Example: "v=DMARC1; p=reject; rua=mailto:dmarc@example-corp.com; pct=100"

# Verify the sending IP against SPF
# Take the IP from the earliest Received hop you trust; hops below it can be forged
SENDING_IP="203.0.113.45"

# Manual SPF check using Python
python3 << 'PYEOF'
import spf  # pip install pyspf

result, explanation = spf.check2(
    i='203.0.113.45',               # connecting IP
    s='sender@example-corp.com',    # envelope sender (MAIL FROM)
    h='mail.example-corp.com',      # HELO/EHLO host name
)
print(f"SPF Result: {result}")
print(f"Explanation: {explanation}")
# Results: pass, fail, softfail, neutral, none, temperror, permerror
PYEOF

# Check if sending IP is in known malicious IP lists
# Query AbuseIPDB or VirusTotal
curl -s "https://api.abuseipdb.com/api/v2/check?ipAddress=${SENDING_IP}" \
  -H "Key: YOUR_API_KEY" -H "Accept: application/json" | python3 -m json.tool
```
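
Both the DKIM-Signature header and the DMARC TXT record use a `tag=value; tag=value` layout, so one small parser can supply the selector for the DKIM lookup and the policy from the DMARC record. This is a minimal sketch, not a full RFC-compliant parser; `parse_tag_list`, `dkim_query_name`, and the sample signature (with placeholder hash values) are assumptions made for illustration.

```python
def parse_tag_list(value: str) -> dict:
    """Parse a 'tag=value; tag=value' list as used by DKIM-Signature and DMARC."""
    tags = {}
    for item in value.split(";"):
        name, sep, val = item.partition("=")   # split on the first '=' only
        if sep:
            # Whitespace inside values (for example folded base64) is not significant
            tags[name.strip()] = "".join(val.split())
    return tags


def dkim_query_name(dkim_signature: str) -> str:
    """Build the DNS name that holds the public key: <selector>._domainkey.<domain>."""
    tags = parse_tag_list(dkim_signature)
    return f"{tags['s']}._domainkey.{tags['d']}"


# Hypothetical signature header; bh= and b= carry placeholder values
SIGNATURE = (
    "v=1; a=rsa-sha256; d=example-corp.com; s=selector1; "
    "h=from:to:subject; bh=PLACEHOLDER=; b=PLACEHOLDER=="
)
# DMARC example record taken from the step above
DMARC = "v=DMARC1; p=reject; rua=mailto:dmarc@example-corp.com; pct=100"

print(dkim_query_name(SIGNATURE))
print(parse_tag_list(DMARC))
```

The name returned by `dkim_query_name` is what the `dig TXT` DKIM lookup expects, which avoids copying the selector by hand.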

Step 4: Analyze Sender Domain and Infrastructure

```bash
# WHOIS lookup on sender domain
whois "$SENDER_DOMAIN"

# Check domain age (recently registered domains are suspicious)
# Field names vary by registry, so adjust the pattern as needed
whois "$SENDER_DOMAIN" | grep -iE "creation date|created"

# DNS record investigation
dig A "$SENDER_DOMAIN" +short
dig MX "$SENDER_DOMAIN" +short
dig NS "$SENDER_DOMAIN" +short

# Reverse DNS on sending IP
dig -x "$SENDING_IP" +short

# Check for lookalike/typosquatting domains
# Compare with the legitimate domain, using edit distance as a rough proxy for visual similarity
python3 << 'PYEOF'
import Levenshtein  # pip install python-Levenshtein

legitimate = "microsoft.com"
suspicious = "micr0soft.com"

distance = Levenshtein.distance(legitimate, suspicious)
ratio = Levenshtein.ratio(legitimate, suspicious)
print(f"Edit distance: {distance}")
print(f"Similarity ratio: {ratio:.2%}")
# An identical domain (distance 0) is not a lookalike
if distance > 0 and ratio > 0.8:
    print("WARNING: Likely typosquatting/lookalike domain!")
PYEOF

# Check domain reputation on VirusTotal
curl -s "https://www.virustotal.com/api/v3/domains/${SENDER_DOMAIN}" \
  -H "x-apikey: YOUR_VT_API_KEY" | python3 -m json.tool

# Check if the Reply-To differs from From (common phishing indicator)
python3 << 'PYEOF'
import email
from email.utils import parseaddr

with open('/cases/case-2024-001/email/phishing_email.eml', 'rb') as f:
    msg = email.message_from_binary_file(f)

from_addr = parseaddr(str(msg.get('From', '')))[1].lower()
# A missing Reply-To means replies go to From, so treat it as a match
reply_to = parseaddr(str(msg.get('Reply-To') or msg.get('From', '')))[1].lower()
if from_addr != reply_to:
    print(f'WARNING: From ({from_addr}) != Reply-To ({reply_to})')
else:
    print('From and Reply-To match')
PYEOF
```
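
Edit distance alone does not distinguish a character swap such as `0` for `o` from an unrelated one-letter difference. One possible refinement, sketched here with the standard library only, is to normalize a few common substitutions before comparing. The `HOMOGLYPHS` map is a small, deliberately incomplete assumption, and `classify` is a hypothetical helper, not part of the original workflow.

```python
import difflib

# Small, deliberately incomplete map of common substitutions (assumption)
HOMOGLYPHS = {"0": "o", "1": "l", "3": "e", "5": "s", "rn": "m", "vv": "w"}


def normalize(domain: str) -> str:
    """Lower-case the domain and undo the substitutions listed in HOMOGLYPHS."""
    out = domain.lower()
    for fake, real in HOMOGLYPHS.items():
        out = out.replace(fake, real)
    return out


def classify(candidate: str, legitimate: str) -> str:
    """Return 'exact', 'homoglyph', 'similar', or 'different'."""
    if candidate.lower() == legitimate.lower():
        return "exact"
    if normalize(candidate) == normalize(legitimate):
        return "homoglyph"
    ratio = difflib.SequenceMatcher(None, candidate.lower(), legitimate.lower()).ratio()
    return "similar" if ratio > 0.8 else "different"


for candidate, legitimate in [
    ("micr0soft.com", "microsoft.com"),
    ("examp1e-corp.com", "example-corp.com"),
    ("ceo-company.com", "company.com"),
]:
    print(candidate, "->", classify(candidate, legitimate))
```

Both sides are normalized, so a legitimate domain that itself contains a mapped sequence is still compared consistently. Internationalized (punycode) lookalikes are out of scope for this sketch.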

Step 5: Examine Email Body and Attachments

```bash
python3 << 'PYEOF'
import email
import hashlib
import os
import re
from email import policy

CASE_DIR = '/cases/case-2024-001/email'

with open(f'{CASE_DIR}/phishing_email.eml', 'rb') as f:
    msg = email.message_from_binary_file(f, policy=policy.default)

# Extract URLs from email body
body = msg.get_body(preferencelist=('html', 'plain'))
if body:
    content = body.get_content()
    urls = re.findall(r'''https?://[^\s<>"']+''', content)
    print("=== URLs FOUND IN EMAIL BODY ===")
    for url in sorted(set(urls)):
        print(f"  {url}")

    # Check for URL obfuscation (display text != href)
    href_pattern = re.findall(r'<a[^>]*href=["\']([^"\']+)["\'][^>]*>(.*?)</a>', content, re.DOTALL)
    print("\n=== HYPERLINK ANALYSIS ===")
    for href, text in href_pattern:
        display_url = re.findall(r'https?://[^\s<]+', text)
        if display_url and display_url[0] != href:
            print(f"  MISMATCH: Display='{display_url[0]}' -> Actual='{href}'")

# Extract and hash attachments
print("\n=== ATTACHMENTS ===")
os.makedirs(f'{CASE_DIR}/attachments', exist_ok=True)
for part in msg.walk():
    if part.get_content_disposition() == 'attachment':
        # The file name is attacker-controlled: drop directory components, allow for a missing name
        filename = os.path.basename(part.get_filename() or 'unnamed_attachment')
        data = part.get_payload(decode=True) or b''
        sha256 = hashlib.sha256(data).hexdigest()
        print(f"  File: {filename}, Size: {len(data)}, SHA-256: {sha256}")
        with open(f'{CASE_DIR}/attachments/{filename}', 'wb') as af:
            af.write(data)
PYEOF

# Submit attachment hashes to VirusTotal
# Submit URLs to URLhaus or PhishTank for reputation check
```
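
Regular expressions can miss links when attributes are reordered or when the visible text is a bare host name such as `login.microsoft.com` without a scheme. As an alternative, here is a hedged sketch built on the standard `html.parser` module that compares the host shown to the user with the host in `href`. `LinkCollector`, `host_of`, `mismatched_links`, and the sample HTML are illustrative assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def host_of(text: str) -> str:
    """Return the lower-cased host name if text looks like a URL or bare host."""
    candidate = text if "://" in text else "//" + text
    try:
        return (urlsplit(candidate).hostname or "").lower()
    except ValueError:
        return ""


def mismatched_links(html: str) -> list:
    """Return (visible text, href) pairs where the text names a different host."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    findings = []
    for href, text in parser.links:
        shown, actual = host_of(text), host_of(href)
        # Only flag anchors whose visible text itself looks like a host name
        if "." in shown and " " not in text and shown != actual:
            findings.append((text, href))
    return findings


# Hypothetical HTML body for illustration
SAMPLE_HTML = (
    '<p><a href="http://login.micr0soft-secure.example/auth">login.microsoft.com</a></p>'
    '<p><a href="https://example-corp.com/help">Help center</a></p>'
)
print(mismatched_links(SAMPLE_HTML))
```

Anchors with ordinary link text such as "Help center" are skipped on purpose, since a mismatch only makes sense when the text claims a destination.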

Key Concepts

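| Concept | Description |
|---|---|
| SPF (Sender Policy Framework) | DNS record listing the mail servers authorized to send for a domain; checked against the envelope sender (Return-Path) domain |
| DKIM (DomainKeys Identified Mail) | Cryptographic signature over selected headers and the body, showing they were not altered and tying the message to the signing domain |
| DMARC | Policy framework that requires SPF or DKIM to pass and align with the From domain, and tells receivers how to handle failures |
| Received headers | Server-added headers showing each hop in the delivery chain (read bottom to top for chronological order) |
| Return-Path | Envelope sender address used for bounce messages; may differ from From |
| Message-ID | Unique identifier assigned by the originating mail server |
| X-Originating-IP | Original sender IP address (added by some mail services) |
| Header forgery | Attackers can forge From, Reply-To, and any Received headers below the first server you trust; only Received headers added by trusted servers are reliable |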
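
DMARC alignment is the concept that most often trips up header analysis: SPF can pass for an attacker-owned envelope domain while DMARC still fails for the From domain. The sketch below illustrates relaxed and strict alignment. It makes a loud simplifying assumption: `org_domain` takes the last two labels, whereas real implementations consult the Public Suffix List. All function names are hypothetical.

```python
from email.utils import parseaddr


def domain_of(address: str) -> str:
    """Extract the lower-cased domain from an address or 'Name <addr>' header value."""
    return parseaddr(address)[1].rpartition("@")[2].lower()


def org_domain(domain: str) -> str:
    """Naive organizational domain: the last two labels.

    ASSUMPTION: this is wrong for suffixes such as co.uk; production code
    should use the Public Suffix List instead.
    """
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])


def aligned(from_domain: str, auth_domain: str, strict: bool = False) -> bool:
    """Strict alignment needs identical domains; relaxed needs the same org domain."""
    if not from_domain or not auth_domain:
        return False
    if strict:
        return from_domain.lower() == auth_domain.lower()
    return org_domain(from_domain) == org_domain(auth_domain)


def dmarc_verdict(from_header, spf_result, return_path, dkim_result, dkim_d):
    """DMARC passes when SPF or DKIM both passes and aligns with the From domain."""
    from_dom = domain_of(from_header)
    spf_ok = spf_result == "pass" and aligned(from_dom, domain_of(return_path))
    dkim_ok = dkim_result == "pass" and aligned(from_dom, dkim_d)
    return "pass" if (spf_ok or dkim_ok) else "fail"


# SPF passes for the attacker's own envelope domain, but it does not align with From
print(dmarc_verdict("Accounting <accounting@examp1e-corp.com>", "pass",
                    "<bounce@mail-server.xyz>", "none", ""))
```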

Tools & Systems

| Tool | Purpose |
|---|---|
| MXToolbox | Online email header analyzer and DNS lookup |
| dig/nslookup | DNS record queries for SPF, DKIM, DMARC verification |
| pyspf | Python SPF record validation library |
| dkimpy | Python DKIM signature verification library |
| PhishTool | Specialized phishing email analysis platform |
| VirusTotal | URL and file reputation checking service |
| AbuseIPDB | IP address reputation database |
| whois | Domain registration information lookup |

Common Scenarios

Scenario 1: CEO Fraud / Business Email Compromise
The email claims to be from the CEO, but Reply-To points to a Gmail address. SPF fails because the sending IP is not authorized for the spoofed domain, DKIM is missing, and the From domain is a lookalike (ceo-company.com vs company.com).

Scenario 2: Credential Harvesting Phishing
The email contains a link that displays "login.microsoft.com" while its href points to a lookalike domain. The attachment is an HTML file containing a fake login page with JavaScript that exfiltrates credentials, and the sending domain was registered 3 days ago.

Scenario 3: Malware Delivery via Attachment
The email carries an Office document attachment containing macros. SPF passes and the DKIM signature is valid because the message was sent from a compromised account on legitimate infrastructure. The attachment's SHA-256 matches known malware on VirusTotal.

Scenario 4: Spear Phishing with Legitimate Service
The attacker uses a legitimate email marketing service to send the phishing message. SPF and DKIM pass because the service is authorized, so the phishing lies in the content rather than the infrastructure. This case requires URL and content analysis rather than header authentication checks.
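
The scenarios above differ mainly in their authentication verdicts, so a quick triage step is to pull the SPF, DKIM, and DMARC results out of the Authentication-Results header. This is a rough sketch based on a simple regular expression, not a full parser for that header; it keeps only the first result per method, and `auth_verdicts` and `needs_content_analysis` are hypothetical names.

```python
import re

METHOD_RE = re.compile(r"\b(spf|dkim|dmarc)\s*=\s*([a-z]+)", re.IGNORECASE)


def auth_verdicts(header_value: str) -> dict:
    """Return the first result found for each of spf, dkim, and dmarc."""
    verdicts = {}
    for method, result in METHOD_RE.findall(header_value):
        verdicts.setdefault(method.lower(), result.lower())
    return verdicts


def needs_content_analysis(verdicts: dict) -> bool:
    """True when SPF and DKIM both pass, so headers alone cannot clear the message
    (as in Scenarios 3 and 4)."""
    return verdicts.get("spf") == "pass" and verdicts.get("dkim") == "pass"


# Hypothetical header values for illustration
SPOOFED = ("mx.target-company.com; spf=fail smtp.mailfrom=mail-server.xyz; "
           "dkim=none; dmarc=fail header.from=examp1e-corp.com")
VIA_SERVICE = ("mx.target-company.com; spf=pass smtp.mailfrom=bounce.mailer.example; "
               "dkim=pass header.d=mailer.example; dmarc=pass header.from=mailer.example")

print(auth_verdicts(SPOOFED), needs_content_analysis(auth_verdicts(SPOOFED)))
print(auth_verdicts(VIA_SERVICE), needs_content_analysis(auth_verdicts(VIA_SERVICE)))
```

Only an Authentication-Results header added by your own receiving infrastructure should be trusted; a copy inserted upstream by the sender carries no weight.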

Output Format

Email Header Analysis Report:
  Subject:     "Urgent: Invoice Payment Required"
  From:        accounting@examp1e-corp.com (SPOOFED)
  Reply-To:    payments.urgent@gmail.com (MISMATCH)
  Return-Path: <bounce@mail-server.xyz>
  Date:        2024-01-15 09:23:45 UTC

  Delivery Path (4 hops):
    Hop 1: mail-server.xyz [203.0.113.45] -> relay1.isp.com
    Hop 2: relay1.isp.com -> mx.target-company.com
    Hop 3: mx.target-company.com -> internal-filter.target.com
    Hop 4: internal-filter.target.com -> mailbox

  Authentication:
    SPF:    FAIL (203.0.113.45 not authorized for examp1e-corp.com)
    DKIM:   NONE (no signature present)
    DMARC:  FAIL (p=none, no enforcement)

  Indicators of Phishing:
    - Lookalike domain (examp1e-corp.com vs example-corp.com, 94% similar)
    - From/Reply-To mismatch
    - Domain registered 2 days before email sent
    - URL in body points to credential harvesting page
    - Attachment: invoice.xlsm (SHA-256: a3f2...) - Known malware on VT

  Risk Level: HIGH
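
A report in this layout can be produced from collected findings with a small data class. The following is only a sketch: the field set is a subset of the report above, and the scoring thresholds in `risk_level` are hypothetical values chosen for illustration, not a rule taken from this document.

```python
from dataclasses import dataclass, field


@dataclass
class HeaderReport:
    subject: str
    from_addr: str
    reply_to: str
    spf: str
    dkim: str
    dmarc: str
    indicators: list = field(default_factory=list)

    def risk_level(self) -> str:
        # Hypothetical thresholds, for illustration only
        failures = sum(v.lower() != "pass" for v in (self.spf, self.dkim, self.dmarc))
        score = failures + len(self.indicators)
        return "HIGH" if score >= 4 else "MEDIUM" if score >= 2 else "LOW"

    def render(self) -> str:
        lines = [
            "Email Header Analysis Report:",
            f'  Subject:     "{self.subject}"',
            f"  From:        {self.from_addr}",
            f"  Reply-To:    {self.reply_to}",
            "",
            "  Authentication:",
            f"    SPF:    {self.spf.upper()}",
            f"    DKIM:   {self.dkim.upper()}",
            f"    DMARC:  {self.dmarc.upper()}",
            "",
            "  Indicators of Phishing:",
            *[f"    - {item}" for item in self.indicators],
            "",
            f"  Risk Level: {self.risk_level()}",
        ]
        return "\n".join(lines)


report = HeaderReport(
    subject="Urgent: Invoice Payment Required",
    from_addr="accounting@examp1e-corp.com",
    reply_to="payments.urgent@gmail.com",
    spf="fail", dkim="none", dmarc="fail",
    indicators=["Lookalike domain", "From/Reply-To mismatch"],
)
print(report.render())
```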
