analyzing-email-headers-for-phishing-investigation
Analyzing Email Headers for Phishing Investigation
When to Use
- When investigating a suspected phishing email to determine its true origin
- For verifying sender authenticity and detecting email spoofing
- During incident response when a user has clicked a phishing link
- When tracing the delivery path and relay servers of a suspicious email
- For validating SPF, DKIM, and DMARC alignment to identify forgery
Prerequisites
- Raw email headers from the suspicious message (EML or MSG format)
- Understanding of the SMTP protocol and email header fields
- Access to DNS lookup tools (dig, nslookup) for SPF/DKIM/DMARC verification
- Familiarity with email header analysis tools (e.g., MHA, emailheaders.net)
- Python with email parsing libraries for automated analysis
- Access to threat intelligence platforms for IP/domain reputation
Workflow
Step 1: Extract Raw Email Headers
```bash
# Export from Outlook: Open email > File > Properties > Internet Headers
# Export from Gmail: Open email > Three dots > Show original
# Export from Thunderbird: View > Message Source

# If working with an EML file from a forensic image
cp /mnt/evidence/Users/suspect/AppData/Local/Microsoft/Outlook/phishing_email.eml \
   /cases/case-2024-001/email/

# If working with a PST file, extract the transport headers of each message
pip install libpff-python   # PyPI package that provides the pypff module
python3 << 'PYEOF'
import re
import pypff

OUT_DIR = "/cases/case-2024-001/email"

pst = pypff.file()
pst.open(f"{OUT_DIR}/outlook.pst")
root = pst.get_root_folder()
count = 0  # running index so file names stay unique across folders

def extract_messages(folder):
    """Recursively write each message's transport headers to a text file."""
    global count
    for i in range(folder.get_number_of_sub_messages()):
        msg = folder.get_sub_message(i)
        headers = msg.get_transport_headers()
        if headers:
            # Subjects may be missing or contain characters unsafe in file names
            subject = re.sub(r"[^\w.-]+", "_", msg.get_subject() or "no_subject")[:30]
            count += 1
            with open(f"{OUT_DIR}/msg_{count}_{subject}.txt", "w", encoding="utf-8") as f:
                f.write(headers)
    for i in range(folder.get_number_of_sub_folders()):
        extract_messages(folder.get_sub_folder(i))

extract_messages(root)
pst.close()
PYEOF
```
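Once a message has been exported, only its header block is needed for the steps that follow. The sketch below is a minimal illustration, assuming the raw message is available as bytes: the standard library's `BytesParser` accepts `headersonly=True`, which stops header parsing at the first blank line. The `load_headers` helper and the sample message are hypothetical and exist only for this example.

```python
from email import policy
from email.parser import BytesParser


def load_headers(raw: bytes):
    """Parse only the header block of a raw RFC 5322 message."""
    return BytesParser(policy=policy.default).parsebytes(raw, headersonly=True)


# Hypothetical sample message, used only for illustration.
SAMPLE = (
    b"Received: from mail-server.xyz ([203.0.113.45]) by relay1.isp.com; "
    b"Mon, 15 Jan 2024 09:23:40 +0000\r\n"
    b"From: Accounting <accounting@examp1e-corp.com>\r\n"
    b"Reply-To: payments.urgent@gmail.com\r\n"
    b"Subject: Urgent: Invoice Payment Required\r\n"
    b"\r\n"
    b"Body text that is not needed for header analysis.\r\n"
)

headers = load_headers(SAMPLE)
print(headers["Subject"])
print(len(headers.get_all("Received", [])), "Received header(s)")
```

Reading the file in binary mode and passing the bytes to this helper avoids decoding errors on messages that are not valid UTF-8.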
Step 2: Parse the Email Header Chain
```bash
# Parse headers using the Python email library
python3 << 'PYEOF'
import email
from email import policy

# Read as bytes so messages that are not valid UTF-8 do not raise decoding errors
with open('/cases/case-2024-001/email/phishing_email.eml', 'rb') as f:
    msg = email.message_from_binary_file(f, policy=policy.default)

print("=== KEY HEADER FIELDS ===")
print(f"From: {msg['From']}")
print(f"To: {msg['To']}")
print(f"Subject: {msg['Subject']}")
print(f"Date: {msg['Date']}")
print(f"Message-ID: {msg['Message-ID']}")
print(f"Reply-To: {msg['Reply-To']}")
print(f"Return-Path: {msg['Return-Path']}")
print(f"X-Mailer: {msg['X-Mailer']}")
print(f"X-Originating-IP: {msg['X-Originating-IP']}")

print("\n=== RECEIVED HEADERS (bottom-up = chronological) ===")
received_headers = msg.get_all('Received')
if received_headers:
    # Each server prepends its own header, so reversing gives the order of travel
    for i, header in enumerate(reversed(received_headers)):
        print(f"\nHop {i+1}: {header.strip()}")

print("\n=== AUTHENTICATION RESULTS ===")
auth_results = msg.get_all('Authentication-Results')
if auth_results:
    for result in auth_results:
        print(result)

print(f"\nARC-Authentication-Results: {msg.get('ARC-Authentication-Results', 'Not present')}")
print(f"Received-SPF: {msg.get('Received-SPF', 'Not present')}")
print(f"DKIM-Signature: {msg.get('DKIM-Signature', 'Not present')}")
PYEOF
```
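Printing the raw `Received` lines is a start, but tracing the path usually means pulling out the sending host, the receiving host, the bracketed client IP, and the timestamp of each hop. The following is a rough sketch under the assumption that headers follow the common `from ... by ...; date` layout; real-world `Received` headers vary widely, and the `parse_received` helper and sample values are hypothetical.

```python
import re
from email.utils import parsedate_to_datetime

RECEIVED_RE = re.compile(
    r"from\s+(?P<from_host>\S+).*?\bby\s+(?P<by_host>[^\s;]+)",
    re.IGNORECASE | re.DOTALL,
)
IPV4_RE = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def parse_received(value: str) -> dict:
    """Extract host names, bracketed client IP, and timestamp from one Received header."""
    hop = {"from": None, "by": None, "ip": None, "time": None}
    match = RECEIVED_RE.search(value)
    if match:
        hop["from"] = match.group("from_host")
        hop["by"] = match.group("by_host")
    ip = IPV4_RE.search(value)
    if ip:
        hop["ip"] = ip.group(1)
    # The timestamp follows the last semicolon in the header.
    if ";" in value:
        try:
            hop["time"] = parsedate_to_datetime(value.rsplit(";", 1)[1].strip())
        except (TypeError, ValueError):
            pass  # leave the time unset when the date cannot be parsed
    return hop


# Hypothetical headers in file order (newest first), as msg.get_all('Received') returns them.
SAMPLE_RECEIVED = [
    "from relay1.isp.com (relay1.isp.com [198.51.100.7]) by mx.target-company.com "
    "with ESMTPS; Mon, 15 Jan 2024 09:23:45 +0000",
    "from mail-server.xyz (unknown [203.0.113.45]) by relay1.isp.com "
    "with ESMTP; Mon, 15 Jan 2024 09:23:40 +0000",
]

hops = [parse_received(h) for h in reversed(SAMPLE_RECEIVED)]
for number, hop in enumerate(hops, 1):
    print(f"Hop {number}: {hop['from']} [{hop['ip']}] -> {hop['by']} at {hop['time']}")

delay = (hops[1]["time"] - hops[0]["time"]).total_seconds()
print(f"Delay between hop 1 and hop 2: {delay} seconds")
```

Large or negative delays between hops, or host names that do not match their bracketed IP, are worth a closer look, although clock skew between servers can also explain small discrepancies.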
Step 3: Validate SPF, DKIM, and DMARC Records
```bash
# Extract the envelope sender domain (Return-Path / MAIL FROM)
SENDER_DOMAIN="example-corp.com"

# Check SPF record
dig TXT "$SENDER_DOMAIN" +short | grep "v=spf1"
# Example: "v=spf1 include:_spf.google.com include:sendgrid.net ~all"

# Check DKIM record (selector from DKIM-Signature header, e.g., "s=selector1")
# Query the signing domain from the d= tag if it differs from the envelope sender
DKIM_SELECTOR="selector1"
dig TXT "${DKIM_SELECTOR}._domainkey.${SENDER_DOMAIN}" +short

# Check DMARC record (published under the domain shown in the From header)
dig TXT "_dmarc.${SENDER_DOMAIN}" +short
# Example: "v=DMARC1; p=reject; rua=mailto:dmarc@example-corp.com; pct=100"

# Verify the sending IP against SPF
# Extract the IP from the first hop (bottom-most Received header);
# entries below the first server you trust can be forged
SENDING_IP="203.0.113.45"

# Manual SPF check using Python
python3 << 'PYEOF'
import spf  # pip install pyspf

result, explanation = spf.check2(
    i='203.0.113.45',             # connecting client IP
    s='sender@example-corp.com',  # envelope sender (MAIL FROM)
    h='mail.example-corp.com'     # HELO/EHLO host name
)
print(f"SPF Result: {result}")
print(f"Explanation: {explanation}")
# Results: pass, fail, softfail, neutral, none, temperror, permerror
PYEOF

# Check if sending IP is in known malicious IP lists
# Query AbuseIPDB or VirusTotal
curl -s "https://api.abuseipdb.com/api/v2/check?ipAddress=${SENDING_IP}" \
  -H "Key: YOUR_API_KEY" -H "Accept: application/json" | python3 -m json.tool
```
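The DNS lookups above show what the domain publishes; the receiving server's own verdict is recorded in the `Authentication-Results` header. As a minimal sketch, assuming the usual `authserv-id; method=result ...; method=result ...` layout, the hypothetical `parse_auth_results` helper below maps each method to its result. It keeps only the first result per method and ignores the property fields, so treat it as a triage aid rather than a full parser.

```python
import re


def parse_auth_results(value: str) -> dict:
    """Map each method (spf, dkim, dmarc, ...) to the first result reported for it."""
    results = {}
    # Drop parenthesised comments, then split the header into clauses on ";".
    cleaned = re.sub(r"\([^()]*\)", "", value)
    # The first clause is the authserv-id of the server that wrote the header.
    for clause in cleaned.split(";")[1:]:
        match = re.match(r"\s*([a-z0-9-]+)\s*=\s*([a-z]+)", clause, re.IGNORECASE)
        if match:
            results.setdefault(match.group(1).lower(), match.group(2).lower())
    return results


# Hypothetical header value, used only for illustration.
SAMPLE_AUTH = (
    "mx.target-company.com; "
    "spf=fail (sender IP is 203.0.113.45) smtp.mailfrom=examp1e-corp.com; "
    "dkim=none (message not signed); "
    "dmarc=fail action=none header.from=examp1e-corp.com"
)

verdicts = parse_auth_results(SAMPLE_AUTH)
print(verdicts)
```

Only trust an `Authentication-Results` header whose authserv-id belongs to your own receiving infrastructure, since a sender can add a header of the same name before the message reaches you.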
Step 4: Analyze Sender Domain and Infrastructure
```bash
# WHOIS lookup on sender domain
whois "$SENDER_DOMAIN" | grep -iE '(registrar|creation|expiration|registrant|nameserver)'
# Check domain age (recently registered domains are suspicious)

# DNS record investigation
dig A "$SENDER_DOMAIN" +short
dig MX "$SENDER_DOMAIN" +short
dig NS "$SENDER_DOMAIN" +short

# Reverse DNS on sending IP
dig -x "$SENDING_IP" +short

# Check for lookalike/typosquatting domains
# Compare with the legitimate domain using edit-distance similarity
python3 << 'PYEOF'
import Levenshtein  # pip install python-Levenshtein

legitimate = "microsoft.com"
suspicious = "micr0soft.com"
distance = Levenshtein.distance(legitimate, suspicious)
ratio = Levenshtein.ratio(legitimate, suspicious)
print(f"Edit distance: {distance}")
print(f"Similarity ratio: {ratio:.2%}")
# Near-identical but not equal strings suggest a lookalike
if suspicious != legitimate and ratio > 0.8:
    print("WARNING: Likely typosquatting/lookalike domain!")
PYEOF

# Check domain reputation on VirusTotal
curl -s "https://www.virustotal.com/api/v3/domains/${SENDER_DOMAIN}" \
  -H "x-apikey: YOUR_VT_API_KEY" | python3 -m json.tool

# Check if the Reply-To differs from From (common phishing indicator)
python3 << 'PYEOF'
import email
import email.utils

with open('/cases/case-2024-001/email/phishing_email.eml', 'rb') as f:
    msg = email.message_from_binary_file(f)

from_addr = email.utils.parseaddr(msg.get('From', ''))[1]
# Fall back to From when no Reply-To header is present
reply_to = email.utils.parseaddr(msg.get('Reply-To', msg.get('From', '')))[1]
if from_addr.lower() != reply_to.lower():
    print(f'WARNING: From ({from_addr}) != Reply-To ({reply_to})')
else:
    print('From and Reply-To match')
PYEOF
```
Step 5: Examine Email Body and Attachments
```bash
# Extract URLs from email body
python3 << 'PYEOF'
import email
import hashlib
import os
import re
from email import policy

with open('/cases/case-2024-001/email/phishing_email.eml', 'rb') as f:
    msg = email.message_from_binary_file(f, policy=policy.default)

body = msg.get_body(preferencelist=('html', 'plain'))
if body:
    content = body.get_content()
    # Triple quotes let the character class contain both quote characters
    urls = re.findall(r'''https?://[^\s<>"']+''', content)
    print("=== URLs FOUND IN EMAIL BODY ===")
    for url in sorted(set(urls)):
        print(f"  {url}")

    # Check for URL obfuscation (display text != href)
    href_pattern = re.findall(r'<a[^>]*href=["\']([^"\']+)["\'][^>]*>(.*?)</a>', content, re.DOTALL)
    print("\n=== HYPERLINK ANALYSIS ===")
    for href, text in href_pattern:
        display_url = re.findall(r'https?://[^\s<]+', text)
        if display_url and display_url[0] != href:
            print(f"  MISMATCH: Display='{display_url[0]}' -> Actual='{href}'")

# Extract and hash attachments
attach_dir = '/cases/case-2024-001/email/attachments'
os.makedirs(attach_dir, exist_ok=True)
print("\n=== ATTACHMENTS ===")
for part in msg.walk():
    if part.get_content_disposition() == 'attachment':
        # basename() keeps a crafted file name from escaping the case directory
        filename = os.path.basename(part.get_filename() or 'unnamed_attachment')
        content = part.get_payload(decode=True) or b''
        sha256 = hashlib.sha256(content).hexdigest()
        print(f"  File: {filename}, Size: {len(content)}, SHA-256: {sha256}")
        with open(os.path.join(attach_dir, filename), 'wb') as af:
            af.write(content)
PYEOF

# Submit attachment hashes to VirusTotal
# Submit URLs to URLhaus or PhishTank for reputation check
```
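Regular expressions miss links whose markup is unusual, for example anchors with nested tags or unquoted attributes. As an alternative sketch, assuming the HTML body is already available as a string, the standard library's `html.parser` can collect each anchor's `href` and visible text, and `urllib.parse` can compare the host names. The `LinkCollector` class, the `find_mismatches` helper, and the sample HTML are hypothetical names introduced for this example.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Visible text counts as "URL-like" only if it looks like a host name or http(s) URL.
HOSTLIKE = re.compile(
    r"^(?:https?://)?[a-z0-9.-]+\.[a-z]{2,}(?:[/:?#]\S*)?$", re.IGNORECASE
)


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def host_of(text: str) -> str:
    """Return the lower-cased host of a URL or bare host name, or '' if unparseable."""
    if "://" not in text:
        text = "//" + text
    try:
        return (urlsplit(text).hostname or "").lower()
    except ValueError:
        return ""


def find_mismatches(html: str) -> list:
    """List (displayed host, actual host) pairs where the two differ."""
    collector = LinkCollector()
    collector.feed(html)
    mismatches = []
    for href, text in collector.links:
        if not HOSTLIKE.match(text):
            continue  # visible text is ordinary wording, not a URL
        shown, actual = host_of(text), host_of(href)
        if shown and actual and shown != actual:
            mismatches.append((shown, actual))
    return mismatches


SAMPLE_HTML = (
    '<p><a href="https://login.micr0soft-secure.example/auth">login.microsoft.com</a> '
    '<a href="https://example.com/help">Help</a></p>'
)
print(find_mismatches(SAMPLE_HTML))
```

Comparing host names rather than full URLs avoids false alarms when the visible text omits the path or scheme of an otherwise honest link.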
Key Concepts
| Concept | Description |
|---|---|
| SPF (Sender Policy Framework) | DNS record specifying authorized mail servers for a domain |
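| DKIM (DomainKeys Identified Mail) | Cryptographic signature showing that the signed headers and body were not altered in transit and identifying the signing domain |
| DMARC | Policy framework that builds on SPF and DKIM, requires the authenticated domain to align with the From domain, and tells receivers how to handle failures |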
| Received headers | Server-added headers showing each hop in the delivery chain (read bottom to top) |
| Return-Path | Envelope sender address used for bounce messages; may differ from From |
| Message-ID | Unique identifier assigned by the originating mail server |
| X-Originating-IP | Original sender IP address (added by some mail services) |
| Header forgery | Attackers can forge From, Reply-To, and other headers, and can insert fake Received lines; only the Received headers added by servers you trust are reliable |
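The DMARC row is easier to follow with the alignment rule written out. The sketch below assumes relaxed alignment and approximates the organizational domain as the last two labels of the name; that shortcut is an assumption made for brevity, since real implementations consult the Public Suffix List. All function names are hypothetical.

```python
def org_domain(domain: str) -> str:
    """Naive organizational domain: the last two labels (ASSUMPTION: no Public Suffix List)."""
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def aligned(header_from: str, other: str, strict: bool = False) -> bool:
    """Strict alignment needs identical domains; relaxed needs the same organizational domain."""
    if strict:
        return header_from.lower() == other.lower()
    return org_domain(header_from) == org_domain(other)


def dmarc_pass(header_from, spf_result, mail_from_domain, dkim_result, dkim_domain):
    """DMARC passes when SPF or DKIM passes AND that mechanism's domain aligns with From."""
    spf_ok = spf_result == "pass" and aligned(header_from, mail_from_domain)
    dkim_ok = dkim_result == "pass" and aligned(header_from, dkim_domain)
    return spf_ok or dkim_ok


# SPF passes for the attacker's own envelope domain, but it does not align with From.
print(dmarc_pass("example-corp.com", "pass", "mail-server.xyz", "none", ""))
# DKIM passes with a signing subdomain of the From domain, so relaxed alignment holds.
print(dmarc_pass("example-corp.com", "fail", "mail-server.xyz", "pass", "mail.example-corp.com"))
```

This is why an SPF pass on its own says little: it only authenticates the envelope sender, which the attacker may control, while DMARC ties the result back to the address the recipient actually sees.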
Tools & Systems
| Tool | Purpose |
|---|---|
| MXToolbox | Online email header analyzer and DNS lookup |
| dig/nslookup | DNS record queries for SPF, DKIM, DMARC verification |
| pyspf | Python SPF record validation library |
| dkimpy | Python DKIM signature verification library |
| PhishTool | Specialized phishing email analysis platform |
| VirusTotal | URL and file reputation checking service |
| AbuseIPDB | IP address reputation database |
| whois | Domain registration information lookup |
Common Scenarios
Scenario 1: CEO Fraud / Business Email Compromise
The email claims to come from the CEO, but the Reply-To points to a Gmail address. SPF fails because the sending IP is not authorized for the spoofed domain, DKIM is missing, and the From domain is a lookalike (ceo-company.com vs. company.com).
Scenario 2: Credential Harvesting Phishing
The email contains a link that displays "login.microsoft.com" while its href points to a lookalike domain. The attachment is an HTML file containing a fake login page with credential-exfiltration JavaScript, and the sending domain was registered 3 days ago.
Scenario 3: Malware Delivery via Attachment
The email carries an Office document attachment containing macros. The sender domain passes SPF and the DKIM signature is valid because the message was sent from legitimate infrastructure through a compromised account. The attachment's SHA-256 matches known malware on VirusTotal.
Scenario 4: Spear Phishing with Legitimate Service
The attacker uses a legitimate email marketing service to send the phishing message. SPF and DKIM pass because the service is authorized, so the phishing lies in the content rather than the infrastructure. Detection requires URL and content analysis rather than header authentication checks.
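Scenarios 1 and 2 both hinge on a lookalike domain. As a dependency-free sketch, the check below combines a plain edit distance with a very small character-swap map (digits standing in for letters). The `CONFUSABLES` map, the `max_distance` default, and the function names are hypothetical choices for illustration; a production check would use a fuller confusables table and a list of the organization's real domains.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


# Hypothetical, deliberately small map of digit-for-letter swaps.
CONFUSABLES = str.maketrans({"0": "o", "1": "l", "3": "e", "5": "s"})


def skeleton(domain: str) -> str:
    """Lower-case the domain and undo the common swaps listed above."""
    return domain.lower().translate(CONFUSABLES)


def is_lookalike(suspicious: str, legitimate: str, max_distance: int = 2) -> bool:
    """Flag domains that differ from the legitimate one only slightly."""
    if suspicious.lower() == legitimate.lower():
        return False  # the genuine domain is not a lookalike of itself
    return (skeleton(suspicious) == skeleton(legitimate)
            or edit_distance(suspicious.lower(), legitimate.lower()) <= max_distance)


print(is_lookalike("micr0soft.com", "microsoft.com"))
print(is_lookalike("examp1e-corp.com", "example-corp.com"))
print(is_lookalike("contoso.org", "microsoft.com"))
```

A small distance threshold keeps false positives down for short domains, at the cost of missing lookalikes that add whole words, such as the "ceo-company.com" example in Scenario 1.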
Output Format
```
Email Header Analysis Report:
  Subject: "Urgent: Invoice Payment Required"
  From: accounting@examp1e-corp.com (SPOOFED)
  Reply-To: payments.urgent@gmail.com (MISMATCH)
  Return-Path: <bounce@mail-server.xyz>
  Date: 2024-01-15 09:23:45 UTC

Delivery Path (4 hops):
  Hop 1: mail-server.xyz [203.0.113.45] -> relay1.isp.com
  Hop 2: relay1.isp.com -> mx.target-company.com
  Hop 3: mx.target-company.com -> internal-filter.target.com
  Hop 4: internal-filter.target.com -> mailbox

Authentication:
  SPF: FAIL (203.0.113.45 not authorized for examp1e-corp.com)
  DKIM: NONE (no signature present)
  DMARC: FAIL (p=none, no enforcement)

Indicators of Phishing:
  - Lookalike domain (examp1e-corp.com vs example-corp.com, 96% similar)
  - From/Reply-To mismatch
  - Domain registered 2 days before email sent
  - URL in body points to credential harvesting page
  - Attachment: invoice.xlsm (SHA-256: a3f2...) - Known malware on VT

Risk Level: HIGH
```
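The report ends with a single risk level, and one way to derive it is to weight the indicators found in the earlier steps. The sketch below is purely illustrative: the indicator names, the weights, and the thresholds are all hypothetical and would need tuning to local policy before anyone relies on them.

```python
# Hypothetical weights, chosen only for illustration.
WEIGHTS = {
    "spf_fail": 2,
    "dkim_missing": 1,
    "dmarc_fail": 2,
    "reply_to_mismatch": 2,
    "lookalike_domain": 3,
    "new_domain": 2,
    "known_malware_attachment": 5,
}


def risk_level(indicators: set) -> str:
    """Sum the weights of the observed indicators and map the total to a level."""
    score = sum(WEIGHTS.get(name, 0) for name in indicators)
    if score >= 6:      # hypothetical threshold
        return "HIGH"
    if score >= 3:      # hypothetical threshold
        return "MEDIUM"
    return "LOW"


observed = {"spf_fail", "dkim_missing", "dmarc_fail", "reply_to_mismatch",
            "lookalike_domain", "new_domain", "known_malware_attachment"}
print(f"Risk Level: {risk_level(observed)}")
```

A weighted sum keeps the reasoning auditable: the report can list exactly which indicators contributed to the level, which matters when the finding is escalated.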