email-imap-full-fetch
Email IMAP Full Fetch
Core Goal
- Fetch one target email from IMAP by a stable message reference.
- Enforce lookup order: exact `HEADER Message-Id` match first, then `uid` fallback.
- Download the full raw MIME via `BODY.PEEK[]`.
- Parse and return headers, full text body, HTML body, and attachment metadata.
- Save the `.eml` and attachment files to disk with filename safety and idempotent indexing.
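The lookup order and the `BODY.PEEK[]` download can be sketched with Python's standard `imaplib`. This is a minimal illustration, not the code in `scripts/imap_full_fetch.py`: the helper names `open_mailbox`, `find_uid`, and `fetch_raw` are hypothetical, and picking the highest UID when several messages share a Message-Id is an assumption.

```python
import imaplib


def open_mailbox(host, username, password, mailbox="INBOX"):
    """Connect over TLS and select the mailbox read-only (hypothetical helper)."""
    conn = imaplib.IMAP4_SSL(host)
    conn.login(username, password)
    conn.select(mailbox, readonly=True)
    return conn


def find_uid(conn, message_id=None, uid=None):
    """Return the UID to fetch: Message-Id header match first, then the given UID."""
    if message_id:
        # imaplib does not quote arguments itself, so the header value is quoted here.
        typ, data = conn.uid("SEARCH", "HEADER", "Message-Id", f'"{message_id}"')
        if typ == "OK" and data and data[0]:
            # Assumption: if several copies match, take the last UID the server listed.
            return data[0].split()[-1].decode()
    if uid:
        return str(uid)
    return None


def fetch_raw(conn, uid):
    """Download the full raw MIME bytes; BODY.PEEK[] leaves the read flag untouched."""
    typ, data = conn.uid("FETCH", uid, "(BODY.PEEK[])")
    if typ != "OK":
        raise RuntimeError(f"FETCH failed for uid {uid}")
    for part in data:
        # The message literal arrives as an (envelope, payload) tuple.
        if isinstance(part, tuple):
            return part[1]
    raise LookupError(f"no message body returned for uid {uid}")
```

With a live server, `fetch_raw(conn, find_uid(conn, "<caa123@example.com>", "123456"))` would try the header search first and use UID `123456` only if the search finds nothing.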
Standard Flow
- Input must include `message_id_norm` from stage-1 routing output (`mail_ref.message_id_norm`).
- Use `fetch --message-id "<message_id_norm or raw Message-Id>"` as the default path.
- Use `fetch --uid "<uid>"` only when no usable message-id is available.
- Keep mailbox selection consistent with stage-1 (`--mailbox` or `IMAP_MAILBOX`).
- Read the JSON output and continue downstream processing with the returned `mail_ref`.
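The flow above can be sketched as a small wrapper that turns a stage-1 `mail_ref` into a `fetch` invocation. The function names `build_fetch_command` and `run_fetch` are hypothetical, and the sketch assumes that the stage-1 `mail_ref` is a dict carrying the same keys as the output `mail_ref` and that the script prints its JSON object on stdout.

```python
import json
import subprocess


def build_fetch_command(mail_ref, mailbox=None, script="scripts/imap_full_fetch.py"):
    """Build the argv list: message-id is the default path, uid only as fallback."""
    message_id = mail_ref.get("message_id_norm") or mail_ref.get("message_id_raw")
    uid = mail_ref.get("uid")
    cmd = ["python3", script, "fetch"]
    if message_id:
        cmd += ["--message-id", message_id]
    elif uid:
        cmd += ["--uid", str(uid)]
    else:
        raise ValueError("mail_ref has neither a message-id nor a uid")
    # Keep the mailbox consistent with stage-1 when it is known.
    mailbox = mailbox or mail_ref.get("mailbox")
    if mailbox:
        cmd += ["--mailbox", mailbox]
    return cmd


def run_fetch(mail_ref, mailbox=None):
    """Run the script and parse its output (assumes JSON on stdout)."""
    result = subprocess.run(
        build_fetch_command(mail_ref, mailbox),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with the angle brackets in a Message-Id.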
Commands
Fetch by Message-Id (preferred):
```bash
python3 scripts/imap_full_fetch.py fetch --message-id "<caa123@example.com>"
```
Fetch by UID (fallback only):
```bash
python3 scripts/imap_full_fetch.py fetch --uid "123456"
```
Use both when needed (message-id lookup first, uid fallback second):
```bash
python3 scripts/imap_full_fetch.py fetch --message-id "<caa123@example.com>" --uid "123456"
```
Output Contract
- Output is a single JSON object.
- Required top-level fields:
  - `mail_ref`
  - `headers`
  - `text_plain`
  - `text_html`
  - `attachments`
  - `saved_eml_path`
- `mail_ref` contains:
  - `account`, `mailbox`, `uid`, `message_id_raw`, `message_id_norm`, `date`
- `attachments[]` contains per-file metadata and persistence result:
  - `filename`, `content_type`, `bytes`, `disposition`, `saved_path`, `skipped_reason`
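To make the shape of the contract concrete, here is a rough sketch of how raw MIME bytes could be parsed into the `headers`, `text_plain`, `text_html`, and `attachments` fields with the standard `email` package. The function `parse_message` is hypothetical; how the real script handles duplicate headers, inline parts, or unknown charsets is not stated in this document, so those choices are assumptions.

```python
from email import message_from_bytes, policy


def parse_message(raw):
    """Parse raw MIME bytes into the content fields of the output contract."""
    msg = message_from_bytes(raw, policy=policy.default)
    out = {
        # Assumption: repeated headers (e.g. Received) collapse to the last value.
        "headers": {name: str(value) for name, value in msg.items()},
        "text_plain": "",
        "text_html": "",
        "attachments": [],
    }
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        disposition = part.get_content_disposition()
        filename = part.get_filename()
        if disposition == "attachment" or filename:
            payload = part.get_payload(decode=True) or b""
            out["attachments"].append({
                "filename": filename,
                "content_type": part.get_content_type(),
                "bytes": len(payload),
                "disposition": disposition,
                "saved_path": None,      # filled in once the file is written
                "skipped_reason": None,  # e.g. size or extension limit
            })
        elif part.get_content_type() == "text/plain":
            out["text_plain"] += part.get_content()
        elif part.get_content_type() == "text/html":
            out["text_html"] += part.get_content()
    return out
```

Only metadata is recorded per attachment, which matches the contract's rule that binary content is never returned in the JSON.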
Storage And Idempotency
- `saved_eml_path` points to the local `.eml` file saved from `BODY.PEEK[]`.
- Attachments are saved to disk; their binary content is not returned in the JSON.
- Filenames are sanitized to remove path separators and unsafe characters.
- Duplicate attachment names are deduplicated with a content-hash suffix.
- Repeated requests are idempotent by `message_id_norm` index and return the existing persisted JSON record directly.
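The storage rules above might look roughly like the following sketch. Every name here is hypothetical, and the details are assumptions rather than the script's documented behavior: the allowed character set, the 8-character SHA-256 suffix, and the choice to name index files after a hash of `message_id_norm`.

```python
import hashlib
import json
import re
from pathlib import Path


def sanitize_filename(name, fallback="attachment"):
    """Drop directory parts and replace anything outside a conservative whitelist."""
    name = (name or "").replace("\\", "/").split("/")[-1]
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name).lstrip(".")
    return name or fallback


def save_attachment(directory, name, payload):
    """Write payload; on a name clash with different content, add a hash suffix."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    safe = sanitize_filename(name)
    target = directory / safe
    if target.exists() and target.read_bytes() != payload:
        digest = hashlib.sha256(payload).hexdigest()[:8]
        target = directory / f"{Path(safe).stem}-{digest}{Path(safe).suffix}"
    target.write_bytes(payload)
    return str(target)


def index_path(index_dir, message_id_norm):
    """Map a normalized Message-Id to one index file (hashed to stay filename-safe)."""
    key = hashlib.sha256(message_id_norm.encode("utf-8")).hexdigest()
    return Path(index_dir) / f"{key}.json"


def load_cached(index_dir, message_id_norm):
    """Return the persisted JSON record for this message, or None on first request."""
    path = index_path(index_dir, message_id_norm)
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else None


def store_record(index_dir, message_id_norm, record):
    """Persist the output record so repeated requests can return it directly."""
    path = index_path(index_dir, message_id_norm)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(record), encoding="utf-8")
```

A caller would check `load_cached` before touching IMAP at all, and call `store_record` only after the `.eml` and attachments are safely on disk.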
Parameters
- `--message-id`: primary lookup key.
- `--uid`: fallback lookup key.
- `--mailbox`: mailbox to query (default `IMAP_MAILBOX` or `INBOX`).
- `--save-eml-dir`: target dir for `.eml` files (env `IMAP_FULL_SAVE_EML_DIR`).
- `--index-dir`: target dir for idempotency index JSON files (env `IMAP_FULL_INDEX_DIR`, default `<save-eml-dir>/.index`).
- `--save-attachments-dir`: target dir for attachments (env `IMAP_FULL_SAVE_ATTACHMENTS_DIR`).
- `--max-attachment-bytes`: max saved attachment size (env `IMAP_FULL_MAX_ATTACHMENT_BYTES`).
- `--allow-ext`: allowed attachment extensions, comma-separated (env `IMAP_FULL_ALLOW_EXT`).
- `--connect-timeout`: IMAP connect timeout in seconds (default from `IMAP_CONNECT_TIMEOUT`).
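One way to wire flags to environment defaults like those listed above is an `argparse` parser whose defaults come from the environment. This sketch only mirrors the documented flags and variables; it is not the script's actual parser, and `build_parser` and `parse_args` are hypothetical names.

```python
import argparse
import os


def build_parser(env=None):
    """Build a parser whose defaults come from the given environment mapping."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="imap_full_fetch.py")
    sub = parser.add_subparsers(dest="command", required=True)
    fetch = sub.add_parser("fetch")
    fetch.add_argument("--message-id")
    fetch.add_argument("--uid")
    fetch.add_argument("--mailbox", default=env.get("IMAP_MAILBOX") or "INBOX")
    fetch.add_argument("--save-eml-dir", default=env.get("IMAP_FULL_SAVE_EML_DIR"))
    fetch.add_argument("--index-dir", default=env.get("IMAP_FULL_INDEX_DIR"))
    fetch.add_argument("--save-attachments-dir",
                       default=env.get("IMAP_FULL_SAVE_ATTACHMENTS_DIR"))
    # argparse applies `type` to string defaults, so env values are converted too.
    fetch.add_argument("--max-attachment-bytes", type=int,
                       default=env.get("IMAP_FULL_MAX_ATTACHMENT_BYTES"))
    fetch.add_argument("--allow-ext", default=env.get("IMAP_FULL_ALLOW_EXT"))
    fetch.add_argument("--connect-timeout", type=float,
                       default=env.get("IMAP_CONNECT_TIMEOUT"))
    return parser


def parse_args(argv, env=None):
    """Parse argv and fill in the derived defaults."""
    args = build_parser(env).parse_args(argv)
    if not (args.message_id or args.uid):
        raise SystemExit("fetch needs --message-id or --uid")
    if args.index_dir is None and args.save_eml_dir:
        # Documented default: <save-eml-dir>/.index
        args.index_dir = os.path.join(args.save_eml_dir, ".index")
    return args
```

An explicit flag always wins over the environment here, because the environment only supplies the parser default.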
Required Environment
- `IMAP_HOST`
- `IMAP_USERNAME`
- `IMAP_PASSWORD`

Optional account defaults:
- `IMAP_NAME`
- `IMAP_PORT`
- `IMAP_SSL`
- `IMAP_MAILBOX`
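A caller can check these variables before invoking the script. The sketch below is illustrative only: `ImapConfig` and `load_config` are hypothetical names, and the fallbacks for the optional variables are assumptions (SSL on by default, port 993 with SSL and 143 without, `IMAP_NAME` treated as an account label that falls back to the username). Only the `INBOX` mailbox default comes from this document.

```python
import os
from dataclasses import dataclass

REQUIRED = ("IMAP_HOST", "IMAP_USERNAME", "IMAP_PASSWORD")


@dataclass
class ImapConfig:
    host: str
    username: str
    password: str
    name: str
    port: int
    ssl: bool
    mailbox: str


def load_config(env=None):
    """Read connection settings, failing early if a required variable is missing."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    # Assumption: SSL is on unless IMAP_SSL holds a common "false" spelling.
    ssl = env.get("IMAP_SSL", "true").strip().lower() not in ("0", "false", "no", "off")
    return ImapConfig(
        host=env["IMAP_HOST"],
        username=env["IMAP_USERNAME"],
        password=env["IMAP_PASSWORD"],
        name=env.get("IMAP_NAME") or env["IMAP_USERNAME"],
        port=int(env.get("IMAP_PORT") or (993 if ssl else 143)),
        ssl=ssl,
        mailbox=env.get("IMAP_MAILBOX") or "INBOX",
    )
```

Reporting all missing variables at once saves a round of trial and error when setting up a new account.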
Scripts
scripts/imap_full_fetch.py