pp-archive-is
archive.today — Printing Press CLI
Prerequisites: Install the CLI
This skill drives the `archive-is-pp-cli` binary. You must verify the CLI is installed before invoking any command from this skill. If it is missing, install it first:

- Install via the Printing Press installer:
  ```bash
  npx -y @mvanhorn/printing-press install archive-is --cli-only
  ```
- Verify:
  ```bash
  archive-is-pp-cli --version
  ```
- Ensure `$GOPATH/bin` (or `$HOME/go/bin`) is on `$PATH`.

If the `npx` install fails (no Node, offline, etc.), fall back to a direct Go install (requires Go 1.23+):

```bash
go install github.com/mvanhorn/printing-press-library/library/media-and-entertainment/archive-is/cmd/archive-is-pp-cli@latest
```

If `--version` reports "command not found" after install, the install step did not put the binary on `$PATH`. Do not proceed with skill commands until verification succeeds.
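The verification step can be scripted so an agent stops early when the binary is missing. This is a minimal sketch, assuming a POSIX shell; `require_cli` is a hypothetical helper name, not part of the CLI, and it only checks that the binary resolves on `$PATH`.

```bash
# Hypothetical helper: succeed only if the named binary resolves on PATH.
# Defaults to the CLI name used throughout this document.
require_cli() {
  bin="${1:-archive-is-pp-cli}"
  if command -v "$bin" >/dev/null 2>&1; then
    return 0
  fi
  echo "$bin not found on PATH; install it before running skill commands" >&2
  return 1
}
```

A caller might write `require_cli && archive-is-pp-cli --version` before running anything else.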
When to Use This CLI
Reach for this whenever a user wants to archive a URL, read a paywalled article, check whether something was previously archived, or batch-capture a list of URLs for research. Specifically good when:
- A user sends a paywalled link and asks "can you read this" → `read` fetches text via archive
- They want to preserve a URL that might change → `save` forces a fresh capture
- They want historical versions → `history` lists all known snapshots
- They have 20+ URLs to archive → `bulk` runs rate-limited batch archival
Don't reach for this if the URL is trivially scrapeable without archive services (no paywall, robots-allowed, direct HTTP works), or if the user wants the original source rather than a cached version.
Unique Capabilities
The whole CLI is unique — archive.today has no official API. But within this CLI, certain commands are the differentiators.
The hero commands
- `read <url>`: Find or create an archive for a URL. Looks up existing snapshots first (Memento timegate → CDX fallback) and submits a fresh capture only if nothing exists. The "always do the right thing" command. This is how 90% of agent calls should start. It's idempotent: calling it twice on the same URL doesn't double-submit.
- `get <url> [--format text|html]` / `tldr <url>`: Fetch article text, optionally LLM-summarized. Falls back to Wayback automatically when archive.today serves a CAPTCHA (which happens daily to cloud IPs). `tldr` pipes the fetched text through a summarization step, which is useful for agent chains where you want a short take without shipping 20KB of HTML back.
Durability operations
- `save <url>`: Force a fresh capture via `/submit/?url=<x>&anyway=1`. Use when `read` returns an existing snapshot that's too old or missing a paywall update.
- `history <url>`: List all known snapshots via Memento timemap parsing. Shows every capture date across both archive.today and Wayback.
- `bulk [file]`: Rate-limited batch archiving from a file or stdin. Reads URLs one per line, submits each with backoff, and returns a report of successes / failures / pre-existing. `grep -oE 'https?://[^ )]+' notes.md | archive-is-pp-cli bulk -` archives every URL in a markdown file.
- `request <url>`: Fire-and-forget submit with optional wait+poll. Useful for long captures where you want to come back later.
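The URL-extraction pipeline shown for `bulk` can emit the same link more than once when a note repeats it. Here is a small sketch that deduplicates before submitting, assuming duplicates are unwanted (whether `bulk` deduplicates on its own is not stated in this document). `extract_urls` is a hypothetical helper name.

```bash
# Hypothetical helper: print each distinct http(s) URL found in the given
# files (or on stdin when no file is given), one per line.
# The regex is the one used in this document: it stops at a space or ")".
extract_urls() {
  grep -oE 'https?://[^ )]+' "$@" | LC_ALL=C sort -u
}
```

Usage would look like `extract_urls notes.md | archive-is-pp-cli bulk - --agent`. Sorting changes the submission order, so skip the `sort -u` step if order matters.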
Observability
- `snapshots newest <url>`: Just the newest snapshot URL for a target, useful in scripts.
- `captures`: List your local capture index (post-sync).
- `feeds`: archive.today's global recent-archives feed.
- `--backend archive-is,wayback`: Every read/get accepts a backend preference. Defaults to archive-is with Wayback fallback; flip the order for Wayback-primary.
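To make the fallback order concrete, here is a sketch of how a comma-separated preference list can be walked in order until one entry works. It illustrates the idea only and is not the CLI's implementation; `first_working_backend` and the probe callback are hypothetical names.

```bash
# Hypothetical helper: try each backend in the comma-separated list $1,
# calling the probe command $2 with the backend name as its argument.
# Prints the first backend whose probe succeeds; returns 1 if none does.
first_working_backend() {
  order="$1"
  probe="$2"
  old_ifs="$IFS"
  IFS=','
  # $order is split on commas once, when the loop starts.
  for backend in $order; do
    IFS="$old_ifs"
    if "$probe" "$backend"; then
      echo "$backend"
      return 0
    fi
  done
  IFS="$old_ifs"
  return 1
}
```

The probe can be any command or shell function that exits 0 when the named backend is usable. Passing `wayback,archive-is` instead of `archive-is,wayback` flips the preference, just as the flag does.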
Command Reference
Archive + retrieve:
- `archive-is-pp-cli read <url>`: Find or create (hero command)
- `archive-is-pp-cli get <url>`: Fetch article text (with Wayback fallback)
- `archive-is-pp-cli tldr <url>`: Fetch + summarize
- `archive-is-pp-cli save <url>`: Force fresh capture
- `archive-is-pp-cli request <url>`: Fire-and-forget submit
- `archive-is-pp-cli check <url>`: Does an archive exist?

Listing + history:
- `archive-is-pp-cli history <url>`: All known snapshots
- `archive-is-pp-cli newest <url>`: Newest snapshot URL
- `archive-is-pp-cli captures`: Local capture index
- `archive-is-pp-cli feeds`: Global recent feed

Batch:
- `archive-is-pp-cli bulk [file]`: Batch from file or stdin

Local store:
- `archive-is-pp-cli sync` / `archive` / `export` / `import`: Local SQLite ops

Auth + health:
- `archive-is-pp-cli auth`: Config (no API key needed; auth is a no-op)
- `archive-is-pp-cli doctor`: Verify backend reachability
Recipes
Read a paywalled article
```bash
archive-is-pp-cli read "https://www.wsj.com/articles/..." --agent
# or: return just the text
archive-is-pp-cli get "https://www.wsj.com/articles/..." --format text --agent
```

`read` returns the archive URL (finding an existing snapshot or creating a new one). `get --format text` returns the article body, falling back to Wayback if archive.today serves a CAPTCHA.

Preserve a URL before it changes
```bash
archive-is-pp-cli save "https://example.com/important-page" --agent
archive-is-pp-cli history "https://example.com/important-page" --agent  # verify
```

Force a capture, then check history to confirm the new snapshot registered.
Bulk archive a research batch
```bash
grep -oE 'https?://[^ )]+' research-notes.md | archive-is-pp-cli bulk - --agent
# or from a file:
archive-is-pp-cli bulk urls.txt --agent
```

Reads URLs one per line, submits each with exponential backoff, and returns per-URL status (archived, pre-existing, failed) as JSON.

Wayback-preferred for a reliable-read
```bash
archive-is-pp-cli read "https://ft.com/content/xyz" --backend wayback,archive-is --agent
```

Use when the Wayback Machine snapshot is known to be cleaner or archive.today is rate-limiting.
Auth Setup
No API key required. Archive.today and Wayback Machine are both public. The `auth` subcommand exists for consistency but is a no-op; `doctor` reports "Auth: not required", which is the expected state.

Optional env:
- `ARCHIVE_IS_BASE_URL`: override archive.today host (for mirrors)
- `WAYBACK_BASE_URL`: override Wayback Machine host
Agent Mode
Add `--agent` to any command. Expands to `--json --compact --no-input --no-color --yes --no-prompt`. Every action command also prints structured `next_actions` hints on stderr when called non-interactively, so the calling agent sees "tried X, got Y, consider Z" automatically.

Notable flags:
- `--submit-timeout <duration>`: max wait for a fresh submit (default `10m`; `0` = unbounded)
- `--backend archive-is,wayback`: backend preference and fallback order
- `--format text|html`: `get`/`tldr` output format
Filtering output
Pass `--select` a comma-separated list of fields; dotted paths select nested fields:

```bash
archive-is-pp-cli <command> --agent --select id,name
archive-is-pp-cli <command> --agent --select items.id,items.owner.name
```

Use this to narrow huge payloads to the fields you actually need. It matters most for deeply nested API responses.
Response envelope
Data-layer commands wrap output in `{"meta": {...}, "results": <data>}`. Parse `.results` for data and `.meta.source` to know whether it's `live` or local. The `N results (live)` summary is printed to stderr only when stdout is a TTY; piped/agent consumers see pure JSON on stdout.

Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 2 | Usage error |
| 3 | Not found (no snapshot exists) |
| 5 | API error (archive.today or Wayback down) |
| 7 | Rate limited (too many submits) |
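An agent can branch on these codes. The sketch below retries only on exit code 7 (rate limited), doubling the delay each time; every other code is handed back to the caller unchanged. The helper name `retry_on_rate_limit`, the `RETRY_DELAY` variable, and the attempt limit are assumptions for illustration, not CLI features.

```bash
# Hypothetical wrapper: run a command, retrying only when it exits with 7.
# $1 is the maximum number of attempts; the rest is the command to run.
# RETRY_DELAY (seconds, default 5) is the first wait and doubles each retry.
retry_on_rate_limit() {
  max="$1"
  shift
  attempt=1
  delay="${RETRY_DELAY:-5}"
  while :; do
    if "$@"; then
      rc=0
    else
      rc=$?
    fi
    # Stop on anything other than "rate limited", or when attempts run out.
    if [ "$rc" -ne 7 ] || [ "$attempt" -ge "$max" ]; then
      return "$rc"
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

For example: `retry_on_rate_limit 4 archive-is-pp-cli save "https://example.com/important-page" --agent`. A code 3 (not found) comes straight back, so the caller can decide whether to request a fresh capture.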
Installation
```bash
go install github.com/mvanhorn/printing-press-library/library/media-and-entertainment/archive-is/cmd/archive-is-pp-cli@latest
archive-is-pp-cli doctor
```

MCP Server
```bash
go install github.com/mvanhorn/printing-press-library/library/media-and-entertainment/archive-is/cmd/archive-is-pp-mcp@latest
claude mcp add archive-is-pp-mcp -- archive-is-pp-mcp
```

Argument Parsing
Given `$ARGUMENTS`:
- Empty, `help`, or `--help` → run `archive-is-pp-cli --help`
- `install` → CLI; `install mcp` → MCP
- Anything that looks like a URL, or "archive <url>" / "bypass paywall on <url>" → `read <url> --agent` is the default; it's idempotent and covers the 90% case.
- "bulk archive" / "archive these" → `bulk` from stdin if URLs are pasted, else ask for the file path.
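The routing rules above can be sketched as a single `case` statement. This is an illustration under stated assumptions, not the skill's actual dispatcher: `route_arguments` and the labels it prints (`help`, `install-cli`, `install-mcp`, `bulk-stdin`, `ask-file-path`, `read`, `unclear`) are hypothetical names.

```bash
# Hypothetical router: map the raw argument string to an action label.
# Bulk phrases are checked before the generic URL rule, because a pasted
# batch also contains URLs.
route_arguments() {
  args="$1"
  case "$args" in
    ""|help|--help) echo "help" ;;
    "install mcp")  echo "install-mcp" ;;
    install)        echo "install-cli" ;;
    *"bulk archive"*|*"archive these"*)
      case "$args" in
        *http://*|*https://*) echo "bulk-stdin" ;;     # URLs were pasted
        *)                    echo "ask-file-path" ;;  # need a file path
      esac ;;
    *http://*|*https://*) echo "read" ;;               # default: read <url> --agent
    *) echo "unclear" ;;
  esac
}
```

Anything that falls through to `unclear` is a case the rules above do not cover, so the sketch leaves that decision to the caller.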
Agent Workflow Features
This CLI exposes three shared agent-workflow capabilities patched in from cli-printing-press PR #218.
Named profiles
Persist a set of flags under a name and reuse them across invocations.

```bash
# Save the current non-default flags as a named profile
archive-is-pp-cli profile save <name>

# Use a profile: overlays its values onto any flag you don't set explicitly
archive-is-pp-cli --profile <name> <command>

# List / inspect / remove
archive-is-pp-cli profile list
archive-is-pp-cli profile show <name>
archive-is-pp-cli profile delete <name> --yes
```

Flag precedence: explicit flag > env var > profile > default.
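The precedence rule amounts to "first non-empty value wins". Here is a minimal sketch, assuming an empty string means "not set"; `resolve_setting` is a hypothetical helper and says nothing about how the CLI stores or reads profiles.

```bash
# Hypothetical helper: print the first non-empty value, checked in
# precedence order: explicit flag, env var, profile, default.
resolve_setting() {
  for candidate in "$1" "$2" "$3" "$4"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```

For example, `resolve_setting "" "" "wayback,archive-is" "archive-is,wayback"` prints the profile value, because no flag or environment value was supplied.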
--deliver
Route command output to a sink other than stdout. Useful when an agent needs to hand a result to a file, a webhook, or another process without plumbing.
```bash
archive-is-pp-cli <command> --deliver file:/path/to/out.json
archive-is-pp-cli <command> --deliver webhook:https://hooks.example/in
```

File sinks write atomically (tmp + rename). Webhook sinks POST `application/json` (or `application/x-ndjson` when `--compact` is set). Unknown schemes produce a structured refusal listing the supported set.
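For readers unfamiliar with the tmp + rename pattern, here is a shell sketch of the general technique. It is not the CLI's code; `atomic_write` is a hypothetical helper, and it assumes the temporary file can be created in the same directory as the target so the final `mv` is a same-filesystem rename.

```bash
# Hypothetical helper: write stdin to the target path via a temp file in the
# same directory, then rename it into place. Readers of the target see either
# the old content or the complete new content, never a partial write.
atomic_write() {
  target="$1"
  tmp=$(mktemp "${target}.tmp.XXXXXX") || return 1
  if cat > "$tmp" && mv -f "$tmp" "$target"; then
    return 0
  fi
  rm -f "$tmp"   # clean up the temp file if the write or rename failed
  return 1
}
```

Usage: `some-command | atomic_write /path/to/out.json`.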
feedback
Record in-band feedback about this CLI from the agent side of the loop. Local-only by default; safe to call without configuration.
```bash
archive-is-pp-cli feedback "what surprised you or tripped you up"
archive-is-pp-cli feedback list         # show local entries
archive-is-pp-cli feedback clear --yes  # wipe
```

Entries append to `~/.archive-is-pp-cli/feedback.jsonl` as JSON lines. When `ARCHIVE_IS_FEEDBACK_ENDPOINT` is set and either `--send` is passed or `ARCHIVE_IS_FEEDBACK_AUTO_SEND=true`, the entry is also POSTed upstream (non-blocking; the local write always succeeds).