pp-archive-is

archive.today — Printing Press CLI

Prerequisites: Install the CLI

This skill drives the `archive-is-pp-cli` binary. You must verify the CLI is installed before invoking any command from this skill. If it is missing, install it first:

Install via the Printing Press installer:

bash

npx -y @mvanhorn/printing-press install archive-is --cli-only

Verify:
```
archive-is-pp-cli --version
```
Ensure `$GOPATH/bin` (or `$HOME/go/bin`) is on `$PATH`.

If the `npx` install fails (no Node, offline, etc.), fall back to a direct Go install (requires Go 1.23+):

bash

go install github.com/mvanhorn/printing-press-library/library/media-and-entertainment/archive-is/cmd/archive-is-pp-cli@latest

If `--version` reports "command not found" after install, the install step did not put the binary on `$PATH`. Do not proceed with skill commands until verification succeeds.
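
The verification step can be scripted as a guard before any skill command runs. This is a minimal sketch assuming a POSIX shell; the helper name `require_cli` is hypothetical and not part of the CLI.

```shell
#!/bin/sh
# require_cli NAME: succeed only if NAME resolves to a command on $PATH.
# Hypothetical helper for illustration; not part of archive-is-pp-cli.
require_cli() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1 not found on PATH; install it before running skill commands" >&2
    return 1
}

# Example guard (commented out so the snippet has no side effects):
# require_cli archive-is-pp-cli && archive-is-pp-cli --version
```

`command -v` is used rather than `which` because it is specified by POSIX and behaves the same across shells.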

When to Use This CLI

Reach for this whenever a user wants to archive a URL, read a paywalled article, check whether something was previously archived, or batch-capture a list of URLs for research. Specifically good when:

A user sends a paywalled link and asks "can you read this" → `read` fetches text via archive
They want to preserve a URL that might change → `save` forces a fresh capture
They want historical versions → `history` lists all known snapshots
They have 20+ URLs to archive → `bulk` runs rate-limited batch archival

Don't reach for this if the URL is trivially scrapeable without archive services (no paywall, robots-allowed, direct HTTP works), or if the user wants the original source rather than a cached version.

Unique Capabilities

The whole CLI is unique — archive.today has no official API. But within this CLI, certain commands are the differentiators.

The hero commands

`read <url>`: Find or create an archive for a URL. Looks up existing snapshots first (Memento timegate → CDX fallback); submits a fresh capture only if nothing exists. The "always do the right thing" command. This is how 90% of agent calls should start. It's idempotent: calling it twice on the same URL doesn't double-submit.
`get <url> [--format text|html]` / `tldr <url>`: Fetch article text, optionally LLM-summarized. Automatic Wayback fallback when archive.today serves a CAPTCHA (which happens daily to cloud IPs). `tldr` pipes the fetched text through a summarization step, which is useful for agent chains where you want a short take without shipping 20KB of HTML back.

Durability operations

`save <url>`: Force a fresh capture via `/submit/?url=<x>&anyway=1`. Use when `read` returns an existing snapshot that's too old or missing a paywall update.
`history <url>`: List all known snapshots via Memento timemap parsing. Shows every capture date across both archive.today and Wayback.
`bulk [file]`: Rate-limited batch archiving from a file or stdin. Reads URLs one per line, submits each with backoff, returns a report of successes / failures / pre-existing. For example, `grep -oE 'https?://[^ )]+' notes.md | archive-is-pp-cli bulk -` archives every URL in a markdown file.
`request <url>`: Fire-and-forget submit with optional wait+poll. Useful for long captures where you want to come back later.

Observability

`snapshots newest <url>`: Just the newest snapshot URL for a target, useful in scripts.
`captures`: List your local capture index (post-sync).
`feeds`: archive.today's global recent-archives feed.
`--backend archive-is,wayback`: Every read/get accepts a backend preference. Defaults to archive-is with Wayback fallback; flip the order for Wayback-primary.

Command Reference

Archive + retrieve:

`archive-is-pp-cli read <url>`: Find or create (hero command)
`archive-is-pp-cli get <url>`: Fetch article text (with Wayback fallback)
`archive-is-pp-cli tldr <url>`: Fetch + summarize
`archive-is-pp-cli save <url>`: Force fresh capture
`archive-is-pp-cli request <url>`: Fire-and-forget submit
`archive-is-pp-cli check <url>`: Does an archive exist?

Listing + history:

`archive-is-pp-cli history <url>`: All known snapshots
`archive-is-pp-cli newest <url>`: Newest snapshot URL
`archive-is-pp-cli captures`: Local capture index
`archive-is-pp-cli feeds`: Global recent feed

Batch:

`archive-is-pp-cli bulk [file]`: Batch from file or stdin

Local store:

`archive-is-pp-cli sync` / `archive` / `export` / `import`: Local SQLite ops

Auth + health:

`archive-is-pp-cli auth`: Config (no API key needed; auth is a no-op)
`archive-is-pp-cli doctor`: Verify backend reachability

Recipes

Read a paywalled article

bash

archive-is-pp-cli read "https://www.wsj.com/articles/..." --agent

# or: return just the text
archive-is-pp-cli get "https://www.wsj.com/articles/..." --format text --agent

`read` returns the archive URL (finding existing or creating new). `get --format text` returns the article body, falling back to Wayback if archive.today serves a CAPTCHA.

Preserve a URL before it changes

bash

archive-is-pp-cli save "https://example.com/important-page" --agent
archive-is-pp-cli history "https://example.com/important-page" --agent  # verify

Force capture, then check history to confirm the new snapshot registered.
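
The two steps can be chained so the history check only runs when the capture succeeded. This is a sketch, not part of the CLI: the function name `save_and_verify` and the `CLI` override variable are hypothetical, while the `save`, `history`, and `--agent` arguments come from the recipe above.

```shell
# CLI can be overridden (for example with a stub in tests); defaults to the real binary.
CLI="${CLI:-archive-is-pp-cli}"

# save_and_verify URL: force a capture, then list history only if save succeeded.
# Returns the exit status of the step that failed, or 0 if both succeed.
save_and_verify() {
    "$CLI" save "$1" --agent || return $?
    "$CLI" history "$1" --agent
}

# Example (commented out; requires the CLI and network access):
# save_and_verify "https://example.com/important-page"
```

Propagating the failing exit status lets a caller distinguish, for instance, a rate-limited save from a usage error.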

Bulk archive a research batch

bash

grep -oE 'https?://[^ )]+' research-notes.md | archive-is-pp-cli bulk - --agent

# or from a file:
archive-is-pp-cli bulk urls.txt --agent

Reads URLs one per line, submits each with exponential backoff, returns per-URL status (archived, pre-existing, failed) as JSON.
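
Notes often repeat the same link, so it may be worth de-duplicating before handing the list to `bulk`. The sketch below reuses the `grep` pattern from the recipe and adds `sort -u`; the helper name `extract_urls` is hypothetical, and whether `bulk` de-duplicates on its own is not stated in this document.

```shell
# extract_urls: read text on stdin, print each distinct http(s) URL once.
# Uses the same pattern as the recipe; sort -u removes duplicates.
extract_urls() {
    grep -oE 'https?://[^ )]+' | sort -u
}

# Example (commented out; requires the CLI):
# extract_urls < research-notes.md | archive-is-pp-cli bulk - --agent
```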

Wayback-preferred for a reliable read

bash

archive-is-pp-cli read "https://ft.com/content/xyz" --backend wayback,archive-is --agent

Use when the Wayback Machine snapshot is known to be cleaner or archive.today is rate-limiting.
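
One way to automate the rate-limit case is to retry with the backend order flipped. This sketch assumes `read` exits with code 7 when archive.today is rate-limiting, which this document lists as the "rate limited" exit code but does not confirm for `read` specifically; `read_prefer_fallback` and the `CLI` override variable are hypothetical names.

```shell
# CLI can be overridden (for example with a stub in tests); defaults to the real binary.
CLI="${CLI:-archive-is-pp-cli}"

# read_prefer_fallback URL: try the default backend order first; if the CLI
# exits 7 (assumed here to mean rate limited), retry once with Wayback first.
read_prefer_fallback() {
    rc=0
    "$CLI" read "$1" --agent || rc=$?
    if [ "$rc" -eq 7 ]; then
        "$CLI" read "$1" --backend wayback,archive-is --agent
        return $?
    fi
    return "$rc"
}

# Example (commented out; requires the CLI and network access):
# read_prefer_fallback "https://ft.com/content/xyz"
```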

Auth Setup

No API key required. archive.today and Wayback Machine are both public. The `auth` subcommand exists for consistency but is a no-op; `doctor` reports "Auth: not required", which is the expected state.

Optional env:

`ARCHIVE_IS_BASE_URL`: override archive.today host (for mirrors)
`WAYBACK_BASE_URL`: override Wayback Machine host

Agent Mode

Add `--agent` to any command. It expands to `--json --compact --no-input --no-color --yes --no-prompt`. Every action command also prints structured `next_actions` hints on stderr when called non-interactively, so the calling agent sees "tried X, got Y, consider Z" automatically.

Notable flags:

`--submit-timeout <duration>`: max wait for a fresh submit (default `10m`; `0` = unbounded)
`--backend archive-is,wayback`: backend preference and fallback order
`--format text|html`: `get` / `tldr` output format
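
Because the `next_actions` hints go to stderr, a caller that wants to parse the command's output separately may prefer to capture the two streams in different files. The following is a generic shell sketch; `run_split` is a hypothetical helper, not a CLI feature.

```shell
# run_split OUT ERR CMD [ARGS...]: run CMD, writing stdout to file OUT and
# stderr to file ERR; returns CMD's exit status.
# Hypothetical helper for illustration; not part of the CLI.
run_split() {
    out=$1
    err=$2
    shift 2
    "$@" >"$out" 2>"$err"
}

# Example (commented out; requires the CLI):
# run_split result.json hints.txt archive-is-pp-cli read "https://example.com" --agent
```

Keeping the streams apart means a JSON parser never sees hint text, and the hints remain available for the agent to inspect after a failure.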

Filtering output

`--select` accepts dotted paths to descend into nested responses; arrays traverse element-wise:

bash

archive-is-pp-cli <command> --agent --select id,name
archive-is-pp-cli <command> --agent --select items.id,items.owner.name

Use this to narrow huge payloads to the fields you actually need — critical for deeply nested API responses.

Response envelope

Data-layer commands wrap output in `{"meta": {...}, "results": <data>}`. Parse `.results` for data and `.meta.source` to know whether it's `live` or local. The `N results (live)` summary is printed to stderr only when stdout is a TTY; piped/agent consumers see pure JSON on stdout.
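
The envelope can be unpacked with `jq`, assuming it is installed; `jq` is a separate tool and not part of the CLI. The helper names below are hypothetical, and the sample payload in the comment is made up for illustration (only the `meta`, `results`, and `meta.source` keys come from this document).

```shell
# envelope_source: read an envelope on stdin, print .meta.source as raw text.
# envelope_results: read an envelope on stdin, print .results as compact JSON.
# Both assume jq is installed; jq is not part of the CLI.
envelope_source()  { jq -r '.meta.source'; }
envelope_results() { jq -c '.results'; }

# Example with a made-up payload:
# echo '{"meta":{"source":"live"},"results":[{"id":1}]}' | envelope_source
```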

Exit Codes

Code	Meaning
0	Success
2	Usage error
3	Not found (no snapshot exists)
5	API error (archive.today or Wayback down)
7	Rate limited (too many submits)
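
A calling script can branch on the codes in the table above. This sketch maps each documented code to a short label; `describe_exit` is a hypothetical helper, and any code not in the table falls through to "unknown".

```shell
# describe_exit CODE: print a short label for an exit code from the table.
# Hypothetical helper; the labels mirror the documented meanings.
describe_exit() {
    case "$1" in
        0) echo "success" ;;
        2) echo "usage error" ;;
        3) echo "not found" ;;
        5) echo "api error" ;;
        7) echo "rate limited" ;;
        *) echo "unknown" ;;
    esac
}

# Example (commented out; requires the CLI):
# archive-is-pp-cli check "https://example.com" --agent
# describe_exit "$?"
```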

Installation

bash

go install github.com/mvanhorn/printing-press-library/library/media-and-entertainment/archive-is/cmd/archive-is-pp-cli@latest
archive-is-pp-cli doctor

MCP Server

bash

go install github.com/mvanhorn/printing-press-library/library/media-and-entertainment/archive-is/cmd/archive-is-pp-mcp@latest
claude mcp add archive-is-pp-mcp -- archive-is-pp-mcp

Argument Parsing

Given `$ARGUMENTS`:

Empty, `help`, or `--help` → run `archive-is-pp-cli --help`
`install` → install the CLI; `install mcp` → install the MCP server
Anything that looks like a URL, or "archive <url>" / "bypass paywall on <url>" → `read <url> --agent` is the default; it's idempotent and covers the 90% case.
"bulk archive" / "archive these" → `bulk` from stdin if URLs are pasted, else ask for the file path.
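
The routing rules above can be sketched as a `case` statement. This is an illustration only: `route_arguments` is a hypothetical name, it prints the command instead of running it, and it covers only the help and URL cases, leaving the install and bulk branches (which need interaction) unrouted.

```shell
# route_arguments ARGS: print the command line that would be run for ARGS.
# Hypothetical sketch; it prints rather than executes, and only covers the
# cases where the rules name a single concrete command.
route_arguments() {
    case "$1" in
        ""|help|--help)
            echo "archive-is-pp-cli --help" ;;
        http://*|https://*)
            echo "archive-is-pp-cli read $1 --agent" ;;
        "archive http://"*|"archive https://"*)
            echo "archive-is-pp-cli read ${1#archive } --agent" ;;
        *)
            echo "unrouted: $1" ;;
    esac
}
```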

Agent Workflow Features

This CLI exposes three shared agent-workflow capabilities patched in from cli-printing-press PR #218.

Named profiles

Persist a set of flags under a name and reuse them across invocations.

bash

# Save the current non-default flags as a named profile
archive-is-pp-cli profile save <name>

# Use a profile: overlays its values onto any flag you don't set explicitly
archive-is-pp-cli --profile <name> <command>

# List / inspect / remove
archive-is-pp-cli profile list
archive-is-pp-cli profile show <name>
archive-is-pp-cli profile delete <name> --yes

Flag precedence: explicit flag > env var > profile > default.
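
The precedence rule amounts to "first value that is set wins". The sketch below illustrates that ordering only; it is not how the CLI resolves flags internally, `resolve_setting` is a hypothetical name, and it treats an empty string as "not set".

```shell
# resolve_setting EXPLICIT ENV PROFILE DEFAULT: print the first non-empty value,
# mirroring "explicit flag > env var > profile > default".
# Illustration only; it treats an empty string as "not set".
resolve_setting() {
    for candidate in "$1" "$2" "$3" "$4"; do
        if [ -n "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Example: no explicit flag, no env var, profile sets the backend order.
# resolve_setting "" "" "wayback,archive-is" "archive-is,wayback"
```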

--deliver

Route command output to a sink other than stdout. Useful when an agent needs to hand a result to a file, a webhook, or another process without plumbing.

bash

archive-is-pp-cli <command> --deliver file:/path/to/out.json
archive-is-pp-cli <command> --deliver webhook:https://hooks.example/in

File sinks write atomically (tmp + rename). Webhook sinks POST `application/json` (or `application/x-ndjson` when `--compact` is set). Unknown schemes produce a structured refusal listing the supported set.

feedback

Record in-band feedback about this CLI from the agent side of the loop. Local-only by default; safe to call without configuration.

bash

archive-is-pp-cli feedback "what surprised you or tripped you up"
archive-is-pp-cli feedback list         # show local entries
archive-is-pp-cli feedback clear --yes  # wipe

Entries append to `~/.archive-is-pp-cli/feedback.jsonl` as JSON lines. When `ARCHIVE_IS_FEEDBACK_ENDPOINT` is set and either `--send` is passed or `ARCHIVE_IS_FEEDBACK_AUTO_SEND=true`, the entry is also POSTed upstream (non-blocking; the local write always succeeds).
