pp-archive-is
Original:🇺🇸 English
Translated
Use this skill whenever the user wants to archive a URL, bypass a paywall, look up an existing archive, view a cached version of a webpage, pull article text from archive.today or the Wayback Machine, or batch-archive a list of URLs. archive.today + Wayback Machine CLI with lookup-before-submit, automatic fallback when one backend is down, and agent-friendly output. No API key required. Triggers on phrasings like 'archive this article', 'bypass the paywall on this link', 'grab the cached text', 'save this url to archive.today', 'check if this was already archived', 'bulk archive these 20 URLs'.
8installs
Added on
NPX Install
npx skill4agent add mvanhorn/printing-press-library pp-archive-isTags
Translated version includes tags in frontmatterSKILL.md Content
View Translation Comparison →archive.today — Printing Press CLI
Prerequisites: Install the CLI
This skill drives the binary. You must verify the CLI is installed before invoking any command from this skill. If it is missing, install it first:
archive-is-pp-cli- Install via the Printing Press installer:
bash
npx -y @mvanhorn/printing-press install archive-is --cli-only - Verify:
archive-is-pp-cli --version - Ensure (or
$GOPATH/bin) is on$HOME/go/bin.$PATH
If the install fails (no Node, offline, etc.), fall back to a direct Go install (requires Go 1.23+):
npxbash
go install github.com/mvanhorn/printing-press-library/library/media-and-entertainment/archive-is/cmd/archive-is-pp-cli@latestIf reports "command not found" after install, the install step did not put the binary on . Do not proceed with skill commands until verification succeeds.
--version$PATHWhen to Use This CLI
Reach for this whenever a user wants to archive a URL, read a paywalled article, check whether something was previously archived, or batch-capture a list of URLs for research. Specifically good when:
- A user sends a paywalled link and asks "can you read this" → fetches text via archive
read - They want to preserve a URL that might change → forces a fresh capture
save - They want historical versions → lists all known snapshots
history - They have 20+ URLs to archive → runs rate-limited batch archival
bulk
Don't reach for this if the URL is trivially scrapeable without archive services (no paywall, robots-allowed, direct HTTP works), or if the user wants the original source rather than a cached version.
Unique Capabilities
The whole CLI is unique — archive.today has no official API. But within this CLI, certain commands are the differentiators.
The hero commands
-
— Find or create an archive for a URL. Looks up existing snapshots first (Memento timegate → CDX fallback); submits a fresh capture only if nothing exists. The "always do the right thing" command.
read <url>This is how 90% of agent calls should start. It's idempotent — calling it twice on the same URL doesn't double-submit. -
/
get <url> [--format text|html]— Fetch article text, optionally LLM-summarized. Automatic Wayback fallback when archive.today serves a CAPTCHA (which happens daily to cloud IPs).tldr <url>pipes the fetched text through a summarization step — useful for agent chains where you want a short take without shipping 20KB of HTML back.tldr
Durability operations
-
— Force a fresh capture via
save <url>. Use when/submit/?url=<x>&anyway=1returns an existing snapshot that's too old or missing a paywall update.read -
— List all known snapshots via Memento timemap parsing. Shows every capture date across both archive.today and Wayback.
history <url> -
— Rate-limited batch archiving from a file or stdin. Reads URLs one per line, submits each with backoff, returns a report of successes / failures / pre-existing.
bulk [file]archives every URL in a markdown file.grep -oE 'https?://[^ )]+' notes.md | archive-is-pp-cli bulk - -
— Fire-and-forget submit with optional wait+poll. Useful for long captures where you want to come back later.
request <url>
Observability
-
— Just the newest snapshot URL for a target, useful in scripts.
snapshots newest <url> -
— List your local capture index (post-sync).
captures -
— archive.today's global recent-archives feed.
feeds -
— Every read/get accepts a backend preference. Defaults to archive-is with Wayback fallback; flip the order for Wayback-primary.
--backend archive-is,wayback
Command Reference
Archive + retrieve:
- — Find or create (hero command)
archive-is-pp-cli read <url> - — Fetch article text (with Wayback fallback)
archive-is-pp-cli get <url> - — Fetch + summarize
archive-is-pp-cli tldr <url> - — Force fresh capture
archive-is-pp-cli save <url> - — Fire-and-forget submit
archive-is-pp-cli request <url> - — Does an archive exist?
archive-is-pp-cli check <url>
Listing + history:
- — All known snapshots
archive-is-pp-cli history <url> - — Newest snapshot URL
archive-is-pp-cli newest <url> - — Local capture index
archive-is-pp-cli captures - — Global recent feed
archive-is-pp-cli feeds
Batch:
- — Batch from file or stdin
archive-is-pp-cli bulk [file]
Local store:
- /
archive-is-pp-cli sync/archive/export— Local SQLite opsimport
Auth + health:
- — Config (no API key needed; auth is a no-op)
archive-is-pp-cli auth - — Verify backend reachability
archive-is-pp-cli doctor
Recipes
Read a paywalled article
bash
archive-is-pp-cli read "https://www.wsj.com/articles/..." --agent
# or: return just the text
archive-is-pp-cli get "https://www.wsj.com/articles/..." --format text --agentreadget --format textPreserve a URL before it changes
bash
archive-is-pp-cli save "https://example.com/important-page" --agent
archive-is-pp-cli history "https://example.com/important-page" --agent # verifyForce capture, then check history to confirm the new snapshot registered.
Bulk archive a research batch
bash
grep -oE 'https?://[^ )]+' research-notes.md | archive-is-pp-cli bulk - --agent
# or from a file:
archive-is-pp-cli bulk urls.txt --agentReads URLs one per line, submits each with exponential backoff, returns per-URL status (archived, pre-existing, failed) as JSON.
Wayback-preferred for a reliable-read
bash
archive-is-pp-cli read "https://ft.com/content/xyz" --backend wayback,archive-is --agentUse when the Wayback Machine snapshot is known to be cleaner or archive.today is rate-limiting.
Auth Setup
No API key required. Archive.today and Wayback Machine are both public. The subcommand exists for consistency but is a no-op — reports "Auth: not required" which is the expected state.
authdoctorOptional env:
- — override archive.today host (for mirrors)
ARCHIVE_IS_BASE_URL - — override Wayback Machine host
WAYBACK_BASE_URL
Agent Mode
Add to any command. Expands to . Every action command also prints structured hints on stderr when called non-interactively — the calling agent sees "tried X, got Y, consider Z" automatically.
--agent--json --compact --no-input --no-color --yes --no-promptnext_actionsNotable flags:
- — max wait for a fresh submit (default
--submit-timeout <duration>;10m= unbounded)0 - — backend preference and fallback order
--backend archive-is,wayback - —
--format text|html/getoutput formattldr
Filtering output
--selectbash
archive-is-pp-cli <command> --agent --select id,name
archive-is-pp-cli <command> --agent --select items.id,items.owner.nameUse this to narrow huge payloads to the fields you actually need — critical for deeply nested API responses.
Response envelope
Data-layer commands wrap output in . Parse for data and to know whether it's or local. The summary is printed to stderr only when stdout is a TTY; piped/agent consumers see pure JSON on stdout.
{"meta": {...}, "results": <data>}.results.meta.sourceliveN results (live)Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 2 | Usage error |
| 3 | Not found (no snapshot exists) |
| 5 | API error (archive.today or Wayback down) |
| 7 | Rate limited (too many submits) |
Installation
bash
go install github.com/mvanhorn/printing-press-library/library/media-and-entertainment/archive-is/cmd/archive-is-pp-cli@latest
archive-is-pp-cli doctorMCP Server
bash
go install github.com/mvanhorn/printing-press-library/library/media-and-entertainment/archive-is/cmd/archive-is-pp-mcp@latest
claude mcp add archive-is-pp-mcp -- archive-is-pp-mcpArgument Parsing
Given :
$ARGUMENTS- Empty, , or
help→ run--helparchive-is-pp-cli --help - → CLI;
install→ MCPinstall mcp - Anything that looks like a URL, or "archive <url>" / "bypass paywall on <url>" → is the default — it's idempotent and covers the 90% case.
read <url> --agent - "bulk archive" / "archive these" → from stdin if URLs are pasted, else ask for the file path.
bulk
Agent Workflow Features
This CLI exposes three shared agent-workflow capabilities patched in from cli-printing-press PR #218.
Named profiles
Persist a set of flags under a name and reuse them across invocations.
bash
# Save the current non-default flags as a named profile
archive-is-pp-cli profile save <name>
# Use a profile — overlays its values onto any flag you don't set explicitly
archive-is-pp-cli --profile <name> <command>
# List / inspect / remove
archive-is-pp-cli profile list
archive-is-pp-cli profile show <name>
archive-is-pp-cli profile delete <name> --yesFlag precedence: explicit flag > env var > profile > default.
--deliver
Route command output to a sink other than stdout. Useful when an agent needs to hand a result to a file, a webhook, or another process without plumbing.
bash
archive-is-pp-cli <command> --deliver file:/path/to/out.json
archive-is-pp-cli <command> --deliver webhook:https://hooks.example/inFile sinks write atomically (tmp + rename). Webhook sinks POST (or when is set). Unknown schemes produce a structured refusal listing the supported set.
application/jsonapplication/x-ndjson--compactfeedback
Record in-band feedback about this CLI from the agent side of the loop. Local-only by default; safe to call without configuration.
bash
archive-is-pp-cli feedback "what surprised you or tripped you up"
archive-is-pp-cli feedback list # show local entries
archive-is-pp-cli feedback clear --yes # wipeEntries append to as JSON lines. When is set and either is passed or , the entry is also POSTed upstream (non-blocking — local write always succeeds).
~/.archive-is-pp-cli/feedback.jsonlARCHIVE_IS_FEEDBACK_ENDPOINT--sendARCHIVE_IS_FEEDBACK_AUTO_SEND=true