research-collector

Use this when users need to collect research materials for an article or topic by gathering YouTube videos and web articles into a NotebookLM notebook, then running analysis queries and saving the results as markdown. It is ideal for requests like "collect materials", "find relevant videos and articles on this topic for me", and "organize for NotebookLM analysis". This skill combines yt-dlp YouTube search, NotebookLM `nlm` CLI research, and markdown report output.

NPX Install

npx skill4agent add xiaomoboy/claude-writing-skills research-collector

Research Collector

This skill does only one thing:
  • Batch collect YouTube videos + web articles on a specific topic, feed them into NotebookLM, run analysis queries, and save the results to a local directory (default `./research/<topic>/`, configurable)
It does NOT handle:
  • Writing the final article (leave this to your own writing tools / skills)
  • Choosing main titles
  • Downloading videos (use the `yt-dlp-direct` skill in this repository)
  • Publishing to multiple platforms (use the `publisher-wechatsync` skill in this repository)
One-sentence principle: When users say "help me collect materials on topic X" or "pull a batch of YouTube videos + articles into NotebookLM", follow this fixed workflow instead of redesigning it every time.

When To Use

Applicable scenarios:
  • Users need to conduct background research before writing a recommendation/review/opinion article on a topic
  • Users say "help me find popular YouTube videos and articles about X"
  • Users say "collect them into NotebookLM for analysis"
  • Users say "organize a material research report on topic X for me"
Inapplicable scenarios:
  • Users already have a clear list of materials and only want summaries → run `nlm notebook query` directly
  • Users want to conduct real-time conversational research without persisting to a notebook → use WebSearch + WebFetch
  • Users only need to download a single video → use `yt-dlp-direct`

Preconditions

Must confirm the following before starting:
  1. The `nlm` CLI is installed and logged in: `nlm login --check`
  2. `yt-dlp` is in PATH: `which yt-dlp`
  3. Users have clearly specified the topic and angle
  4. The output directory is writable (default `./research/<topic>/`; can be configured via the `RESEARCH_OUTPUT_DIR` environment variable or specified directly in the conversation)
If preconditions are not met:
  • If `nlm login --check` fails → ask the user to run `nlm login`; session validity is ~20 minutes
  • If `yt-dlp` is not installed → stop and inform the user
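The precondition checks above could be scripted roughly as follows. This is a minimal sketch, not part of the skill: `resolve_output_dir` and `check_preconditions` are hypothetical helper names, the sketch assumes `RESEARCH_OUTPUT_DIR` replaces the whole default directory, and it assumes `nlm login --check` exits with a non-zero code when the session is invalid (the source does not confirm either point).

```python
import os
import shutil
import subprocess
from pathlib import Path


def resolve_output_dir(topic, env=None):
    """Return RESEARCH_OUTPUT_DIR if set, else the default ./research/<topic>/."""
    env = os.environ if env is None else env
    override = env.get("RESEARCH_OUTPUT_DIR")
    # Assumption: the variable replaces the whole directory, not just ./research.
    return Path(override) if override else Path("research") / topic


def check_preconditions(topic, env=None, which=shutil.which, run=subprocess.run):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    for tool in ("nlm", "yt-dlp"):
        if which(tool) is None:
            problems.append(f"{tool} is not in PATH")
    if which("nlm") is not None:
        # Assumption: a non-zero exit code means the session is invalid.
        if run(["nlm", "login", "--check"], capture_output=True).returncode != 0:
            problems.append("nlm session invalid: ask the user to run `nlm login`")
    out_dir = resolve_output_dir(topic, env)
    try:
        out_dir.mkdir(parents=True, exist_ok=True)
    except OSError as exc:
        problems.append(f"cannot create {out_dir}: {exc}")
    else:
        if not os.access(out_dir, os.W_OK):
            problems.append(f"{out_dir} is not writable")
    return problems
```

Calling `check_preconditions("<topic>")` before Phase 0 and stopping on a non-empty result would mirror the "stop and inform the user" rule. Note that the sketch creates the output directory as a side effect of checking it.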

Working Rules

  • Align the topic, angle, and volume with the user before taking action
  • Default to 15 results per ytsearch; adjust as needed
  • NotebookLM deep research runs only one task at a time; do not start tasks concurrently
  • Sleep for 2 seconds between adding each source to avoid rate limiting
  • All outputs (raw JSON + summary markdown) are saved to `./research/<topic>/` (or the user-specified directory)
  • This skill only handles collection and analysis; do not automatically proceed to write the final article
  • Do not delete the notebook, as users may need to run queries later

Core Workflow

Phase 0: Align Objectives

Before starting, you must confirm with the user:
  1. What is the topic (a keyword phrase that can be directly used for ytsearch)
  2. Angle (e.g., "most commonly used + personal creation" vs "latest release + technical details")
  3. Notebook name (default: "<Topic> Materials")
  4. Volume (default: 15 YouTube videos + ~40 web articles from NotebookLM deep research)

Phase 1: Create Notebook + Set Alias

bash
nlm notebook create "<Topic> Materials"
# Extract ID from output, then:
nlm alias set <short-name> <notebook-id>
Use a short alias, such as `skills-research` or `vps-2026`, and use the alias for all subsequent commands.

Phase 2: Search for Popular YouTube Videos with yt-dlp ytsearch

Run 2-3 searches with different angles in parallel, 15 results each:
bash
yt-dlp --simulate --print "%(title)s|%(webpage_url)s|%(view_count)s|%(uploader)s" \
  "ytsearch15:<Keyword A>"
yt-dlp --simulate --print "%(title)s|%(webpage_url)s|%(view_count)s|%(uploader)s" \
  "ytsearch15:<Keyword B>"
Ignore JS runtime warnings in the output.
Narrow the combined results down to the top 15 using the following rules:
  • Remove duplicates (same video appearing in multiple searches)
  • Prioritize official accounts (e.g., Anthropic, OpenAI, etc.)
  • Sort by view count from highest to lowest, but reserve 2-3 mid-tier videos with niche angles so the list is not made up entirely of high-view launch announcements
  • Keep at least 5 results for each angle
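The mechanical part of these rules (deduplication, official accounts first, view-count ordering) can be sketched in Python. This is an illustration under stated assumptions, not the skill's own code: `parse_line` and `pick_videos` are hypothetical names, the input is the `title|url|views|uploader` output of the `--print` template above, and the set of official uploaders is supplied by the caller. Reserving mid-tier niche videos and keeping at least 5 results per angle are left to manual judgment.

```python
def parse_line(line):
    """Parse one 'title|url|views|uploader' line from the --print template."""
    # rsplit keeps any '|' inside the title intact (assumes none in the uploader).
    title, url, views, uploader = line.rstrip("\n").rsplit("|", 3)
    return {
        "title": title,
        "url": url,
        "views": int(views) if views.isdigit() else 0,  # yt-dlp prints "NA" when unknown
        "uploader": uploader,
    }


def pick_videos(lines, official=(), limit=15):
    """Deduplicate by URL, then rank official uploaders first and by views."""
    seen = {}
    for line in lines:
        if line.count("|") < 3:
            continue  # skip warnings and other non-result output
        video = parse_line(line)
        seen.setdefault(video["url"], video)  # first occurrence wins
    ranked = sorted(
        seen.values(),
        key=lambda v: (v["uploader"] not in official, -v["views"]),
    )
    return ranked[:limit]
```

Feeding the concatenated output of all searches into `pick_videos(lines, official={"Anthropic", "OpenAI"})` gives a starting shortlist that can then be adjusted by hand.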

Phase 3: Add YouTube Videos as Sources

Use a bash loop to add them one by one, sleeping for 2 seconds each time:
bash
cat > /tmp/yt_urls.txt <<'EOF'
https://www.youtube.com/watch?v=XXX1
https://www.youtube.com/watch?v=XXX2
...
EOF

while IFS= read -r url; do
  echo "=== Adding: $url ==="
  nlm source add <alias> --url "$url" 2>&1 | tail -5
  sleep 2
done < /tmp/yt_urls.txt
Individual additions may occasionally fail (video not public, region-restricted); ignore these and continue, then report the success rate at the end.
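The bash loop above does not itself count failures. A Python equivalent that tracks them might look like the sketch below. It reuses the same `nlm source add <alias> --url` command and assumes that `nlm` exits with a non-zero code when an addition fails, which the source does not state; `add_sources` and `success_report` are hypothetical names.

```python
import subprocess
import time


def add_sources(alias, urls, delay=2.0, run=subprocess.run, sleep=time.sleep):
    """Add each URL to the notebook in turn and collect the failures."""
    failed = []
    for url in urls:
        # Same command as the bash loop; a non-zero exit is treated as failure.
        result = run(["nlm", "source", "add", alias, "--url", url],
                     capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(url)
        sleep(delay)  # pause between additions to avoid rate limiting
    return failed


def success_report(total, failed):
    """Format the success rate for the final report."""
    ok = total - len(failed)
    rate = 100 * ok / total if total else 0.0
    return f"{ok}/{total} sources added ({rate:.0f}%)"
```

The returned list of failed URLs can go straight into the "failed/skipped sources" item of the final report.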

Phase 4: Run NotebookLM Deep Research to Discover Web Articles

bash
nlm research start "<English query suitable for web research>" \
  --notebook-id <alias> --mode deep
Deep mode takes ~5 minutes and returns ~40 web sources.
Key: only one research task can run in a notebook at a time. To run a second round, wait for the first round to finish importing or use `--force`.
Wait for completion:
bash
nlm research status <alias> --max-wait 360
The Bash tool has a default timeout of 120 seconds, so the tool call must set `timeout: 400000` (milliseconds, i.e., 400 seconds).

Phase 5: Import Research Results

After the research is completed, get the task-id from the output, then:
bash
nlm research import <alias> <task-id> --timeout 600
Add `timeout: 700000` to the Bash tool call.
Note: If the user says "enough materials, no need to import more", stop and proceed directly to Phase 6.

Phase 6: Run 3 Analysis Queries

By default, run queries from 3 angles and redirect each command's output directly to a file to avoid flooding the terminal:
bash
mkdir -p "./research/<topic>"

nlm notebook query <alias> "<Chinese prompt for question 1>" \
  > "./research/<topic>/query1-<slug>-raw.json" 2>&1

nlm notebook query <alias> "<Chinese prompt for question 2>" \
  > "./research/<topic>/query2-<slug>-raw.json" 2>&1

nlm notebook query <alias> "<Chinese prompt for question 3>" \
  > "./research/<topic>/query3-<slug>-raw.json" 2>&1
Add `timeout: 240000` to each Bash query call.
Default 3 query templates (modify keywords as needed):
  1. Top List: "Based on all sources, please list the Top 10 X recommended by the most sources. For each X, explain: (1) Name (2) What it does specifically (3) Main usage scenarios (4) Number of sources recommending it (5) Type classification. Sort by recommendation frequency from highest to lowest, output in Chinese."
  2. Target Audience-Oriented: "I want to write an article for <audience portrait>. Please filter the Top 8 X that are most helpful to <audience>, explain each with: (1) Name (2) Specific pain points (3) One-sentence typical usage (4) Type (5) Most specific source number. Remove irrelevant content, focus on <scenario>, output in Chinese."
  3. Getting Started + Pitfalls: "For <audience> using X, please summarize: (1) Fastest way to get started (2) Where to obtain it (3) The 5 most common pitfalls (4) When it's actually not needed (5) Latest important updates. Attach source numbers to each point, output in Chinese."

Phase 7: Extract Answer Field and Generate Summary Markdown

The raw output is JSON containing the answer and citations; use Python to extract the `value.answer` field:
bash
python3 <<'PY'
import json
import pathlib

base = pathlib.Path("./research/<topic>")
files = [
    ("query1-<slug>-raw.json", "## Query 1: <Title>"),
    ("query2-<slug>-raw.json", "## Query 2: <Title>"),
    ("query3-<slug>-raw.json", "## Query 3: <Title>"),
]
out = ["# <Topic> Material Research", "",
       "> Analysis results based on NotebookLM notebook `<notebook-name>`", "",
       "---", ""]
for fname, heading in files:
    out.append(heading)
    out.append("")
    try:
        # Read as UTF-8 explicitly: the answers are requested in Chinese.
        raw = (base / fname).read_text(encoding="utf-8")
        data = json.loads(raw)
        out.append(data.get("value", {}).get("answer", ""))
    except Exception as e:
        # Keep going so one missing or malformed file does not lose the other answers.
        out.append(f"(Parsing failed: {e})")
    out.append("")
    out.append("---")
    out.append("")

# Do not overwrite an existing summary; fall back to a -v2 file name.
summary = base / "Material Research Summary.md"
if summary.exists():
    summary = base / "Material Research Summary-v2.md"
summary.write_text("\n".join(out), encoding="utf-8")
print("Written:", summary, summary.stat().st_size, "bytes")
PY

Output Contract

After execution, provide the user with a report including:
  1. Notebook name + alias + actual number of sources
  2. Storage paths of the 3 raw JSON files and 1 summary markdown file
  3. Failed/skipped sources (if any)
  4. Preview of the summary file's header (first 20 lines or so)
  5. Suggested next steps (leave downstream usage to the user; this skill ends here)

Safety and Boundaries

  • Do not run audio/video/slides generation by default, as these consume quotas; only do so if the user requests it
  • Do not automatically run a second round of research; one round is sufficient for most scenarios
  • Do not overwrite an existing `Material Research Summary.md`; append `-v2` to the file name if it exists
  • Do not include users' private information in research queries (notebooks are searchable)

Troubleshooting

nlm Login Expired

bash
nlm login --check  # Tells you if the session is valid
nlm login          # Re-login
Session validity is approximately 20 minutes.

yt-dlp Search Returns No Output

First check the version:
bash
yt-dlp --version
If it's too old, prompt the user to update. JS runtime / ffmpeg warnings can be ignored; they do not affect `--simulate` mode.

Research Times Out or Gets Stuck

Check status separately (non-blocking):
bash
nlm research status <alias> --max-wait 0
If the status remains `in_progress` for more than 10 minutes, restart with `--force`:
bash
nlm research start "..." --notebook-id <alias> --mode deep --force

Query Output Is Too Large to View Directly

Redirect all queries to files, then use Python to extract the answer; do not attempt to print large JSON directly in the terminal.

Continuous Failures When Adding Sources

  • Check for rate limiting → increase sleep time to 3-5 seconds
  • Check URL format (YouTube must use the standard `watch?v=` format, not shorts/live)
  • Check login status → `nlm login --check`
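For the URL-format check, shorts and live links can usually be rewritten to the standard `watch?v=` form, since they carry the same video ID in the path. The helper below is a minimal sketch with a hypothetical name, `to_watch_url`; handling of `youtu.be` short links is an addition not mentioned in the source, and whether NotebookLM accepts a converted live URL is not verified here.

```python
from urllib.parse import parse_qs, urlparse


def to_watch_url(url):
    """Return the standard watch?v= form of a YouTube URL, or None if unrecognized."""
    parts = urlparse(url)
    host = parts.netloc.lower()
    if host.startswith("www.") or host.startswith("m."):
        host = host.split(".", 1)[1]  # strip the www. / m. prefix
    segments = [s for s in parts.path.split("/") if s]
    video_id = None
    if host == "youtu.be" and segments:
        video_id = segments[0]
    elif host == "youtube.com":
        if segments[:1] == ["watch"]:
            video_id = parse_qs(parts.query).get("v", [None])[0]
        elif len(segments) >= 2 and segments[0] in ("shorts", "live"):
            video_id = segments[1]
    return f"https://www.youtube.com/watch?v={video_id}" if video_id else None
```

Running each collected URL through this helper before the add loop also strips extra query parameters such as timestamps, and a `None` result flags links that should be reviewed by hand.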

References

  • Complete NotebookLM CLI guide: the `notebooklm-mcp-cli` pip package by jacob-bd ships with nlm-skill; alternatively, refer to the upstream README at https://github.com/jacob-bd/notebooklm-mcp-cli
  • yt-dlp command library: `../yt-dlp-direct/SKILL.md` in the same repository
  • Project conventions: if your working directory has `CLAUDE.md` / `AGENTS.md`, reading them is optional; this skill does not depend on them