Research Collector

This skill does only one thing:

Batch-collect YouTube videos and web articles on a specific topic, feed them into NotebookLM, run analysis queries, and save the results to a local directory (default `./research/<topic>/`, configurable).

It does NOT handle:

Writing the final article (leave this to your own writing tools / skills)
Choosing main titles
Downloading videos (use the `yt-dlp-direct` skill in this repository)
Publishing to multiple platforms (use the `publisher-wechatsync` skill in this repository)

One-sentence principle: When users say "help me collect materials on topic X" or "pull a batch of YouTube videos + articles into NotebookLM", follow this fixed workflow instead of redesigning it every time.

When To Use

Applicable scenarios:

Users need to conduct background research before writing a recommendation/review/opinion article on a topic
Users say "help me find popular YouTube videos and articles about X"
Users say "collect them into NotebookLM for analysis"
Users say "organize a material research report on topic X for me"

Inapplicable scenarios:

Users already have a clear list of materials and only want summaries → directly run `nlm notebook query`
Users want to conduct real-time conversational research without persisting to a notebook → use WebSearch + WebFetch
Users only need to download a single video → use `yt-dlp-direct`

Preconditions

Must confirm the following before starting:

The `nlm` CLI is installed and logged in: `nlm login --check`
`yt-dlp` is in PATH: `which yt-dlp`
Users have clearly specified the topic and angle
The output directory is writable (default `./research/<topic>/`; can be configured via the `RESEARCH_OUTPUT_DIR` environment variable or specified directly in the conversation)

If preconditions are not met:

If `nlm login --check` fails → ask the user to run `nlm login`; session validity is ~20 minutes
If `yt-dlp` is not installed → stop and inform the user
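
The precondition checks above can be sketched as a small Python preflight. This is only an illustration: the helper names `missing_tools` and `nlm_logged_in` are hypothetical, and treating exit status 0 from `nlm login --check` as "logged in" is an assumption about the CLI, not something this document confirms.

```python
import shutil
import subprocess


def missing_tools(tools=("nlm", "yt-dlp")):
    """Return the names of required CLIs that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def nlm_logged_in(run=subprocess.run):
    """Run `nlm login --check` and report whether it exited with status 0.

    Assumption: a zero exit status means the session is valid.
    `run` is injectable so the check can be exercised without the real CLI.
    """
    result = run(["nlm", "login", "--check"], capture_output=True, text=True)
    return result.returncode == 0
```

A caller would stop and inform the user if `missing_tools()` returns anything, and ask the user to run `nlm login` if `nlm_logged_in()` is false.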

Working Rules

Align the topic, angle, and volume with the user before taking action
Default to 15 results per ytsearch, adjust as needed
NotebookLM deep research can only run one task at a time, no concurrency allowed
Sleep for 2 seconds between adding each source to avoid rate limiting
All outputs (raw JSON + summary markdown) are saved to `./research/<topic>/` (or the user-specified directory)
This skill only handles collection and analysis; do not automatically proceed to write the final article
Do not delete the notebook, as users may need to run queries later

Core Workflow

Phase 0: Align Objectives

Before starting, you must confirm with the user:

What is the topic (a keyword phrase that can be directly used for ytsearch)
Angle (e.g., "most commonly used + personal creation" vs "latest release + technical details")
Notebook name (default: "<Topic> Materials")
Volume (default: 15 YouTube videos + ~40 web articles from NotebookLM deep research)

Phase 1: Create Notebook + Set Alias

bash

nlm notebook create "<Topic> Materials"
# Extract ID from output, then:
nlm alias set <short-name> <notebook-id>

Use a short alias, such as `skills-research` or `vps-2026`, and use the alias for all subsequent commands.

Phase 2: Search for Popular YouTube Videos with yt-dlp ytsearch

Run 2-3 searches with different angles in parallel, 15 results each:

bash

yt-dlp --simulate --print "%(title)s|%(webpage_url)s|%(view_count)s|%(uploader)s" \
  "ytsearch15:<Keyword A>"
yt-dlp --simulate --print "%(title)s|%(webpage_url)s|%(view_count)s|%(uploader)s" \
  "ytsearch15:<Keyword B>"

Ignore JS runtime warnings in the output.

Narrow the combined results down to the top 15 using the following rules:

Remove duplicates (same video appearing in multiple searches)
Prioritize official accounts (e.g., Anthropic, OpenAI, etc.)
Sort by view count from highest to lowest, but reserve 2-3 slots for mid-tier videos with niche (vertical) angles so the list is not made up entirely of blockbuster press releases
Keep at least 5 results for each angle
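
The mechanical part of these rules (deduplication, official-first ordering, sorting by views) can be sketched in Python as below. This is a minimal sketch, not the skill's prescribed method. It assumes the `title|url|views|uploader` line format produced by the commands above and that uploader names contain no `|`. The names `parse_line` and `shortlist` are hypothetical. Reserving the mid-tier slots and balancing results per angle still need a manual pass.

```python
def parse_line(line):
    """Parse one `title|url|views|uploader` line.

    Splitting from the right keeps any '|' characters inside the title intact.
    A non-numeric view count (for example a placeholder for a missing value)
    is treated as 0.
    """
    title, url, views, uploader = line.rstrip("\n").rsplit("|", 3)
    return {
        "title": title,
        "url": url,
        "views": int(views) if views.isdigit() else 0,
        "uploader": uploader,
    }


def shortlist(lines, official=(), limit=15):
    """Deduplicate by URL, put official uploaders first, then sort by views."""
    seen = {}
    for line in lines:
        if line.count("|") < 3:
            continue  # skip warnings and other non-data lines
        video = parse_line(line)
        seen.setdefault(video["url"], video)  # first occurrence wins
    ranked = sorted(
        seen.values(),
        key=lambda v: (v["uploader"] not in official, -v["views"]),
    )
    return ranked[:limit]
```

For example, `shortlist(all_lines, official=("Anthropic", "OpenAI"))` would return at most 15 entries, where `all_lines` is the combined output of every search.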

Phase 3: Add YouTube Videos as Sources

Use a bash loop to add them one by one, sleeping for 2 seconds each time:

bash

cat > /tmp/yt_urls.txt <<'EOF'
https://www.youtube.com/watch?v=XXX1
https://www.youtube.com/watch?v=XXX2
...
EOF

while IFS= read -r url; do
  echo "=== Adding: $url ==="
  nlm source add <alias> --url "$url" 2>&1 | tail -5
  sleep 2
done < /tmp/yt_urls.txt

Individual additions may occasionally fail (video not public, region-restricted); ignore these and continue, then report the success rate at the end.
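
To report the success rate, the same loop can be written in Python so that failures are counted rather than scrolled past. This is a sketch under one assumption: that `nlm source add` exits with a non-zero status when an addition fails. The names `add_sources` and `success_rate` are hypothetical.

```python
import subprocess
import time


def add_sources(alias, urls, run=subprocess.run, pause=2.0):
    """Add each URL with `nlm source add`, pausing between calls.

    Returns (added, failed) lists of URLs. Assumption: a non-zero exit
    status signals a failed addition. `run` is injectable for testing.
    """
    added, failed = [], []
    for index, url in enumerate(urls):
        if index:
            time.sleep(pause)  # rate-limit guard between consecutive additions
        result = run(
            ["nlm", "source", "add", alias, "--url", url],
            capture_output=True,
            text=True,
        )
        (added if result.returncode == 0 else failed).append(url)
    return added, failed


def success_rate(added, failed):
    """Format a one-line summary for the final report."""
    total = len(added) + len(failed)
    if not total:
        return "no sources attempted"
    return f"{len(added)}/{total} sources added"
```

The `failed` list can go straight into the final report's failed/skipped sources entry.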

Phase 4: Run NotebookLM Deep Research to Discover Web Articles

bash

nlm research start "<English query suitable for web research>" \
  --notebook-id <alias> --mode deep

Deep mode takes ~5 minutes and returns ~40 web sources.

Key: Only one research task can run in a notebook at the same time. If you want to run a second round, you must wait for the first round to finish importing or use `--force`.

Wait for completion:

bash

nlm research status <alias> --max-wait 360

The Bash tool has a default timeout of 120 seconds; you must add `timeout: 400000` (i.e., 400 seconds).

Phase 5: Import Research Results

After the research is completed, get the task-id from the output, then:

bash

nlm research import <alias> <task-id> --timeout 600

Add `timeout: 700000` to the Bash tool call.

Note: If the user says "enough materials, no need to import more", stop and proceed directly to Phase 6.

Phase 6: Run 3 Analysis Queries

By default, run queries from 3 angles and redirect each command's output directly to a file to avoid excessive output:

bash

mkdir -p "./research/<topic>"

nlm notebook query <alias> "<Chinese prompt for question 1>" \
  > "./research/<topic>/query1-<slug>-raw.json" 2>&1

nlm notebook query <alias> "<Chinese prompt for question 2>" \
  > "./research/<topic>/query2-<slug>-raw.json" 2>&1

nlm notebook query <alias> "<Chinese prompt for question 3>" \
  > "./research/<topic>/query3-<slug>-raw.json" 2>&1

Add `timeout: 240000` to each Bash query call.

Default 3 query templates (modify keywords as needed):

Top List: "Based on all sources, please list the Top 10 X recommended by the most sources. For each X, explain: (1) Name (2) What it does specifically (3) Main usage scenarios (4) Number of sources recommending it (5) Type classification. Sort by recommendation frequency from highest to lowest, output in Chinese."
Target Audience-Oriented: "I want to write an article for <audience portrait>. Please filter the Top 8 X that are most helpful to <audience>, explain each with: (1) Name (2) Specific pain points (3) One-sentence typical usage (4) Type (5) Most specific source number. Remove irrelevant content, focus on <scenario>, output in Chinese."
Getting Started + Pitfalls: "For <audience> using X, please summarize: (1) Fastest way to get started (2) Where to obtain it (3) 5 easiest pitfalls to fall into (4) When it's actually not needed (5) Latest important updates. Attach source numbers to each point, output in Chinese."

Phase 7: Extract Answer Field and Generate Summary Markdown

The raw output is JSON containing answer + citations; use Python to extract the `value.answer` field:

bash

python3 <<'PY'
import json, pathlib

base = pathlib.Path("./research/<topic>")
# (raw JSON file, heading for that section of the summary)
files = [
    ("query1-<slug>-raw.json", "## Query 1: <Title>"),
    ("query2-<slug>-raw.json", "## Query 2: <Title>"),
    ("query3-<slug>-raw.json", "## Query 3: <Title>"),
]
out = ["# <Topic> Material Research", "",
       "> Analysis results based on NotebookLM notebook `<notebook-name>`", "",
       "---", ""]
for fname, heading in files:
    out.append(heading)
    out.append("")
    try:
        # Read inside the try block so a missing or non-JSON file (the raw
        # files also capture stderr via 2>&1) is reported instead of crashing
        data = json.loads((base / fname).read_text(encoding="utf-8"))
        out.append(data.get("value", {}).get("answer", ""))
    except Exception as e:
        out.append(f"(Parsing failed: {e})")
    out += ["", "---", ""]

summary = base / "Material Research Summary.md"
summary.write_text("\n".join(out), encoding="utf-8")
print("Written:", summary.stat().st_size, "bytes")
PY

Output Contract

After execution, provide the user with a report including:

Notebook name + alias + actual number of sources
Storage paths of the 3 raw JSON files and 1 summary markdown file
Failed/skipped sources (if any)
Preview of the summary file's header (first 20 lines or so)
Suggested next steps (leave downstream usage to the user; this skill ends here)

Safety and Boundaries

Do not run audio/video/slides generation by default, as these consume quotas; only do so if the user requests it
Do not automatically run a second round of research; one round is sufficient for most scenarios
Do not overwrite an existing `Material Research Summary.md`; append `-v2` to the filename if it exists
Do not include users' private information in research queries (notebooks are searchable)
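
The no-overwrite rule for the summary file can be sketched as a small path helper. This is an illustration under two assumptions: `-v2` is inserted before the file extension, and the helper moves on to `-v3`, `-v4`, and so on when `-v2` is also taken. The second is an extension of the rule as stated here. The name `next_free_path` is hypothetical.

```python
from pathlib import Path


def next_free_path(path):
    """Return `path` if it is unused, else the first free `<stem>-vN<suffix>`.

    Numbering starts at N=2, matching the `-v2` rule; continuing past 2
    is an assumption that goes beyond the rule.
    """
    path = Path(path)
    if not path.exists():
        return path
    version = 2
    while True:
        candidate = path.with_name(f"{path.stem}-v{version}{path.suffix}")
        if not candidate.exists():
            return candidate
        version += 1
```

Writing the summary to `next_free_path(base / "Material Research Summary.md")` instead of a fixed path keeps earlier summaries intact.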

Troubleshooting

nlm Login Expired

bash

nlm login --check  # Tells you if the session is valid
nlm login          # Re-login

Session validity is approximately 20 minutes.

yt-dlp Search Returns No Output

First check the version:

bash

yt-dlp --version

If it's too old, prompt the user to update. JS runtime / ffmpeg warnings can be ignored and do not affect `--simulate` mode.

Research Times Out or Gets Stuck

Check status separately (non-blocking):

bash

nlm research status <alias> --max-wait 0

If the status remains `in_progress` for more than 10 minutes, restart with `--force`:

bash

nlm research start "..." --notebook-id <alias> --mode deep --force

Query Output Is Too Large to View Directly

Redirect all queries to files, then use Python to extract the answer; do not attempt to print large JSON directly in the terminal.

Continuous Failures When Adding Sources

Check for rate limiting → increase sleep time to 3-5 seconds
Check URL format (YouTube must use the standard `watch?v=` format, not shorts/live)
Check login status → `nlm login --check`
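
The URL-format check can be run before adding sources. The sketch below uses only the Python standard library. The function name `is_standard_watch_url` is hypothetical, and accepting only the `youtube.com` and `www.youtube.com` hosts is an assumption about what counts as "standard".

```python
from urllib.parse import parse_qs, urlparse


def is_standard_watch_url(url):
    """True for URLs of the form https://www.youtube.com/watch?v=<id>.

    Shorts, live, and shortened links return False so they can be
    reviewed before being passed to `nlm source add`.
    """
    parts = urlparse(url)
    host = parts.netloc.lower()
    return (
        host in {"www.youtube.com", "youtube.com"}
        and parts.path == "/watch"
        and bool(parse_qs(parts.query).get("v"))
    )
```

Filtering the URL list with this check first separates format problems from rate-limiting and login failures.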

References

Complete NotebookLM CLI guide: `notebooklm-mcp-cli` (pip package by jacob-bd) comes with nlm-skill, or refer to the upstream README https://github.com/jacob-bd/notebooklm-mcp-cli
yt-dlp command library: `../yt-dlp-direct/SKILL.md` in the same repository
Project conventions: if your working directory has `CLAUDE.md` / `AGENTS.md`, this skill does not depend on them; reading them is optional
