research-collector
Use this when users need to collect research materials for an article or topic by gathering YouTube videos and web articles into a NotebookLM notebook, then running analysis queries and saving the results as markdown. It is ideal for requests like "collect materials", "find relevant videos and articles on this topic for me", and "organize for NotebookLM analysis". This skill combines yt-dlp YouTube search, NotebookLM `nlm` CLI research, and markdown report output.
NPX Install
npx skill4agent add xiaomoboy/claude-writing-skills research-collector
Research Collector
This skill does only one thing:
- Batch collect YouTube videos + web articles on a specific topic, feed them into NotebookLM, run analysis queries, and save the results to a local directory (default `./research/<topic>/`, configurable)
It does NOT handle:
- Writing the final article (leave this to your own writing tools / skills)
- Choosing main titles
- Downloading videos (use the `yt-dlp-direct` skill in this repository)
- Publishing to multiple platforms (use the `publisher-wechatsync` skill in this repository)
One-sentence principle: When users say "help me collect materials on topic X" or "pull a batch of YouTube videos + articles into NotebookLM", follow this fixed workflow instead of redesigning it every time.
When To Use
Applicable scenarios:
- Users need to conduct background research before writing a recommendation/review/opinion article on a topic
- Users say "help me find popular YouTube videos and articles about X"
- Users say "collect them into NotebookLM for analysis"
- Users say "organize a material research report on topic X for me"
Inapplicable scenarios:
- Users already have a clear list of materials and only want summaries → directly run `nlm notebook query`
- Users want to conduct real-time conversational research without persisting to a notebook → use WebSearch + WebFetch
- Users only need to download a single video → use `yt-dlp-direct`
Preconditions
Must confirm the following before starting:
- `nlm` CLI is installed and logged in: `nlm login --check`
- `yt-dlp` is in PATH: `which yt-dlp`
- Users have clearly specified the topic and angle
- The output directory is writable (default `./research/<topic>/`, can be configured via the `RESEARCH_OUTPUT_DIR` environment variable or directly specified in the conversation)
If preconditions are not met:
- If `nlm login --check` fails → ask the user to run `nlm login`; session validity is ~20 minutes
- If `yt-dlp` is not installed → stop and inform the user
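The precondition checks above could be scripted along these lines. This is a minimal sketch, not part of the skill itself: the helper names `missing_tools`, `resolve_output_dir`, and `is_writable` are hypothetical, and it assumes `RESEARCH_OUTPUT_DIR` replaces the whole output path rather than only the `./research` base. It does not run `nlm login --check`, since the exit-code behavior of that command is not stated here.

```python
import os
import shutil
from pathlib import Path


def missing_tools(tools=("nlm", "yt-dlp"), which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def resolve_output_dir(topic, env=None):
    """Pick the output directory: RESEARCH_OUTPUT_DIR if set, else research/<topic>."""
    env = os.environ if env is None else env
    override = env.get("RESEARCH_OUTPUT_DIR")
    # Assumption: the variable replaces the full path, not just the base directory.
    return Path(override) if override else Path("research") / topic


def is_writable(directory):
    """Create the directory if needed and report whether it accepts writes."""
    try:
        directory.mkdir(parents=True, exist_ok=True)
    except OSError:
        return False
    return os.access(directory, os.W_OK)
```

If `missing_tools()` returns a non-empty list, the workflow should stop and tell the user which tool is missing, as described above.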
Working Rules
- Align the topic, angle, and volume with the user before taking action
- Default to 15 results per ytsearch, adjust as needed
- NotebookLM deep research can only run one task at a time, no concurrency allowed
- Sleep for 2 seconds between adding each source to avoid rate limiting
- All outputs (raw JSON + summary markdown) are saved to `./research/<topic>/` (or the user-specified directory)
- This skill only handles collection and analysis; do not automatically proceed to write the final article
- Do not delete the notebook, as users may need to run queries later
Core Workflow
Phase 0: Align Objectives
Before starting, you must confirm with the user:
- What is the topic (a keyword phrase that can be directly used for ytsearch)
- Angle (e.g., "most commonly used + personal creation" vs "latest release + technical details")
- Notebook name (default: "<Topic> Materials")
- Volume (default: 15 YouTube videos + ~40 web articles from NotebookLM deep research)
Phase 1: Create Notebook + Set Alias
bash
nlm notebook create "<Topic> Materials"
# Extract ID from output, then:
nlm alias set <short-name> <notebook-id>
Use a short alias, such as `skills-research` or `vps-2026`, and use the alias for all subsequent commands.
Phase 2: Search for Popular YouTube Videos with yt-dlp ytsearch
Run 2-3 searches with different angles in parallel, 15 results each:
bash
yt-dlp --simulate --print "%(title)s|%(webpage_url)s|%(view_count)s|%(uploader)s" \
  "ytsearch15:<Keyword A>"
yt-dlp --simulate --print "%(title)s|%(webpage_url)s|%(view_count)s|%(uploader)s" \
  "ytsearch15:<Keyword B>"
Ignore JS runtime warnings in the output.
Filter the top 15 results using the following rules:
- Remove duplicates (same video appearing in multiple searches)
- Prioritize official accounts (e.g., Anthropic, OpenAI, etc.)
- Sort by view count from highest to lowest, but reserve 2-3 mid-tier videos with niche angles so the list is not made up entirely of high-traffic press-release videos
- Keep at least 5 results for each angle
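The mechanical part of these filtering rules (deduplication, official accounts first, sorting by views) can be sketched as follows. It parses the `title|url|views|uploader` lines produced by the `--print` template above. The function names and the `official` set are hypothetical, and the sketch assumes uploader names never contain `|`. Reserving mid-tier videos and balancing angles still need human judgment and are not covered.

```python
def parse_line(line):
    """Parse one 'title|url|views|uploader' line; return None if malformed."""
    parts = line.strip().split("|")
    if len(parts) < 4:
        return None
    # Titles may contain "|", so take the last three fields from the right.
    # Assumption: uploader names do not contain "|".
    title = "|".join(parts[:-3])
    url, views, uploader = parts[-3:]
    return {
        "title": title,
        "url": url,
        # yt-dlp prints "NA" when a field is unknown; treat that as 0 views.
        "views": int(views) if views.isdigit() else 0,
        "uploader": uploader,
    }


def shortlist(lines, official=frozenset(), limit=15):
    """Deduplicate by URL, rank official uploaders first, then by view count."""
    seen = {}
    for line in lines:
        video = parse_line(line)
        if video and video["url"] not in seen:
            seen[video["url"]] = video
    ranked = sorted(
        seen.values(),
        key=lambda v: (v["uploader"] not in official, -v["views"]),
    )
    return ranked[:limit]
```

Feeding the combined output of all searches into `shortlist` removes videos that appear in more than one search before ranking.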
Phase 3: Add YouTube Videos as Sources
Use a bash loop to add them one by one, sleeping for 2 seconds each time:
bash
cat > /tmp/yt_urls.txt <<'EOF'
https://www.youtube.com/watch?v=XXX1
https://www.youtube.com/watch?v=XXX2
...
EOF
while IFS= read -r url; do
  echo "=== Adding: $url ==="
  nlm source add <alias> --url "$url" 2>&1 | tail -5
  sleep 2
done < /tmp/yt_urls.txt
Occasionally, individual additions fail (video not public, region-restricted); ignore these and continue, then report the success rate at the end.
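To report the success rate at the end, the same loop can be written so that it records which URLs failed. The sketch below is one possible shape, not the skill's own implementation: `add_sources` and `success_rate` are hypothetical names, and it assumes `nlm source add` exits with a non-zero status when a source cannot be added.

```python
import subprocess
import time


def add_sources(alias, urls, run=subprocess.run, pause=2.0, sleep=time.sleep):
    """Add each URL as a source; return (succeeded, failed) URL lists."""
    succeeded, failed = [], []
    for url in urls:
        # Assumption: nlm exits non-zero when a source cannot be added.
        result = run(
            ["nlm", "source", "add", alias, "--url", url],
            capture_output=True,
            text=True,
        )
        (succeeded if result.returncode == 0 else failed).append(url)
        sleep(pause)  # space out requests to avoid rate limiting
    return succeeded, failed


def success_rate(succeeded, failed):
    """Fraction of sources added successfully; 0.0 when nothing was attempted."""
    total = len(succeeded) + len(failed)
    return len(succeeded) / total if total else 0.0
```

The `failed` list can go straight into the final report as the failed/skipped sources.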
Phase 4: Run NotebookLM Deep Research to Discover Web Articles
bash
nlm research start "<English query suitable for web research>" \
  --notebook-id <alias> --mode deep
Deep mode takes ~5 minutes and returns ~40 web sources.
Key: Only one research task can run in a notebook at a time. To run a second round, wait for the first round to finish importing or use `--force`.
Wait for completion:
bash
nlm research status <alias> --max-wait 360
The Bash tool has a default timeout of 120 seconds; you must add `timeout: 400000` (i.e., 400 seconds).
Phase 5: Import Research Results
After the research is completed, get the task-id from the output, then:
bash
nlm research import <alias> <task-id> --timeout 600
Add `timeout: 700000` to the Bash tool call.
Note: If the user says "enough materials, no need to import more", stop and proceed directly to Phase 6.
Phase 6: Run 3 Analysis Queries
By default, run queries from 3 angles and redirect each command's output straight to a file to keep large output out of the terminal:
bash
mkdir -p "./research/<topic>"
nlm notebook query <alias> "<Chinese prompt for question 1>" \
  > "./research/<topic>/query1-<slug>-raw.json" 2>&1
nlm notebook query <alias> "<Chinese prompt for question 2>" \
  > "./research/<topic>/query2-<slug>-raw.json" 2>&1
nlm notebook query <alias> "<Chinese prompt for question 3>" \
  > "./research/<topic>/query3-<slug>-raw.json" 2>&1
Add `timeout: 240000` to each Bash query call.
Default 3 query templates (modify keywords as needed):
- Top List: "Based on all sources, please list the Top 10 X recommended by the most sources. For each X, explain: (1) Name (2) What it does specifically (3) Main usage scenarios (4) Number of sources recommending it (5) Type classification. Sort by recommendation frequency from highest to lowest, output in Chinese."
- Target Audience-Oriented: "I want to write an article for <audience portrait>. Please filter the Top 8 X that are most helpful to <audience>, explain each with: (1) Name (2) Specific pain points (3) One-sentence typical usage (4) Type (5) Most specific source number. Remove irrelevant content, focus on <scenario>, output in Chinese."
- Getting Started + Pitfalls: "For <audience> using X, please summarize: (1) Fastest way to get started (2) Where to obtain it (3) 5 easiest pitfalls to fall into (4) When it's actually not needed (5) Latest important updates. Attach source numbers to each point, output in Chinese."
Phase 7: Extract Answer Field and Generate Summary Markdown
The raw output is JSON containing answer + citations; use Python to extract the `value.answer` field:
bash
python3 <<'PY'
import json, pathlib

base = pathlib.Path("./research/<topic>")
# Each raw query file is paired with the heading it gets in the summary.
files = [
    ("query1-<slug>-raw.json", "## Query 1:<Title>"),
    ("query2-<slug>-raw.json", "## Query 2:<Title>"),
    ("query3-<slug>-raw.json", "## Query 3:<Title>"),
]
out = ["# <Topic> Material Research", "",
       "> Analysis results based on NotebookLM notebook `<notebook-name>`", "",
       "---", ""]
for fname, heading in files:
    out.append(heading)
    out.append("")
    raw = (base/fname).read_text()
    try:
        data = json.loads(raw)
        out.append(data.get("value",{}).get("answer",""))
    except Exception as e:
        # Keep going so one bad file does not block the whole summary.
        out.append(f"(Parsing failed: {e})")
    out.append("")
    out.append("---")
    out.append("")
(base/"Material Research Summary.md").write_text("\n".join(out))
print("Written:", (base/"Material Research Summary.md").stat().st_size, "bytes")
PY
Output Contract
After execution, provide the user with a report including:
- Notebook name + alias + actual number of sources
- Storage paths of the 3 raw JSON files and 1 summary markdown file
- Failed/skipped sources (if any)
- Preview of the summary file's header (first 20 lines or so)
- Suggested next steps (leave downstream usage to the user; this skill ends here)
Safety and Boundaries
- Do not run audio/video/slides generation by default, as these consume quotas; only do so if the user requests it
- Do not automatically run a second round of research; one round is sufficient for most scenarios
- Do not overwrite existing `Material Research Summary.md`; append `-v2` if it exists
- Do not include users' private information in research queries (notebooks are searchable)
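The no-overwrite rule can be enforced with a small helper that picks a free filename before writing. This is a sketch under two assumptions not stated in the rule: the `-v2` suffix goes before the file extension, and if `-v2` is also taken the counter keeps increasing (`-v3`, `-v4`, ...). The name `next_free_path` is hypothetical.

```python
from pathlib import Path


def next_free_path(path):
    """Return `path` if unused, else the first '<stem>-vN<suffix>' (N >= 2) that is unused."""
    path = Path(path)
    if not path.exists():
        return path
    version = 2
    while True:
        # Assumption: the version suffix is placed before the extension.
        candidate = path.with_name(f"{path.stem}-v{version}{path.suffix}")
        if not candidate.exists():
            return candidate
        version += 1
```

Calling `next_free_path(base / "Material Research Summary.md")` and writing to the returned path leaves any earlier summary untouched.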
Troubleshooting
nlm Login Expired
bash
nlm login --check # Tells you if the session is valid
nlm login # Re-login
Session validity is approximately 20 minutes.
yt-dlp Search Returns No Output
First check the version:
bash
yt-dlp --version
If it's too old, prompt the user to update. JS runtime / ffmpeg warnings can be ignored and do not affect `--simulate` mode.
Research Times Out or Gets Stuck
Check status separately (non-blocking):
bash
nlm research status <alias> --max-wait 0
If the status remains `in_progress` for more than 10 minutes, restart with `--force`:
bash
nlm research start "..." --notebook-id <alias> --mode deep --force
Query Output Is Too Large to View Directly
Redirect all queries to files, then use Python to extract the answer; do not attempt to print large JSON directly in the terminal.
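Because the query commands redirect stderr into the same file (`2>&1`), a raw file may contain warning text ahead of the JSON, in which case a plain `json.loads` fails. The sketch below is one tolerant way to pull out `value.answer`: it scans for the first JSON object that carries that field. The function name `extract_answer` is hypothetical, and whether `nlm` actually writes anything to stderr is an assumption.

```python
import json


def extract_answer(raw):
    """Return value.answer from the first JSON object in `raw` that has it, else None."""
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores the rest.
            data, _ = decoder.raw_decode(raw, start)
        except json.JSONDecodeError:
            data = None
        if isinstance(data, dict):
            value = data.get("value")
            if isinstance(value, dict) and "answer" in value:
                return value["answer"]
        start = raw.find("{", start + 1)
    return None
```

Printing only the first few hundred characters of the returned answer gives a quick preview without dumping the full JSON to the terminal.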
Continuous Failures When Adding Sources
- Check for rate limiting → increase sleep time to 3-5 seconds
- Check URL format (YouTube must use the standard `watch?v=` format, not shorts/live)
- Check login status → `nlm login --check`
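The URL-format check can be done before any source is added. The following is a minimal sketch that accepts only `watch?v=` URLs and rejects shorts, live, and short-link forms. The function name is hypothetical, and the set of accepted hostnames is an assumption rather than something NotebookLM documents.

```python
from urllib.parse import parse_qs, urlparse


def is_standard_watch_url(url):
    """True for https://www.youtube.com/watch?v=<id> style URLs, False otherwise."""
    parsed = urlparse(url)
    # Assumption: these hostnames are the ones treated as "standard".
    if parsed.netloc.lower() not in {"www.youtube.com", "youtube.com", "m.youtube.com"}:
        return False
    if parsed.path != "/watch":
        return False
    # A standard URL must carry a non-empty `v` query parameter.
    return bool(parse_qs(parsed.query).get("v"))
```

URLs that fail this check can be listed for the user instead of being sent to `nlm source add`.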
References
- Complete NotebookLM CLI Guide: `notebooklm-mcp-cli` (pip package by jacob-bd) comes with nlm-skill, or refer to the upstream README https://github.com/jacob-bd/notebooklm-mcp-cli
- yt-dlp Command Library: `../yt-dlp-direct/SKILL.md` in the same repository
- Project Own Conventions: If your working directory has `CLAUDE.md` / `AGENTS.md`, this skill does not depend on them; optional reading