gs-export

Google Scholar Export to Zotero

Export Google Scholar paper citation data via BibTeX extraction and push to Zotero desktop.

Arguments

$ARGUMENTS contains one or more data-cids (space-separated), e.g.:
  • TFS2GgoGiNUJ (single paper)
  • TFS2GgoGiNUJ abc123XYZ def456UVW (batch export)
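As a minimal sketch of how the argument string could be split into individual data-cids, the helper below tokenizes on whitespace and drops duplicates. The function name `parseDataCids` and the allowed character set are assumptions for illustration; the source does not specify what a valid data-cid looks like.

```javascript
// Hypothetical helper: split the raw $ARGUMENTS string into unique data-cids.
function parseDataCids(rawArgs) {
  const seen = new Set();
  const cids = [];
  for (const token of rawArgs.trim().split(/\s+/)) {
    if (token === "") continue; // empty input yields a single empty token
    // Assumption: data-cids contain only letters, digits, "_" and "-".
    if (!/^[A-Za-z0-9_-]+$/.test(token)) {
      throw new Error(`Unexpected data-cid: ${token}`);
    }
    if (!seen.has(token)) {
      seen.add(token);
      cids.push(token);
    }
  }
  return cids;
}
```

A single cid yields a one-element array (single-paper export); several cids yield the batch list in the order given.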

Steps

Step 1: Get BibTeX for each paper

For each data-cid, perform 3 tool calls to bypass CORS:

1a. Fetch cite dialog to get BibTeX link (evaluate_script)

javascript
async () => {
  const cid = "DATA_CID_HERE";
  const resp = await fetch(
    `https://scholar.google.com/scholar?q=info:${cid}:scholar.google.com/&output=cite`,
    { credentials: 'include' }
  );
  const html = await resp.text();
  const doc = new DOMParser().parseFromString(html, 'text/html');

  // Extract export links
  const links = Array.from(doc.querySelectorAll('#gs_citi a')).map(a => ({
    format: a.textContent.trim(),
    url: a.href
  }));

  // Extract citation format texts
  const citations = Array.from(doc.querySelectorAll('#gs_citt tr')).map(tr => {
    const cells = tr.querySelectorAll('td');
    return {
      style: cells[0]?.textContent?.trim() || '',
      text: cells[1]?.textContent?.trim() || ''
    };
  });

  const bibtexLink = links.find(l => l.format === 'BibTeX');
  return { cid, bibtexLink: bibtexLink?.url || '', links, citations };
}

1b. Navigate to BibTeX URL (navigate_page)

Use mcp__chrome-devtools__navigate_page:
  • url: the bibtexLink URL from step 1a (on scholar.googleusercontent.com)
A top-level navigation is not subject to CORS, so it avoids the restriction that blocks fetch() to googleusercontent.com from the Scholar page.

1c. Read BibTeX content (evaluate_script)

javascript
async () => {
  return { bibtex: document.body.innerText || document.body.textContent || '' };
}
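The page reached in step 1b may not always contain BibTeX (for example, if a CAPTCHA or error page is served instead). The following is a rough sanity check that could be applied to the returned text before parsing; the helper name `checkBibtexPage` and the rule "the text starts with `@`" are assumptions, not part of the original workflow.

```javascript
// Hypothetical check: does the page text look like one or more BibTeX entries?
function checkBibtexPage(text) {
  const trimmed = (text || "").trim();
  // Count entry headers such as "@article{" or "@inproceedings{".
  const entries = trimmed.match(/@\w+\s*\{/g) || [];
  return {
    ok: trimmed.startsWith("@") && entries.length > 0,
    entryCount: entries.length,
  };
}
```

If `ok` is false, it is probably safer to report the cid as failed than to pass the text on to Step 2.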

Step 2: Parse BibTeX and push to Zotero

Parse the BibTeX yourself and write the paper data as a JSON array to /tmp/gs_papers.json, then call the push script:
bash
python "E:/gscholar-skills/.claude/skills/gs-export/scripts/push_to_zotero.py" /tmp/gs_papers.json
The JSON file has the following structure:
json
[
  {
    "pmid": "",
    "title": "The title from BibTeX",
    "authors": [
      {"lastName": "Smith", "firstName": "John"}
    ],
    "journal": "Journal Name",
    "journalAbbr": "",
    "pubdate": "2022",
    "volume": "14",
    "issue": "4",
    "pages": "1054",
    "doi": "",
    "pdfUrl": "https://example.com/paper.pdf",
    "abstract": "",
    "keywords": [],
    "language": "en",
    "pubtype": ["Journal Article"]
  }
]
IMPORTANT: Set pdfUrl from the search result's fullTextUrl field (the PDF link extracted by gs-search). The Python script downloads the PDF and uploads it to Zotero via /connector/saveAttachment (Zotero 7.x ignores attachments in saveItems). PDF download may fail for some publishers (403, JS redirect); these are reported as "PDF skip".
BibTeX field mapping:
  • @article{key, → itemType: journalArticle
  • @inproceedings{key, → itemType: conferencePaper
  • @book{key, → itemType: book
  • title={...} → title
  • author={Last1, First1 and Last2, First2} → authors array
  • journal={...} → journal
  • year={...} → pubdate
  • volume={...} → volume
  • number={...} → issue
  • pages={...} → pages
  • publisher={...} → (included in extra or publisher field)
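The mapping above can be sketched as a small parser that turns one BibTeX entry into the JSON record shape shown earlier. This is an illustrative sketch only: the function names are hypothetical, and whether push_to_zotero.py reads an `itemType` or `publisher` key from the JSON is an assumption, as are the `booktitle` fallback and the `--` to `-` page-range normalization.

```javascript
// Entry type → Zotero item type, following the field mapping above.
const ITEM_TYPES = {
  article: "journalArticle",
  inproceedings: "conferencePaper",
  book: "book",
};

// Read name={value} pairs, tracking brace depth so nested braces survive.
function readFields(body) {
  const fields = {};
  const re = /(\w+)\s*=\s*\{/g;
  let m;
  while ((m = re.exec(body)) !== null) {
    const start = re.lastIndex;
    let depth = 1;
    let i = start;
    while (i < body.length && depth > 0) {
      if (body[i] === "{") depth++;
      else if (body[i] === "}") depth--;
      i++;
    }
    const end = depth === 0 ? i - 1 : i; // stop before the closing brace
    fields[m[1].toLowerCase()] = body.slice(start, end).replace(/\s+/g, " ").trim();
    re.lastIndex = i;
  }
  return fields;
}

// "Last1, First1 and Last2, First2" → [{ lastName, firstName }, ...]
function parseAuthors(authorField) {
  if (!authorField) return [];
  return authorField.split(/\s+and\s+/).map((name) => {
    if (name.includes(",")) {
      const [lastName, firstName = ""] = name.split(",").map((s) => s.trim());
      return { lastName, firstName };
    }
    const parts = name.trim().split(/\s+/); // "First Last" fallback
    return { lastName: parts.pop(), firstName: parts.join(" ") };
  });
}

// Hypothetical converter: one BibTeX entry → one record for gs_papers.json.
function bibtexToPaper(bibtex, pdfUrl = "") {
  const header = bibtex.match(/@(\w+)\s*\{/);
  if (!header) throw new Error("No BibTeX entry found");
  const f = readFields(bibtex.slice(header.index + header[0].length));
  return {
    pmid: "",
    itemType: ITEM_TYPES[header[1].toLowerCase()] || "journalArticle", // assumed key
    title: (f.title || "").replace(/[{}]/g, ""),
    authors: parseAuthors(f.author),
    journal: f.journal || f.booktitle || "", // booktitle fallback is an assumption
    journalAbbr: "",
    pubdate: f.year || "",
    volume: f.volume || "",
    issue: f.number || "",
    pages: (f.pages || "").replace(/--/g, "-"), // assumed normalization
    doi: "", // Google Scholar BibTeX carries no DOI
    pdfUrl,
    abstract: "",
    keywords: [],
    language: "en",
    publisher: f.publisher || "", // assumed key
  };
}
```

Calling `bibtexToPaper(text, fullTextUrl)` for each entry and serializing the resulting array with `JSON.stringify` would give a file in the shape the push script is documented to accept.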

Step 3: Report

Single paper:
Exported to Zotero from Google Scholar:
  Title: {title}
  Authors: {authors}
  Journal: {journal} ({year})
  Data-CID: {dataCid}
Batch:
Exported {count} papers to Zotero from Google Scholar:
  1. {title1} ({journal1}, {year1})
  2. {title2} ({journal2}, {year2})
  ...
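The two report templates could be filled in by a helper along these lines. This is a sketch under assumptions: `formatReport` is a hypothetical name, and each record is assumed to carry a `dataCid` field alongside the JSON fields from Step 2.

```javascript
// Hypothetical formatter for the single-paper and batch report templates.
function formatReport(papers) {
  if (papers.length === 1) {
    const p = papers[0];
    const authors = p.authors
      .map((a) => `${a.firstName} ${a.lastName}`.trim())
      .join(", ");
    return [
      "Exported to Zotero from Google Scholar:",
      `  Title: ${p.title}`,
      `  Authors: ${authors}`,
      `  Journal: ${p.journal} (${p.pubdate})`,
      `  Data-CID: ${p.dataCid}`, // dataCid is an assumed extra field
    ].join("\n");
  }
  const lines = papers.map(
    (p, i) => `  ${i + 1}. ${p.title} (${p.journal}, ${p.pubdate})`
  );
  return [
    `Exported ${papers.length} papers to Zotero from Google Scholar:`,
    ...lines,
  ].join("\n");
}
```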

Batch Export Optimization

For multiple papers, process sequentially to avoid CAPTCHA:
  1. Get all BibTeX links in one evaluate_script call (fetch all cite dialogs)
  2. Navigate to each BibTeX URL one at a time
  3. Collect all BibTeX entries
  4. Push all to Zotero in a single batch
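Item 1 of the list above could look roughly like the following, which reuses the step 1a logic in a sequential loop so that all BibTeX links come back from one evaluate_script call. The function name `fetchBibtexLinks` and the pause between requests (`delayMs`, default 1500) are assumptions; the source only says to process sequentially to avoid CAPTCHA and does not give a delay.

```javascript
// Hypothetical batch variant of step 1a: fetch cite dialogs one after another.
async function fetchBibtexLinks(cids, delayMs = 1500) {
  const results = [];
  for (let i = 0; i < cids.length; i++) {
    const cid = cids[i];
    const resp = await fetch(
      `https://scholar.google.com/scholar?q=info:${cid}:scholar.google.com/&output=cite`,
      { credentials: "include" }
    );
    const html = await resp.text();
    const doc = new DOMParser().parseFromString(html, "text/html");
    // Same selector as step 1a: export links live under #gs_citi.
    const link = Array.from(doc.querySelectorAll("#gs_citi a")).find(
      (a) => a.textContent.trim() === "BibTeX"
    );
    results.push({ cid, status: resp.status, bibtexLink: link ? link.href : "" });
    // Assumed pause between requests; skipped after the last cid.
    if (i < cids.length - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return results;
}
```

Inside evaluate_script this would be wrapped as `async () => fetchBibtexLinks(["TFS2GgoGiNUJ", "abc123XYZ"])`, with the function definition included in the same script body. Entries with an empty `bibtexLink` can then be skipped in the navigation step.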

Notes

  • Single paper export uses 4 tool calls: evaluate_script (cite dialog) + navigate_page (BibTeX URL) + evaluate_script (read BibTeX) + bash python (Zotero push)
  • Batch export: 2N+2 tool calls for N papers (1 evaluate_script for all cite dialogs + N navigate_page + N evaluate_script + 1 bash)
  • BibTeX links are on scholar.googleusercontent.com, where CORS blocks fetch(), so navigate_page is used instead
  • Reuses push_to_zotero.py for Zotero Connector API communication
  • Google Scholar BibTeX does NOT include abstract or DOI, so these fields will be empty in Zotero
  • After export, navigate back to the Google Scholar page: navigate_page with type back