cnki-parse-results
CNKI Parse Search Results
Extract structured paper data from the current CNKI search results page.
Prerequisites
The current Chrome page must be a CNKI search results page (the URL contains `kns.cnki.net` and the page shows "条结果").
Steps
1. Verify we are on a results page
Use `mcp__chrome-devtools__take_snapshot`. Verify the page contains "条结果". If not, inform the user that no search results page is currently open.
Check for a captcha ("拖动下方拼图完成验证"); if one is found, ask the user to solve it manually.
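The two checks above can also be expressed as one small function. This is only a sketch: `classifyCnkiPage` is a hypothetical helper name, it assumes the page URL and visible text are passed in as strings, and it tests for the captcha first on the assumption that a captcha page may not show the result count.

```javascript
// Hypothetical helper (not part of the skill): classify a page from its URL and visible text.
function classifyCnkiPage(url, pageText) {
  // Captcha prompt takes priority: the user must solve it manually.
  if (pageText.includes('拖动下方拼图完成验证')) return 'captcha';
  // A results page is on kns.cnki.net and shows the "条结果" counter.
  if (url.includes('kns.cnki.net') && pageText.includes('条结果')) return 'results';
  return 'not-results';
}
```

Inside the browser it could be called as `classifyCnkiPage(location.href, document.body.innerText)`.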
2. Extract results via JavaScript
Use `mcp__chrome-devtools__evaluate_script` with this function:
```javascript
() => {
  // One table row per paper; checkboxes carry the export ID in their value.
  const rows = document.querySelectorAll('.result-table-list tbody tr');
  const checkboxes = document.querySelectorAll('.result-table-list tbody input.cbItem');
  const results = Array.from(rows).map((row, index) => {
    const nameCell = row.querySelector('td.name');
    const titleLink = nameCell?.querySelector('a.fz14');
    const authorCell = row.querySelector('td.author');
    const sourceCell = row.querySelector('td.source');
    const dateCell = row.querySelector('td.date');
    const dataCell = row.querySelector('td.data');
    const quoteCell = row.querySelector('td.quote');
    const downloadCell = row.querySelector('td.download');
    // The "网络首发" (online first) label lives inside the title cell.
    const isOnlineFirst = !!nameCell?.querySelector('.marktip');
    return {
      number: index + 1,
      title: titleLink?.innerText?.trim() || '',
      url: titleLink?.href || '',
      exportId: checkboxes[index]?.value || '',
      authors: Array.from(authorCell?.querySelectorAll('a.KnowledgeNetLink') || []).map(a => a.innerText?.trim()),
      journal: sourceCell?.querySelector('a')?.innerText?.trim() || '',
      date: dateCell?.innerText?.trim() || '',
      database: dataCell?.innerText?.trim() || '',
      citations: quoteCell?.innerText?.trim() || '',
      downloads: downloadCell?.innerText?.trim() || '',
      isOnlineFirst: isOnlineFirst
    };
  });
  // Total count text looks like "共找到 X 条结果"; keep the first number found.
  const totalText = document.querySelector('.pagerTitleCell')?.innerText || '';
  const totalMatch = totalText.match(/([\d,]+)/);
  // Page indicator, e.g. "1/300".
  const pageInfo = document.querySelector('.countPageMark')?.innerText || '';
  return {
    papers: results,
    totalCount: totalMatch ? totalMatch[1] : 'unknown',
    pageInfo: pageInfo
  };
}
```
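The extraction function returns `citations`, `downloads`, and `totalCount` as strings, and an empty cell comes back as `''`. If numeric values are needed (for sorting, say), a post-processing step along these lines may help. `toCount` and `normalizePaper` are hypothetical names, and the sketch assumes counts are plain digits with optional thousands separators; anything else becomes `null`.

```javascript
// Hypothetical helper: parse "1,234" -> 1234; return null for empty or non-numeric text.
function toCount(text) {
  const cleaned = String(text ?? '').trim();
  if (!/^\d[\d,]*$/.test(cleaned)) return null;
  return Number(cleaned.replace(/,/g, ''));
}

// Hypothetical helper: copy a paper record with numeric citation and download counts.
function normalizePaper(paper) {
  return {
    ...paper,
    citations: toCount(paper.citations),
    downloads: toCount(paper.downloads),
  };
}
```

Returning `null` instead of `0` keeps "no value shown" distinguishable from a real zero.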
3. Present results
Format as a numbered list:
```
CNKI search results ({totalCount} total, page {pageInfo}):
1. {title} {isOnlineFirst ? "[网络首发]" : ""}
   Authors: {authors joined by "; "}
   Journal: {journal} | Date: {date} | Type: {database}
   Citations: {citations} | Downloads: {downloads}
   URL: {url}
2. ...
```
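As an illustration of the template, the formatting could be done with a function like the one below. `formatResults` is a hypothetical name; it assumes the object shape returned by the extraction function in step 2 and indents the detail lines by three spaces, which is a presentation choice rather than a requirement.

```javascript
// Hypothetical formatter for the { papers, totalCount, pageInfo } object from step 2.
function formatResults({ papers, totalCount, pageInfo }) {
  const header = `CNKI search results (${totalCount} total, page ${pageInfo}):`;
  const entries = papers.map((p) => {
    // Append the online-first label only when the flag is set.
    const flag = p.isOnlineFirst ? ' [网络首发]' : '';
    return [
      `${p.number}. ${p.title}${flag}`,
      `   Authors: ${p.authors.join('; ')}`,
      `   Journal: ${p.journal} | Date: ${p.date} | Type: ${p.database}`,
      `   Citations: ${p.citations} | Downloads: ${p.downloads}`,
      `   URL: ${p.url}`,
    ].join('\n');
  });
  return [header, ...entries].join('\n');
}
```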
4. Fallback: snapshot-based parsing
If JavaScript returns empty (DOM structure changed), use `mcp__chrome-devtools__take_snapshot` and parse the accessibility tree manually:
Look for the repeating pattern:
- `checkbox` → `StaticText` (number) → `link` with URL containing `kcms2/article/abstract` (title) → `link`s with URL containing `kcms2/author/detail` (authors) → `link` with URL containing `navi.cnki.net/knavi/detail` (journal) → `StaticText` (date) → `StaticText` (database type)
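The repeating pattern can be walked with a small state machine. The sketch below does not parse the snapshot text itself. It assumes the snapshot has already been flattened into an array of `{ role, text, url }` objects, with no separate `StaticText` entries for the text inside links. Both that input shape and the name `groupSnapshotNodes` are assumptions for illustration.

```javascript
// Hypothetical grouping of flattened snapshot nodes into paper records.
function groupSnapshotNodes(nodes) {
  const papers = [];
  let current = null;
  for (const node of nodes) {
    if (node.role === 'checkbox') {
      // Each checkbox starts a new result row.
      current = { number: '', title: '', url: '', authors: [], journal: '', date: '', database: '' };
      papers.push(current);
      continue;
    }
    if (!current) continue; // ignore nodes before the first row
    const url = node.url || '';
    if (node.role === 'link') {
      if (url.includes('kcms2/article/abstract')) {
        current.title = node.text;
        current.url = url;
      } else if (url.includes('kcms2/author/detail')) {
        current.authors.push(node.text);
      } else if (url.includes('navi.cnki.net/knavi/detail')) {
        current.journal = node.text;
      }
    } else if (node.role === 'StaticText') {
      if (!current.title) current.number = node.text; // text before the title link
      else if (current.journal && !current.date) current.date = node.text;
      else if (current.date && !current.database) current.database = node.text;
    }
  }
  return papers;
}
```

Rows whose source has no journal link will come back without a date or database type, so results from this path are worth spot-checking against the page.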
Verified DOM Selectors (CNKI uses jQuery, stable semantic class names)
| Data | Selector | Notes |
|---|---|---|
| Table | `.result-table-list tbody tr` | Each row = one paper |
| Checkbox | `input.cbItem` | value = encrypted export ID |
| Number | | Row sequence number |
| Title | `td.name a.fz14` | Paper title link |
| Authors | `td.author a.KnowledgeNetLink` | Author name links |
| Journal | `td.source a` | Journal/source link |
| Date | `td.date` | Publication date text |
| DB Type | `td.data` | Database type (期刊/学位论文) |
| Citations | `td.quote` | Citation count |
| Downloads | `td.download` | Download count |
| Online 1st | `td.name .marktip` | "网络首发" label |
| Total | `.pagerTitleCell` | "共找到 X 条结果" |
| Page | `.countPageMark` | "1/300" format |