cnki-parse-results

CNKI Parse Search Results

Extract structured paper data from the current CNKI search results page.

Prerequisites

The current Chrome page must be a CNKI search results page: its URL contains `kns.cnki.net` and the page shows "条结果" ("results").
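
As a rough sketch, the prerequisite can be checked with a small predicate. The name `isCnkiResultsPage` is hypothetical, and the function assumes you already have the page URL and its visible text (for example, from a snapshot). It compares the URL's hostname instead of searching the whole URL string, so a query parameter that merely mentions `kns.cnki.net` does not count; that stricter reading of "URL contains" is an assumption.

```javascript
// Hypothetical helper: checks both prerequisites for a CNKI results page.
function isCnkiResultsPage(pageUrl, pageText) {
  let hostname;
  try {
    hostname = new URL(pageUrl).hostname;
  } catch {
    return false; // not a valid absolute URL
  }
  // The page must be served from kns.cnki.net and show the "条结果" marker.
  return hostname === 'kns.cnki.net' && pageText.includes('条结果');
}
```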

Steps

1. Verify we are on a results page

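Use `mcp__chrome-devtools__take_snapshot`. First check for the captcha prompt "拖动下方拼图完成验证" ("drag the puzzle piece below to complete verification"). If it is present, ask the user to solve the captcha manually, then take a new snapshot. Checking for the captcha first keeps a captcha page from being misreported as a missing results page.

Then verify that the page contains "条结果". If it does not, tell the user that no search results page is currently open.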
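
The checks in step 1 can be sketched as a classifier over the snapshot text. This is a minimal illustration: `classifySnapshot` and its return labels are hypothetical, and it assumes the snapshot has been reduced to a plain string. The captcha is tested before "条结果" because a captcha page may not show the results marker and would otherwise be reported as a missing results page.

```javascript
// Marker strings from the step 1 description (constant names are hypothetical).
const CAPTCHA_PROMPT = '拖动下方拼图完成验证';
const RESULTS_MARKER = '条结果';

// Classifies snapshot text as 'captcha', 'results', or 'no-results-page'.
// The captcha is tested first so a captcha page is never reported as
// "no search results page open".
function classifySnapshot(snapshotText) {
  if (snapshotText.includes(CAPTCHA_PROMPT)) return 'captcha';
  if (snapshotText.includes(RESULTS_MARKER)) return 'results';
  return 'no-results-page';
}
```

A caller would ask the user to solve the captcha on `'captcha'`, continue to step 2 on `'results'`, and otherwise report that no results page is open.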

2. Extract results via JavaScript

Use `mcp__chrome-devtools__evaluate_script` with this function:

```javascript
() => {
  // Trimmed visible text of an element, or '' if the element is missing.
  const text = (el) => el?.innerText?.trim() || '';

  const rows = document.querySelectorAll('.result-table-list tbody tr');
  const papers = Array.from(rows).map((row, index) => {
    const nameCell = row.querySelector('td.name');
    const titleLink = nameCell?.querySelector('a.fz14');
    const authorLinks = row.querySelectorAll('td.author a.KnowledgeNetLink');

    return {
      // Prefer CNKI's own sequence number (td.seq), which may continue
      // across pages; fall back to the position on this page.
      number: Number(text(row.querySelector('td.seq'))) || index + 1,
      title: text(titleLink),
      url: titleLink?.href || '',
      // Read the checkbox from this row so export IDs cannot drift out of
      // alignment with rows. Its value is the encrypted export ID.
      exportId: row.querySelector('input.cbItem')?.value || '',
      authors: Array.from(authorLinks).map(text).filter(Boolean),
      journal: text(row.querySelector('td.source a')),
      date: text(row.querySelector('td.date')),
      database: text(row.querySelector('td.data')),
      citations: text(row.querySelector('td.quote')),
      downloads: text(row.querySelector('td.download')),
      isOnlineFirst: !!nameCell?.querySelector('.marktip')
    };
  });

  // "共找到 X 条结果": X may contain thousands separators, e.g. "1,234".
  const totalMatch = text(document.querySelector('.pagerTitleCell')).match(/([\d,]+)/);

  return {
    papers,
    totalCount: totalMatch ? totalMatch[1] : 'unknown',
    pageInfo: text(document.querySelector('.countPageMark'))
  };
}
```
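
The script returns counts and paging information as display strings, such as a `totalCount` of `"1,234"` or a `pageInfo` of `"1/300"`. If you need numbers for sorting or filtering, a post-processing step along these lines may help. The helper names are hypothetical, and mapping empty or non-numeric cells to `null` instead of `0` is a design choice that keeps "missing" distinguishable from zero.

```javascript
// Hypothetical post-processing for the object returned by the step 2 script.

// "1,234" -> 1234; "", "unknown", or other non-numeric text -> null.
function parseCount(value) {
  const digits = String(value ?? '').replace(/,/g, '').trim();
  return /^\d+$/.test(digits) ? Number(digits) : null;
}

// "1/300" -> { current: 1, total: 300 }; anything else -> null.
function parsePageInfo(pageInfo) {
  const match = /^\s*(\d+)\s*\/\s*(\d+)\s*$/.exec(pageInfo ?? '');
  return match ? { current: Number(match[1]), total: Number(match[2]) } : null;
}

// Returns a copy with numeric counts; other paper fields are left untouched.
function normalizeResults(raw) {
  return {
    totalCount: parseCount(raw.totalCount),
    page: parsePageInfo(raw.pageInfo),
    papers: raw.papers.map((paper) => ({
      ...paper,
      citations: parseCount(paper.citations),
      downloads: parseCount(paper.downloads)
    }))
  };
}
```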

3. Present results

Format as a numbered list:

```
CNKI search results ({totalCount} total, page {pageInfo}):

1. {title} {isOnlineFirst ? "[网络首发]" : ""}
   Authors: {authors joined by "; "}
   Journal: {journal} | Date: {date} | Type: {database}
   Citations: {citations} | Downloads: {downloads}
   URL: {url}

2. ...
```
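
A formatter for the step 3 template might look like the following sketch. `formatResults` is a hypothetical name; it expects the object shape returned by the step 2 script and uses each paper's `number` field for the list numbering. One small deviation from the template: the "[网络首发]" marker is preceded by a space only when it is shown, so other titles get no trailing space.

```javascript
// Hypothetical formatter that renders the step 3 template.
function formatResults({ papers, totalCount, pageInfo }) {
  const header = `CNKI search results (${totalCount} total, page ${pageInfo}):`;
  const entries = papers.map((p) =>
    [
      `${p.number}. ${p.title}${p.isOnlineFirst ? ' [网络首发]' : ''}`,
      `   Authors: ${p.authors.join('; ')}`,
      `   Journal: ${p.journal} | Date: ${p.date} | Type: ${p.database}`,
      `   Citations: ${p.citations} | Downloads: ${p.downloads}`,
      `   URL: ${p.url}`
    ].join('\n')
  );
  // A blank line separates the header and each paper, as in the template.
  return [header, ...entries].join('\n\n');
}
```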

4. Fallback: snapshot-based parsing

If the step 2 script returns an empty `papers` array (for example, because the DOM structure changed), use `mcp__chrome-devtools__take_snapshot` and parse the accessibility tree manually:

Look for this repeating pattern, one occurrence per paper:

`checkbox` → `StaticText` (number) → `link` with URL containing `kcms2/article/abstract` (title) → one or more `link` nodes with URL containing `kcms2/author/detail` (authors) → `link` with URL containing `navi.cnki.net/knavi/detail` (journal) → `StaticText` (date) → `StaticText` (database type)
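
The fallback pattern can be turned into a small state machine. This sketch assumes, hypothetically, that the snapshot has already been flattened into document-order nodes shaped like `{ role, name, url }`; the real `take_snapshot` output is text and would need its own parsing first. A checkbox starts a new record, links are classified by URL, and `StaticText` nodes are assigned by position: before the title link they are the number, and the first two after the journal link are the date and database type. Papers without a journal link will therefore get no date in this sketch.

```javascript
// Hypothetical fallback parser over pre-flattened snapshot nodes:
// { role: 'checkbox' | 'StaticText' | 'link' | ..., name: string, url?: string }
function parseSnapshotNodes(nodes) {
  const papers = [];
  let current = null;

  for (const node of nodes) {
    const url = node.url || '';
    const text = (node.name || '').trim();

    if (node.role === 'checkbox') {
      // Each checkbox starts a new paper record.
      current = { number: '', title: '', url: '', authors: [], journal: '', date: '', database: '' };
      papers.push(current);
    } else if (!current) {
      continue; // ignore nodes before the first checkbox
    } else if (node.role === 'link' && url.includes('kcms2/article/abstract')) {
      current.title = text;
      current.url = url;
    } else if (node.role === 'link' && url.includes('kcms2/author/detail')) {
      current.authors.push(text);
    } else if (node.role === 'link' && url.includes('navi.cnki.net/knavi/detail')) {
      current.journal = text;
    } else if (node.role === 'StaticText' && text) {
      if (!current.title && !current.number) {
        current.number = text; // text before the title link
      } else if (current.journal && !current.date) {
        current.date = text; // first text after the journal link
      } else if (current.date && !current.database) {
        current.database = text; // second text after the journal link
      }
    }
  }
  // Drop records that never got a title (e.g. a "select all" checkbox).
  return papers.filter((p) => p.title);
}
```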

Verified DOM Selectors (CNKI uses jQuery with stable, semantic class names)

| Data | Selector | Notes |
|---|---|---|
| Table rows | `.result-table-list tbody tr` | Each row = one paper |
| Checkbox | `input.cbItem` | `value` = encrypted export ID |
| Number | `td.seq` | Row sequence number |
| Title | `td.name a.fz14` | Paper title link |
| Authors | `td.author a.KnowledgeNetLink` | Author name links |
| Journal | `td.source a` | Journal/source link |
| Date | `td.date` | Publication date text |
| DB type | `td.data` | Database type (期刊 journal / 学位论文 thesis) |
| Citations | `td.quote` | Citation count |
| Downloads | `td.download` | Download count |
| Online First | `td.name .marktip` | "网络首发" (Online First) label |
| Total | `.pagerTitleCell` | "共找到 X 条结果" ("X results found") |
| Page | `.countPageMark` | Current/total pages, e.g. "1/300" |
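
The table can also be kept as data and used to detect DOM drift before relying on the step 2 script. A minimal sketch with hypothetical names `ROW_SELECTORS`, `PAGE_SELECTORS`, and `findMissingSelectors`; it only needs an object exposing `querySelector`, so it works against `document` in the page or against a stub in tests. The Online First selector is left out because `.marktip` is legitimately absent on most rows.

```javascript
// Row-level selectors are relative to a result row (names are hypothetical).
const ROW_SELECTORS = {
  checkbox: 'input.cbItem',
  number: 'td.seq',
  title: 'td.name a.fz14',
  authors: 'td.author a.KnowledgeNetLink',
  journal: 'td.source a',
  date: 'td.date',
  database: 'td.data',
  citations: 'td.quote',
  downloads: 'td.download'
};

// Page-level selectors are relative to the document.
const PAGE_SELECTORS = {
  rows: '.result-table-list tbody tr',
  total: '.pagerTitleCell',
  page: '.countPageMark'
};

// Returns names of page-level selectors that match nothing, plus row-level
// selectors that match nothing inside the first result row.
function findMissingSelectors(root) {
  const missing = Object.entries(PAGE_SELECTORS)
    .filter(([, selector]) => !root.querySelector(selector))
    .map(([name]) => name);

  const firstRow = root.querySelector(PAGE_SELECTORS.rows);
  if (firstRow) {
    for (const [name, selector] of Object.entries(ROW_SELECTORS)) {
      if (!firstRow.querySelector(selector)) missing.push(name);
    }
  }
  return missing;
}
```

A non-empty result is a reasonable signal to switch to the snapshot-based fallback in step 4. Because `evaluate_script` runs a single function, the constants and helper would likely need to be inlined into that function in practice.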
