cnki-parse-results

CNKI Parse Search Results

Extract structured paper data from the current CNKI search results page.

Prerequisites

The current Chrome page must be a CNKI search results page: its URL contains `kns.cnki.net`, and the page shows "条结果" (the "N results" counter).
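
The URL half of the prerequisite can be checked mechanically. The sketch below uses a hypothetical helper, `isCnkiSearchUrl`, which is not part of the skill's tooling. It assumes that comparing the parsed hostname is preferable to a plain substring test, so a URL that merely mentions `kns.cnki.net` in its query string is rejected.

```javascript
// Hypothetical helper: checks the URL half of the prerequisite.
// Assumption: compare the parsed hostname rather than doing a substring
// test, so "kns.cnki.net" appearing inside a query string is not accepted.
function isCnkiSearchUrl(pageUrl) {
  let host;
  try {
    host = new URL(pageUrl).hostname;
  } catch {
    return false; // not an absolute URL, so it cannot be a CNKI page
  }
  // Accept kns.cnki.net itself or any subdomain of it.
  return host === "kns.cnki.net" || host.endsWith(".kns.cnki.net");
}

console.log(isCnkiSearchUrl("https://kns.cnki.net/")); // true
console.log(isCnkiSearchUrl("https://example.com/?next=kns.cnki.net")); // false
```

The "条结果" half still has to be checked against the page content, as step 1 does.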

Steps

1. Verify we are on a results page

Use `mcp__chrome-devtools__take_snapshot` and verify that the page contains "条结果" ("results"). If it does not, tell the user that no search results page is currently open.
Also check for a captcha (the prompt "拖动下方拼图完成验证", "drag the puzzle piece below to complete verification"). If one is present, ask the user to solve it manually.
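
The sketch below makes step 1 mechanical, assuming the snapshot has already been reduced to plain text. The helper `classifySnapshot` and its status names are hypothetical. It checks for the captcha before the results marker, on the assumption that a captcha page may not show "条结果" and would otherwise be misreported as "no results page".

```javascript
// Marker strings taken from step 1 of this skill.
const CAPTCHA_MARKER = "拖动下方拼图完成验证"; // "drag the puzzle piece below to complete verification"
const RESULTS_MARKER = "条结果"; // "... results"

// Hypothetical helper: decides what step 1 should report, given snapshot text.
function classifySnapshot(snapshotText) {
  // Captcha first: while it is shown, nothing else on the page can be trusted.
  if (snapshotText.includes(CAPTCHA_MARKER)) {
    return { status: "captcha", message: "CNKI is showing a captcha. Please solve it manually, then retry." };
  }
  if (!snapshotText.includes(RESULTS_MARKER)) {
    return { status: "no-results-page", message: "No CNKI search results page is currently open." };
  }
  return { status: "ok", message: "" };
}

console.log(classifySnapshot("共找到 20 条结果").status); // "ok"
```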

2. Extract results via JavaScript

Use `mcp__chrome-devtools__evaluate_script` with this function:

```javascript
() => {
  const rows = document.querySelectorAll('.result-table-list tbody tr');
  const results = Array.from(rows).map((row, index) => {
    const nameCell = row.querySelector('td.name');
    const titleLink = nameCell?.querySelector('a.fz14');
    const authorCell = row.querySelector('td.author');
    const sourceCell = row.querySelector('td.source');
    const dateCell = row.querySelector('td.date');
    const dataCell = row.querySelector('td.data');
    const quoteCell = row.querySelector('td.quote');
    const downloadCell = row.querySelector('td.download');
    // Read the checkbox from the row itself so that a row without one
    // cannot shift every later exportId onto the wrong paper.
    const checkbox = row.querySelector('input.cbItem');
    // td.seq holds the row's sequence number as displayed on the page;
    // fall back to the row position if it cannot be parsed.
    const seq = parseInt(row.querySelector('td.seq')?.innerText, 10);
    // A .marktip element in the title cell is the "网络首发" (online-first) label.
    const isOnlineFirst = !!nameCell?.querySelector('.marktip');

    return {
      number: Number.isNaN(seq) ? index + 1 : seq,
      title: titleLink?.innerText?.trim() || '',
      url: titleLink?.href || '',
      exportId: checkbox?.value || '',
      authors: Array.from(authorCell?.querySelectorAll('a.KnowledgeNetLink') || [])
        .map(a => a.innerText?.trim())
        .filter(Boolean),
      journal: sourceCell?.querySelector('a')?.innerText?.trim() || '',
      date: dateCell?.innerText?.trim() || '',
      database: dataCell?.innerText?.trim() || '',
      citations: quoteCell?.innerText?.trim() || '',
      downloads: downloadCell?.innerText?.trim() || '',
      isOnlineFirst
    };
  });

  // e.g. "共找到 1,234 条结果": keep the first run of digits and commas.
  const totalText = document.querySelector('.pagerTitleCell')?.innerText || '';
  const totalMatch = totalText.match(/([\d,]+)/);
  // e.g. "1/300" (current page / total pages).
  const pageInfo = document.querySelector('.countPageMark')?.innerText || '';

  return {
    papers: results,
    totalCount: totalMatch ? totalMatch[1] : 'unknown',
    pageInfo
  };
}
```
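
The script returns counts as display strings ("1,234", "1/300", or "" when a cell is empty). If numeric values are needed, for example for sorting by citations, they can be normalized after the call. The sketch below uses hypothetical helper names. It assumes counts are plain integers, possibly with comma separators, and maps anything else (such as `'unknown'`) to `null` rather than guessing.

```javascript
// Hypothetical post-processing for the object returned by the step 2 script.
// "1,234" -> 1234; "", "unknown", or any non-integer text -> null.
function toCount(text) {
  const cleaned = String(text ?? "").replace(/[,\s]/g, "");
  return /^\d+$/.test(cleaned) ? Number(cleaned) : null;
}

// "1/300" -> { current: 1, total: 300 }; unparseable text -> null.
function parsePageInfo(text) {
  const m = /(\d+)\s*\/\s*(\d+)/.exec(String(text ?? ""));
  return m ? { current: Number(m[1]), total: Number(m[2]) } : null;
}

function normalizeResults(raw) {
  return {
    totalCount: toCount(raw.totalCount),
    page: parsePageInfo(raw.pageInfo),
    // Keep every other field as-is; only the count strings are converted.
    papers: raw.papers.map((p) => ({
      ...p,
      citations: toCount(p.citations),
      downloads: toCount(p.downloads),
    })),
  };
}

console.log(normalizeResults({ papers: [], totalCount: "1,234", pageInfo: "1/300" }).totalCount); // 1234
```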

3. Present results

Format as a numbered list:
CNKI search results ({totalCount} total, page {pageInfo}):

1. {title} {isOnlineFirst ? "[网络首发]" : ""}
   Authors: {authors joined by "; "}
   Journal: {journal} | Date: {date} | Type: {database}
   Citations: {citations} | Downloads: {downloads}
   URL: {url}

2. ...
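
The template can also be filled programmatically from the step 2 output. In this sketch, `formatResults` is a hypothetical name. One deliberate deviation: it adds the space before "[网络首发]" only when the flag is present, so titles of regular papers carry no trailing space.

```javascript
// Sketch of step 3: renders the step 2 result object using the template above.
function formatResults({ papers, totalCount, pageInfo }) {
  const lines = [`CNKI search results (${totalCount} total, page ${pageInfo}):`, ""];
  for (const p of papers) {
    const flag = p.isOnlineFirst ? " [网络首发]" : "";
    lines.push(
      `${p.number}. ${p.title}${flag}`,
      `   Authors: ${p.authors.join("; ")}`,
      `   Journal: ${p.journal} | Date: ${p.date} | Type: ${p.database}`,
      `   Citations: ${p.citations} | Downloads: ${p.downloads}`,
      `   URL: ${p.url}`,
      "" // blank line between entries
    );
  }
  // Drop the blank line left after the last entry.
  return lines.join("\n").trimEnd();
}
```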

4. Fallback: snapshot-based parsing

If the script returns an empty `papers` array (a sign that the DOM structure has changed), use `mcp__chrome-devtools__take_snapshot` and parse the accessibility tree manually. Look for the following pattern, which repeats once per paper:
  • `checkbox`, `StaticText` (number) → `link` with URL containing `kcms2/article/abstract` (title) → `link`s with URL containing `kcms2/author/detail` (authors) → `link` with URL containing `navi.cnki.net/knavi/detail` (journal) → `StaticText` (date) → `StaticText` (database type)
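
The repeating pattern can be matched mechanically once the tree is flattened. This sketch assumes a hypothetical input shape: snapshot nodes in document order as `{ role, name, url }` objects. `take_snapshot` returns indented text, so an adapter that builds this list is assumed and not shown. Meaning is assigned by position, as in the pattern: a number before the title link is the row number, and the first two texts after the journal link are the date and the database type.

```javascript
// Hypothetical input: [{ role: "checkbox" | "StaticText" | "link", name, url? }, ...]
// in document order. Producing this list from take_snapshot output is assumed.
function parseSnapshotNodes(nodes) {
  const papers = [];
  let current = null;

  for (const node of nodes) {
    const role = node.role;
    const name = (node.name || "").trim();
    const url = node.url || "";

    if (role === "checkbox") {
      // Each result row starts with its selection checkbox.
      current = { number: null, title: "", url: "", authors: [], journal: "", date: "", database: "" };
      papers.push(current);
      continue;
    }
    if (!current) continue; // skip anything before the first row

    if (role === "link" && url.includes("kcms2/article/abstract")) {
      current.title = name;
      current.url = url;
    } else if (role === "link" && url.includes("kcms2/author/detail")) {
      current.authors.push(name);
    } else if (role === "link" && url.includes("navi.cnki.net/knavi/detail")) {
      current.journal = name;
    } else if (role === "StaticText" && name !== "") {
      if (!current.title && current.number === null && /^\d+$/.test(name)) {
        current.number = Number(name); // number comes before the title link
      } else if (current.journal && !current.date) {
        current.date = name; // first text after the journal link
      } else if (current.journal && current.date && !current.database) {
        current.database = name; // second text after the journal link
      }
    }
  }
  return papers;
}
```

Text nodes that fit none of these positions are ignored rather than guessed at.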

Verified DOM Selectors (CNKI uses jQuery with stable, semantic class names)

| Data | Selector | Notes |
|---|---|---|
| Table | `.result-table-list tbody tr` | Each row = one paper |
| Checkbox | `input.cbItem` | `value` = encrypted export ID |
| Number | `td.seq` | Row sequence number |
| Title | `td.name a.fz14` | Paper title link |
| Authors | `td.author a.KnowledgeNetLink` | Author name links |
| Journal | `td.source a` | Journal/source link |
| Date | `td.date` | Publication date text |
| DB type | `td.data` | Database type (期刊 = journal, 学位论文 = thesis/dissertation) |
| Citations | `td.quote` | Citation count |
| Downloads | `td.download` | Download count |
| Online first | `td.name .marktip` | "网络首发" (online-first) label |
| Total | `.pagerTitleCell` | "共找到 X 条结果" ("X results found") |
| Page | `.countPageMark` | "1/300" format (current page / total pages) |
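
Because the fallback in step 4 is triggered by DOM changes, it can help to check which of these selectors no longer match. The sketch below is a hypothetical helper. The object keys are illustrative names, and `root` is anything with a `querySelector` method, such as `document`. Since `evaluate_script` runs its function inside the page, the helper body would have to be inlined into that function. `td.name .marktip` is left out because it appears only on online-first rows.

```javascript
// The selector table above as a config object (key names are illustrative).
const CNKI_SELECTORS = {
  row: ".result-table-list tbody tr",
  checkbox: "input.cbItem",
  number: "td.seq",
  title: "td.name a.fz14",
  authors: "td.author a.KnowledgeNetLink",
  journal: "td.source a",
  date: "td.date",
  dbType: "td.data",
  citations: "td.quote",
  downloads: "td.download",
  total: ".pagerTitleCell",
  page: ".countPageMark",
};

// Returns the keys whose selector matches nothing under `root`.
function findMissingSelectors(root, selectors = CNKI_SELECTORS) {
  return Object.entries(selectors)
    .filter(([, selector]) => root.querySelector(selector) === null)
    .map(([key]) => key);
}
```

An empty array suggests the table is intact. A non-empty one names the columns whose selectors need re-verifying.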