blockchain-spider-toolkit

BlockchainSpider — on-chain data collection toolkit

Reference skill. This repository does not vendor BlockchainSpider; read the upstream README and docs for install, spiders, and options.
  • Repository: github.com/wuzhy1ng/BlockchainSpider (MIT license)
  • Stack: Python and Scrapy spiders; by default, output is written as CSV/JSON files to the `./data` directory (the path can be adjusted in the project configuration).
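Because the spiders write CSV files to `./data` by default, a quick row count after a crawl helps confirm that something was actually collected. The sketch below assumes only that the output directory contains `.csv` files with a header row; the helper names are hypothetical, and the real file names and columns depend on the spider and release.

```python
import csv
from pathlib import Path


def iter_csv_rows(data_dir="./data"):
    """Yield (file name, row dict) for every CSV file found directly under data_dir."""
    for path in sorted(Path(data_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                yield path.name, row


def count_rows(data_dir="./data"):
    """Return {file name: row count}; files with only a header are omitted."""
    counts = {}
    for name, _row in iter_csv_rows(data_dir):
        counts[name] = counts.get(name, 0) + 1
    return counts


if __name__ == "__main__":
    for name, total in count_rows().items():
        print(f"{name}: {total} rows")
```

If the project configuration points the output elsewhere, pass that path as `data_dir`.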

What it is for

Typical capabilities described in the project (confirm against current docs):

| Area | Examples |
| --- | --- |
| Transfer subgraph | Money-flow graph centered on a source address or transaction (e.g. `txs.blockscan`-style crawls). |
| EVM transactions | Block ranges, latest listener, receipts, logs, token transfers; multiple EVM-compatible providers. |
| Solana | Slot ranges or live streams via JSON-RPC providers. |
| Labels | Optional plugins (e.g. label-oriented crawls); scope and ethics depend on source and law. |

Academic background appears in project references (e.g. TRacer / transaction semantics papers; see the repo Reference section).
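To make the "transfer subgraph" idea concrete, here is a minimal sketch of a money-flow graph centered on a source address: a plain breadth-first walk over outgoing transfers with a hop limit. This is an illustration only and not BlockchainSpider's own traversal strategy; the function name and the `(sender, receiver, amount)` tuple shape are assumptions.

```python
from collections import defaultdict, deque


def transfer_subgraph(transfers, source, max_depth=2):
    """Return the transfers reachable from source within max_depth outgoing hops.

    transfers: iterable of (sender, receiver, amount) tuples (hypothetical shape).
    """
    outgoing = defaultdict(list)
    for sender, receiver, amount in transfers:
        outgoing[sender].append((receiver, amount))

    seen = {source}
    queue = deque([(source, 0)])
    edges = []
    while queue:
        address, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand addresses that sit at the hop limit
        for receiver, amount in outgoing[address]:
            edges.append((address, receiver, amount))
            if receiver not in seen:
                seen.add(receiver)
                queue.append((receiver, depth + 1))
    return edges
```

Each address is expanded at most once, so cycles in the transfer data do not cause repeated edges or endless loops.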

Prerequisites

  • Python environment and `pip install -r requirements.txt` from the cloned repo.
  • RPC / indexer API endpoints you are authorized to use (respect ToS, rate limits, and billing).
  • API keys for third-party explorers (Etherscan-class APIs, etc.) must be supplied by you—never commit keys or paste live keys into chats.
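One way to keep keys out of commits and chat logs is to read them from the environment at run time and mask them in any log output. The sketch below assumes an environment variable named `ETHERSCAN_API_KEY`; that name and both helper functions are hypothetical and not part of BlockchainSpider.

```python
import os


def load_api_key(env_var="ETHERSCAN_API_KEY", environ=None):
    """Return the API key stored in env_var, or raise if it is missing or blank."""
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} in your shell or secret manager before crawling.")
    return key


def mask(secret, visible=4):
    """Return a log-safe form of secret that shows only the last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Failing early when the variable is unset avoids starting a crawl that would only produce authentication errors.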

Example command shapes (placeholders only)

Upstream examples use `scrapy crawl <spider> -a ...`. Illustrative patterns (replace placeholders):

EVM transfer / subgraph style (example spider name from upstream docs)

scrapy crawl txs.blockscan -a source=<ADDRESS> -a apikeys=<YOUR_ETHERSCAN_API_KEY> -a endpoint=<ETHERSCAN_COMPATIBLE_API_URL>

EVM blocks / transactions over a range

scrapy crawl trans.block.evm -a start_blk=<N> -a end_blk=<M> -a providers=<YOUR_ETH_HTTP_RPC_URL>

Solana slot range

scrapy crawl trans.block.solana -a start_slot=<S1> -a end_slot=<S2> -a providers=<YOUR_SOLANA_JSON_RPC_URL>

Exact **spider names** and **arguments** change with releases—always copy from the **current** README.
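When these commands are launched from a script, building the argument list programmatically avoids shell-quoting mistakes. The following is a minimal sketch: `build_crawl_command` is a hypothetical helper, the spider name and `-a` arguments are copied from the patterns above and may differ in the current README, and the block numbers and URL are placeholders.

```python
import shlex


def build_crawl_command(spider, **spider_args):
    """Return the argv list for `scrapy crawl <spider> -a key=value ...`."""
    argv = ["scrapy", "crawl", spider]
    for key, value in spider_args.items():
        argv += ["-a", f"{key}={value}"]
    return argv


command = build_crawl_command(
    "trans.block.evm",
    start_blk=100,  # placeholder block numbers
    end_blk=200,
    providers="https://rpc.example.invalid",  # placeholder URL, not a real endpoint
)
# Print the command for review instead of running it.
print(shlex.join(command))
```

Once the printed command has been checked against the current README, the same list can be passed to `subprocess.run(command, check=True)` from inside the cloned repository.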

How to combine with blockint

| Task | Skill |
| --- | --- |
| High-level analytics / AML context | blockchain-analytics-operations |
| Solana forensic tracing methodology | solana-tracing-specialist |
| Multi-chain clustering | cross-chain-clustering-techniques-agent |
| Web surface crawling (HTTP), not chain RPC | katana-web-crawling |

Guardrails

  • Lawful use only — comply with sanctions, privacy, and computer misuse rules in your jurisdiction; do not use spiders to harass or dox.
  • Darknet / sensitive label sources — some demo commands in upstream docs point to Tor or sensitive data sources; obtain legal and security approval before running.
  • Do not store or share API keys, customer identifiers, or non-public investigation exports in public repos.
  • Outputs are raw or heuristic—validate critical facts against primary chain data.
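As one possible shape for the validation step in the last guardrail, the sketch below compares a crawled EVM transaction row with what a node returns for the standard `eth_getTransactionByHash` JSON-RPC method. The column names (`address_from`, `address_to`, `value`, `block_number`) are hypothetical and should be replaced with the columns your spider actually emits; sending the request to an authorized endpoint is left out on purpose.

```python
import json


def tx_lookup_payload(tx_hash, request_id=1):
    """Build the JSON-RPC request body for eth_getTransactionByHash."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "eth_getTransactionByHash",
        "params": [tx_hash],
        "id": request_id,
    })


def mismatches(crawled_row, rpc_result):
    """Return the names of fields where a crawled row disagrees with the node's answer.

    crawled_row uses hypothetical column names; rpc_result is the "result"
    object of the JSON-RPC response for a mined transaction, where
    quantities are hex strings and "to" is null for contract creation.
    """
    checks = {
        "from": (crawled_row["address_from"].lower(), rpc_result["from"].lower()),
        "to": (crawled_row["address_to"].lower(), (rpc_result["to"] or "").lower()),
        "value": (int(crawled_row["value"]), int(rpc_result["value"], 16)),
        "blockNumber": (int(crawled_row["block_number"]), int(rpc_result["blockNumber"], 16)),
    }
    return [name for name, (ours, theirs) in checks.items() if ours != theirs]
```

An empty list means the row agrees with the node on these four fields; any returned name marks a field to re-check before it is relied on in a report.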

Related research codebase

  • mots-transaction-semantics: MoTS (WWW 2023 “Know Your Transactions”); upstream notes MoTS merged into BlockchainSpider. Use the MoTS skill for legacy spider names (`blocks.eth`, `blocks.semantic.eth`, `labels.action`) and the bundled PDF.

Goal: a stable pointer and safe usage framing for BlockchainSpider inside blockint workflows.