blockchain-spider-toolkit
BlockchainSpider — on-chain data collection toolkit
Reference skill. This repository does not vendor BlockchainSpider; read the upstream README and docs for install, spiders, and options.
- Repository: github.com/wuzhy1ng/BlockchainSpider (MIT license)
- Stack: Python, Scrapy spiders, CSV/JSON outputs under `./data` by default (paths per project config).
What it is for
Typical capabilities described in the project (confirm against current docs):
| Area | Examples |
|---|---|
| Transfer subgraph | Money-flow graph centered on a source address or transaction. |
| EVM transactions | Block ranges, latest listener, receipts, logs, token transfers; multiple EVM-compatible providers. |
| Solana | Slot ranges or live streams via JSON-RPC providers. |
| Labels | Optional plugins (e.g. label-oriented crawls)—scope and ethics depend on source and law. |
Academic background appears in project references (e.g. TRacer / transaction semantics papers—see repo Reference section).
Prerequisites
- Python environment and `pip install -r requirements.txt` from the cloned repo.
- RPC / indexer API endpoints you are authorized to use (respect ToS, rate limits, and billing).
- API keys for third-party explorers (Etherscan-class APIs, etc.) must be supplied by you—never commit keys or paste live keys into chats.
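To keep keys out of command lines, commits, and chat transcripts, one option is to read them from the environment at run time. `require_key` below is a hypothetical helper sketch, not part of BlockchainSpider:

```python
import os

def require_key(name: str) -> str:
    """Read an API key from the environment, failing fast if it is unset.

    Hypothetical helper: keeps keys out of repos and pasted commands.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set {name} in your environment before crawling")
    return value
```

You would then pass `require_key("ETHERSCAN_API_KEY")` into whatever wrapper launches the spider, instead of hardcoding the key (the variable name is an example, not an upstream convention).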
Example command shapes (placeholders only)
Upstream examples use `scrapy crawl <spider> -a ...`. Illustrative patterns (replace placeholders):
EVM transfer / subgraph style (example spider name from upstream docs)
```bash
scrapy crawl txs.blockscan -a source=<ADDRESS> -a apikeys=<YOUR_ETHERSCAN_API_KEY> -a endpoint=<ETHERSCAN_COMPATIBLE_API_URL>
```
EVM blocks / transactions over a range
```bash
scrapy crawl trans.block.evm -a start_blk=<N> -a end_blk=<M> -a providers=<YOUR_ETH_HTTP_RPC_URL>
```
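Large block ranges are often easier on rate-limited providers when split into several smaller crawls. A minimal sketch (a generic helper, not upstream API) that yields inclusive `(start_blk, end_blk)` pairs, one per invocation:

```python
def block_chunks(start_blk: int, end_blk: int, size: int = 10_000):
    """Yield inclusive (start, end) sub-ranges covering start_blk..end_blk.

    Generic batching helper for rate-limited RPC providers; the default
    chunk size is an assumption you should tune to your provider's limits.
    """
    if start_blk > end_blk or size < 1:
        raise ValueError("invalid range or chunk size")
    for lo in range(start_blk, end_blk + 1, size):
        yield lo, min(lo + size - 1, end_blk)
```

Each pair would then become one `-a start_blk=<N> -a end_blk=<M>` invocation of the spider above.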
Solana slot range
```bash
scrapy crawl trans.block.solana -a start_slot=<S1> -a end_slot=<S2> -a providers=<YOUR_SOLANA_JSON_RPC_URL>
```
Exact **spider names** and **arguments** change with releases; always copy from the **current** README.
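Since outputs land as CSV/JSON under `./data` by default, a quick sanity check after a crawl can be as simple as loading the rows. This sketch assumes CSV files; column names vary by spider and release, so inspect the files before relying on specific fields:

```python
import csv
from pathlib import Path

def load_csv_rows(data_dir: str = "./data") -> list[dict]:
    """Load every CSV under data_dir into a list of dict rows.

    Sketch only: output layout depends on the spider and release.
    """
    rows: list[dict] = []
    for path in sorted(Path(data_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            rows.extend(csv.DictReader(f))
    return rows
```

From there you can count rows, spot-check addresses, or hand the data to the analytics skills listed below.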
How to combine with blockint
| Task | Skill |
|---|---|
| High-level analytics / AML context | blockchain-analytics-operations |
| Solana forensic tracing methodology | solana-tracing-specialist |
| Multi-chain clustering | cross-chain-clustering-techniques-agent |
| Web surface crawling (HTTP), not chain RPC | katana-web-crawling |
Guardrails
- Lawful use only — comply with sanctions, privacy, and computer misuse rules in your jurisdiction; do not use spiders to harass or dox.
- Darknet / sensitive label sources — some demo commands in upstream docs point to Tor or sensitive data sources; obtain legal and security approval before running.
- Do not store or share API keys, customer identifiers, or non-public investigation exports in public repos.
- Outputs are raw or heuristic—validate critical facts against primary chain data.
Related research codebase
- mots-transaction-semantics — MoTS (WWW 2023 “Know Your Transactions”); upstream notes MoTS merged into BlockchainSpider — use the MoTS skill for legacy spider names (`labels.action`, `blocks.eth`, `blocks.semantic.eth`) and the bundled PDF.
Goal: a stable pointer and safe usage framing for BlockchainSpider inside blockint workflows.