sec-edgar-pipeline
SEC EDGAR Pipeline
Overview
This pipeline is centered on `edgar-analyzer` and the EDGAR data sources. The core loop is: configure credentials, create a project with examples, analyze patterns, generate code, run extraction, and export reports.
Setup (Keys + User Agent)
Use the setup wizard to configure required keys:
```bash
python -m edgar_analyzer setup
# or
edgar-analyzer setup
```
Required entries:
- `OPENROUTER_API_KEY`
- (Optional) `JINA_API_KEY`
- `EDGAR` user agent string ("Name email@example.com")
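Before running the pipeline, it can help to confirm that the entries above are present. The sketch below assumes the values end up as environment variables; `EDGAR_USER_AGENT` is a hypothetical variable name (the source only says an `EDGAR` user agent string is required), and the "Name email" pattern is a loose heuristic rather than a rule enforced by `edgar-analyzer`.

```python
import os
import re

# Assumption: the configured values are exposed as environment variables.
REQUIRED_KEYS = ["OPENROUTER_API_KEY"]  # JINA_API_KEY is optional, so it is not checked
USER_AGENT_VAR = "EDGAR_USER_AGENT"  # hypothetical variable name

# Loose "Name email@example.com" check: some text, whitespace, then an email-like token.
_USER_AGENT_RE = re.compile(r"^\S.*\s\S+@\S+\.\S+$")


def check_setup(env=None):
    """Return a list of readable problems; an empty list means the setup looks complete."""
    env = os.environ if env is None else env
    problems = [f"missing required key: {name}" for name in REQUIRED_KEYS if not env.get(name)]
    if not _USER_AGENT_RE.match(env.get(USER_AGENT_VAR, "")):
        problems.append(f"{USER_AGENT_VAR} should look like 'Name email@example.com'")
    return problems
```

Calling `check_setup()` with no arguments inspects the current process environment; passing a plain dict makes the check easy to test.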
End-to-End CLI Workflow
```bash
# 1. Create project
edgar-analyzer project create my_project --template minimal

# 2. Add examples + project.yaml
# projects/my_project/examples/*.json

# 3. Analyze examples
edgar-analyzer analyze-project projects/my_project

# 4. Generate extraction code
edgar-analyzer generate-code projects/my_project

# 5. Run extraction
edgar-analyzer run-extraction projects/my_project --output-format csv
```
Outputs land in `projects/<name>/output/`.
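Once the examples and `project.yaml` are in place, steps 3 to 5 can be chained from Python. This is a minimal sketch that only wraps the commands shown above; the helper names are hypothetical, and it assumes the CLI exits with a non-zero status on failure. Only `csv` appears in the source as an output format, so other values are untested here.

```python
import subprocess


def build_commands(project_name, output_format="csv"):
    """Return the argv lists for steps 3-5, to be run after examples are added."""
    project_dir = f"projects/{project_name}"
    return [
        ["edgar-analyzer", "analyze-project", project_dir],
        ["edgar-analyzer", "generate-code", project_dir],
        ["edgar-analyzer", "run-extraction", project_dir, "--output-format", output_format],
    ]


def run_workflow(project_name, output_format="csv"):
    """Run each step in order; check=True raises on the first non-zero exit code."""
    for argv in build_commands(project_name, output_format):
        subprocess.run(argv, check=True)
```

Step 1 (`project create`) is left out on purpose, since the examples have to be added by hand between project creation and analysis.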
EDGAR-Specific Conventions
- CIK values are 10-digit, zero-padded (e.g., `0000320193`).
- Rate limit: SEC API allows 10 requests/sec. Scripts use ~0.11s delays.
- User agent is mandatory; include name + email.
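The first two conventions are easy to get wrong in ad hoc scripts. The sketch below shows one way to zero-pad a CIK and to space requests at least 0.11 s apart (about 9 requests per second, just under the limit). `format_cik` and `Throttle` are hypothetical helpers, not names from the pipeline.

```python
import time


def format_cik(cik):
    """Zero-pad a CIK to the 10-digit form, e.g. 320193 -> '0000320193'."""
    digits = str(cik).strip()
    if not digits.isdigit() or len(digits) > 10:
        raise ValueError(f"not a valid CIK: {cik!r}")
    return digits.zfill(10)


class Throttle:
    """Block so that consecutive wait() calls are at least min_interval seconds apart."""

    def __init__(self, min_interval=0.11, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call `throttle.wait()` immediately before each request. The `clock` and `sleep` parameters exist only so the timing logic can be tested without real delays.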
Scripted Example (Apple DEF 14A)
`edgar/scripts/fetch_apple_def14a.py`:
- Fetch latest DEF 14A metadata
- Download HTML
- Parse Summary Compensation Table (SCT)
- Save raw HTML + extracted JSON + ground truth
Recipe-Driven Extraction
`edgar/recipes/sct_extraction/config.yaml`:
- Fetch DEF 14A filings by company list
- Extract SCT tables with `SCTAdapter`
- Validate with `sct_validator`
- Write results to `output/sct`
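To give a sense of what the validation step might check, here is a hypothetical row-level validator. It is not the actual `sct_validator`: the field names are assumptions, and the main check relies on the Summary Compensation Table's Total column being the sum of the component columns, within a small rounding tolerance.

```python
# Hypothetical field names for one extracted SCT row.
COMPONENTS = (
    "salary", "bonus", "stock_awards", "option_awards",
    "non_equity_incentive", "pension_change", "other_compensation",
)


def validate_sct_row(row, tolerance=1.0):
    """Return a list of problems for one SCT row; an empty list means it passed."""
    problems = []
    if not row.get("name"):
        problems.append("missing executive name")
    if not isinstance(row.get("year"), int):
        problems.append("missing or non-integer year")
    # Missing or None components are treated as zero.
    values = [row.get(key) or 0 for key in COMPONENTS]
    if any(value < 0 for value in values):
        problems.append("negative compensation component")
    total = row.get("total")
    if total is None:
        problems.append("missing total")
    elif abs(sum(values) - total) > tolerance:
        problems.append(f"components sum to {sum(values)}, but total is {total}")
    return problems
```

Rows that fail the sum check are often a sign that a column was shifted during table parsing, which makes this a cheap first filter before comparing against ground truth.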
Report Generation
`edgar/scripts/create_csv_reports.py` produces:
- `executive_compensation_<timestamp>.csv`
- `top_25_executives_<timestamp>.csv`
- `company_summary_<timestamp>.csv`
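As an illustration of how a timestamped report like `top_25_executives_<timestamp>.csv` could be written, the sketch below ranks rows by total compensation and writes the top entries. The helper names, the column set, and the `%Y%m%d_%H%M%S` timestamp format are assumptions, not details taken from `create_csv_reports.py`.

```python
import csv
from datetime import datetime
from pathlib import Path

FIELDS = ["company", "name", "year", "total"]  # assumed column set


def report_filename(prefix, now=None):
    """Build '<prefix>_<timestamp>.csv' using an assumed timestamp format."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"{prefix}_{stamp}.csv"


def top_by_total(rows, limit=25):
    """Return the rows with the highest 'total', largest first."""
    return sorted(rows, key=lambda row: row["total"], reverse=True)[:limit]


def write_top_executives(rows, output_dir, limit=25):
    """Write the top rows to a timestamped CSV in output_dir and return its path."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / report_filename(f"top_{limit}_executives")
    with path.open("w", newline="", encoding="utf-8") as handle:
        # extrasaction="ignore" drops any keys that are not in FIELDS.
        writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(top_by_total(rows, limit))
    return path
```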
Troubleshooting
- No filings found: confirm CIK formatting and filing type (DEF 14A vs DEF 14A/A).
- API errors: slow down requests and confirm user-agent is set.
- Extraction errors: regenerate code or use manual ground truth in POC scripts.
Related Skills
- `universal/data/reporting-pipelines`
- `toolchains/python/testing/pytest`