bigdata-skill

Bigdata.com SDK + REST Toolkit

Get the structured substrate the Bigdata.com MCP server doesn't hand over. The MCP returns clean prose and pre-synthesized tearsheets, but its search tool gives chunks with no per-chunk sentiment or entity spans, and its tearsheets give aggregate values, not the fiscal-period time series, universe screener, or per-field JSON you'd build a pipeline on. The official `bigdata-client` SDK, plus a thin REST passthrough over the same backend and the same JWT, reaches the official `/v1/*` endpoints that hold that data. This skill bundles a toolkit that does exactly that, already debugged and already cost-guarded, so you don't re-pay the discovery cost.

The core problem this solves (read this first)

The Bigdata MCP server answers "what's the sentiment around NVIDIA?" with a readable paragraph or a pre-synthesized tearsheet — genuinely useful for a chat turn. But the moment you need the machine-readable substrate to build a pipeline on, the MCP doesn't hand it over:

its search tool returns chunks with text + relevance only — no per-chunk sentiment number, no entity character spans;
its tearsheets give aggregate values (a single sentiment score, a summary of estimates) — not a fiscal-period time series you can compute on, a universe screener, or per-field JSON.

The fix is a general pattern, not a Bigdata trick:

When an MCP data source returns only synthesized output but you need the structured fields underneath, drop to the vendor SDK or REST. MCP optimizes for a chat turn, not a pipeline.

Crucially, for Bigdata these structured fields are official, publicly documented REST endpoints (`docs.bigdata.com/api-reference/...`), not a hidden backend. Bigdata is also sunsetting the SDK (EOL 2026-12-31) in favour of this REST API, so the REST layer here is the forward-compatible path, not a hack. The SDK (`bigdata_client.Bigdata`) covers search + knowledge-graph; `bd._api.http` reaches every `/v1/*` endpoint the SDK never wrapped. The bundled `bigdata_toolkit` packages both behind one `BigdataClient` object.

When to use this skill

Trigger on any of these, in any language:

The user is using Bigdata.com / RavenPack and the MCP result feels thin — "where's the sentiment score?", "I need entity-level data", "the calendar".
They want forward / structured financials for a ticker: analyst estimates, earnings or event calendar, earnings surprise, analyst ratings, price targets, a company screener / universe.
They want annotated news chunks with numeric sentiment + entity spans, or a sentiment time series / co-mention graph.
They mention a `bd_v2_` API key, `rp_entity_id`, `query_unit` / chunk cost, `bigdata-client`, or "the bigdata MCP isn't enough".
They're building an investment-research dataset and need a reusable, cost-aware data-pull layer rather than one-off MCP calls.

Setup (one time)

1 — API key (never hardcode it). The client fail-fasts if it's missing:

```bash
export BIGDATA_API_KEY=bd_v2_xxxxxxxx
```
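The fail-fast behaviour mentioned in step 1 can be illustrated with a minimal sketch. This is not the toolkit's actual code: the helper name `load_api_key` and its error message are hypothetical, and the bundled `BigdataClient` may implement its check differently.

```python
import os


def load_api_key(env=None, var="BIGDATA_API_KEY"):
    """Return the API key from the environment, or raise if it is missing.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    key = (env.get(var) or "").strip()
    if not key:
        # Fail fast: no plaintext fallback and no silent default.
        raise RuntimeError(f"{var} is not set; export it before creating a client")
    return key
```

Raising at construction time surfaces a missing key immediately, instead of as an authentication error on the first network call.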

2 — An isolated Python env with the official SDK. The bundled toolkit imports `bigdata_client`; install it once:

```bash
uv venv .venv --python 3.12
uv pip install --python .venv/bin/python bigdata-client
```

Behind a slow or blocked PyPI (e.g. mainland China), add a mirror to the install command, and unset any outbound proxy for the install step so uv reaches the index directly: `--index-url https://pypi.tuna.tsinghua.edu.cn/simple`


3 — Outbound proxy (only if your network needs one to reach `api.bigdata.com`). There are two equivalent options, and the official SDK accepts both: an env var, or `BigdataClient(proxy=...)` in code. The env var is simplest:

```bash
export HTTPS_PROXY=http://<host>:<port>     # plus WSS_PROXY for chat/WebSocket
```

If a proxy does TLS interception (self-signed CA) and you hit SSL handshake errors, the official fix is `BigdataClient(verify_ssl="<proxy-CA>.pem")`, not blind retries.

4 — Make the bundled package importable by putting this skill's `scripts/` directory on `PYTHONPATH` (or call `sys.path.insert(0, "<this-skill>/scripts")` in code).

Smoke-test the whole path (entity resolve + quota are free; `--with-search` adds one ~1 query_unit chunk search):

```bash
BIGDATA_API_KEY=bd_v2_xxx PYTHONPATH=scripts .venv/bin/python scripts/probe_example.py
```

Quickstart

```python
import sys
sys.path.insert(0, "<this-skill>/scripts")          # so `import bigdata_toolkit` resolves
from bigdata_toolkit import (
    BigdataClient, EntityResolver, AnnotatedSearcher,
    StructuredDataREST, CostTracker, CostModel, rc,   # rc = SSL-retry wrapper
)

c  = BigdataClient()                                  # SDK + REST escape hatch, one object
er = EntityResolver(c)
nvda = rc(lambda: er.resolve_id("NVIDIA", country="US"))   # -> 'E09E2B'  (rp_entity_id is the gateway key)

# --- Structured financials the MCP does NOT expose (REST escape hatch) ---
rest = StructuredDataREST(c)
est  = rc(lambda: rest.analyst_estimates(nvda, period="quarter", limit=5))   # forward consensus
surp = rc(lambda: rest.latest_surprise(nvda))                                # last EPS/revenue surprise
cal  = rc(lambda: rest.events_calendar(nvda, categories=["earnings-call"],
                                       start_date="2026-06-01", end_date="2026-12-31"))

# --- Annotated chunks the MCP STRIPS: sentiment + entity spans (cost-guarded) ---
s = AnnotatedSearcher(c)
docs = rc(lambda: s.search_entity(nvda, keyword="data center", chunk_limit=10))
# each chunk dict: {"sentiment": float, "entities": [{"key": rp_id, "start", "end"}], "text", ...}

# --- Always know your spend (chunk-billed; see Cost discipline) ---
ct = CostTracker(c); ct.snapshot()
# ... run a batch ...
print(ct.delta())   # {'delta_chunks':..., 'delta_query_units':..., 'usd_fast':...}
```

Wrap **every** network call in `rc(lambda: ...)`: a first-handshake `SSL: UNEXPECTED_EOF` is common and the SDK's internal retry doesn't cover it.
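To make the retry-wrapper idea concrete, here is a minimal sketch of a wrapper in the spirit of `rc()`. It is an assumption about the general technique, not the toolkit's implementation: the name `retry_call`, its parameters, and the choice of retried exception types are all hypothetical.

```python
import ssl
import time


def retry_call(fn, attempts=3, base_delay=0.5,
               retry_on=(ssl.SSLError, ConnectionError)):
    """Call fn(); on a transient error, wait and retry up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff


# Simulated flaky call: fails once with an SSL EOF, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] == 1:
        raise ssl.SSLEOFError("simulated first-handshake EOF")
    return "ok"


result = retry_call(flaky, base_delay=0.0)
```

Passing a zero-argument callable, as in `rc(lambda: ...)`, is what lets the wrapper re-run the whole call, including the handshake, rather than retrying a response object that already failed.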

Routing — which capability answers the question

| The user wants… | Use | Module |
| --- | --- | --- |
| Company name / ISIN / CUSIP / SEDOL → `rp_entity_id` | `EntityResolver.resolve_id` / `.resolve_by_isin` | `kg.py` (SDK) |
| Forward analyst consensus (revenue/EPS by fiscal period) | `StructuredDataREST.analyst_estimates` | `rest_ext.py` |
| Latest earnings surprise (actual vs estimate) | `.latest_surprise` | `rest_ext.py` |
| Upcoming earnings / event calendar (one name or whole market) | `.events_calendar` | `rest_ext.py` |
| Analyst ratings / price-target consensus | `.analyst_ratings` / `.price_target` | `rest_ext.py` |
| Full financial statements (income / balance / cash-flow, multi-year) | `.income_statement` / `.balance_sheet` / `.cash_flow_statement` | `rest_ext.py` |
| TTM valuation metrics & ratios (EV/EBITDA, ROE, P/E, margins) | `.key_metrics_ttm` / `.company_ratios_ttm` | `rest_ext.py` |
| Company profile (CEO, sector, employees, IPO date) | `.company_profile` | `rest_ext.py` |
| Daily OHLC prices / dividend history | `.daily_prices` / `.dividends` | `rest_ext.py` |
| Revenue by geography / product segment | `.revenue_geographic_segments` / `.revenue_product_segments` | `rest_ext.py` |
| Daily entity-sentiment time series (don't self-aggregate from chunks!) | `.entity_sentiment` | `rest_ext.py` |
| Co-mention graph (supply-chain / competitor / customer; ⚠️ chunk-billed) | `.connected_entities` | `rest_ext.py` |
| Build a universe by market-cap / sector / country | `.company_screener` | `rest_ext.py` |
| News/filing/transcript chunks with sentiment + entity spans | `AnnotatedSearcher.search_entity` | `search.py` (SDK) |
| Bulk-pull many searches 50% cheaper (portfolio backfill) | `BatchSearch` (create→upload→poll→download) | `rest_ext.py` |
| Track / forecast quota spend before a backfill | `CostTracker` / `CostModel` | `cost.py` |
| Hit an endpoint the toolkit hasn't wrapped yet | `client.http.post("v1/<resource>/query", body)` | `client.py` |

The income / balance / cash-flow / daily-prices / dividends / revenue-segments endpoints return `{fields, values}`; wrap them in `fields_values_to_records()` to get `[{field: value}]`. The `*_ttm` / `company_profile` endpoints are already flat. All structured endpoints above are free (0 chunks) except `connected_entities` and `AnnotatedSearcher` (chunk-billed).
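The `{fields, values}` to `[{field: value}]` conversion can be sketched as follows. This assumes `values` is a list of rows, each aligned position by position with `fields`; the source does not spell out the wire shape, so treat that as an assumption. The function name `to_records` and the sample field names are made up for illustration and are not the toolkit's `fields_values_to_records()`.

```python
def to_records(payload):
    """Turn {"fields": [...], "values": [[...], ...]} into a list of dicts.

    Assumes each row in "values" has one entry per name in "fields".
    """
    fields = payload["fields"]
    records = []
    for row in payload["values"]:
        if len(row) != len(fields):
            raise ValueError(f"row has {len(row)} values, expected {len(fields)}")
        records.append(dict(zip(fields, row)))
    return records


# Made-up sample payload, only to show the shape change.
sample = {
    "fields": ["date", "close"],
    "values": [["2026-01-02", 101.5], ["2026-01-05", 103.0]],
}
records = to_records(sample)
```

The length check turns a misaligned row into an explicit error rather than a silently truncated record, which `zip` alone would produce.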

The two data faces (do NOT say "Bigdata fails for Chinese / A-shares")

This split is the most important non-obvious conclusion — state it precisely:

| Face | Path | A-share / Chinese verdict |
| --- | --- | --- |
| Structured financial (estimates, calendar, surprise, ratings, target, screener, financials, prices, dividends, revenue segments, daily entity-sentiment) | REST (`rest_ext.py`) | Works, via `rp_entity_id` resolved from the English name or ISIN (not the Chinese name). Data is fresh. Minor holes (some A-share price-targets return the entity with no numeric target). The daily `entity_sentiment` series lives here and works for any resolvable entity; it is not the dead end below. |
| Unstructured Chinese NLP (Chinese-news entity detection, per-chunk Chinese sentiment) | SDK search (`search.py`) | Dead end: a data-source-level gap, not an SDK bug. Chinese entity detection ≈ 0, per-chunk CJK sentiment is a doc-level inherited value, and `language` mislabels Chinese filings as English. Pair Bigdata with a China-domestic source for Chinese-language chunk content; use Bigdata for the structured face (incl. aggregate `entity_sentiment`) + ISIN/KG crosswalk + English-language chunk sentiment. |

Cost discipline

`1 query_unit = 10 chunks` (official). Only chunk search is billed: the structured `/v1/*` endpoints (estimates, financials, prices, calendar, surprise, ratings, the sentiment time series, screener…) are free (0 chunks, contract-tested). `connected_entities` (co-mentions) and `AnnotatedSearcher` are chunk-billed.

Three levers when you do pay for chunks:

`ChunkLimit`, never a bare `int`. `Search.run(int)` is a document limit billed by the full chunk page; `ChunkLimit(n)` bills per chunk. `AnnotatedSearcher.search` forces `ChunkLimit` for you. (We observed roughly a 52x gap once: a single measured data point, not stated in the official docs, so treat the exact multiple as indicative. The rule "use `ChunkLimit`" holds regardless, because `max_chunks` is the official billing unit.)
Rerank bills only the returned chunks (official): pass a `rerank_threshold` to recall broadly but pay only for the high-relevance hits.
Batch search is 50% cheaper (`$0.0075` vs `$0.015` per query_unit): use `BatchSearch` for a large multi-query backfill.

Use `CostModel` to veto an over-budget job before running it, and `CostTracker.snapshot()` followed by `delta()` to measure real spend. Full accounting → `references/cost_accounting.md`.
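The arithmetic behind a budget veto can be sketched from the figures quoted in this section (10 chunks per query_unit, `$0.015` regular and `$0.0075` batch per query_unit). This is a rough illustration, not the toolkit's `CostModel`: the function names are hypothetical, and it assumes fractional query units are billed pro rata, which the source does not confirm.

```python
CHUNKS_PER_QUERY_UNIT = 10      # "1 query_unit = 10 chunks"
USD_PER_QU_FAST = 0.015         # regular search price quoted in this section
USD_PER_QU_BATCH = 0.0075       # batch search price quoted in this section


def estimate_usd(chunks, batch=False):
    """Estimate the cost in USD of pulling `chunks` chunks (no rounding assumed)."""
    query_units = chunks / CHUNKS_PER_QUERY_UNIT
    rate = USD_PER_QU_BATCH if batch else USD_PER_QU_FAST
    return query_units * rate


def check_budget(planned_chunks, budget_usd, batch=False):
    """Raise before running a job whose estimated cost exceeds the budget."""
    cost = estimate_usd(planned_chunks, batch=batch)
    if cost > budget_usd:
        raise ValueError(f"estimated ${cost:.2f} exceeds budget ${budget_usd:.2f}")
    return cost


# A 20,000-chunk backfill is 2,000 query units.
fast_cost = estimate_usd(20_000)
batch_cost = estimate_usd(20_000, batch=True)
```

Under these assumptions the same backfill costs about $30 as regular search and about $15 as batch search, so a $20 budget would veto the former and allow the latter.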

Known pitfalls (already solved — don't re-debug these)

Each of these cost real debugging time and is fixed or guarded in the toolkit. Full reproductions and fixes are in `references/known_pitfalls.md`:

First-handshake `SSL: UNEXPECTED_EOF` → wrap calls in `rc()`; the SDK's urllib3 retry only covers HTTP status, not the SSL EOF.
`All(entity, Keyword(kw))` raises `TypeError` → combine with the `&` operator (`entity & Keyword(kw)`); `All` takes a single iterable. (Fixed in `AnnotatedSearcher.entity_query`.)
The 52x doc-limit billing trap → always `ChunkLimit`, never a bare `int`.
Closure capture in loops → bind loop vars: `rc(lambda q=q, dr=dr: ...)`.
`analyst_estimates(period="quarter")` returns 400 above `limit≈20`.
`company_screener` filters must nest under `"filters"`: flat top-level keys don't return 400, they're silently dropped → unfiltered universe.
`Document.reporting_period` is always `None` (the SDK model drops a field present on the REST wire) → use `fetch_reporting_period_raw`.
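The closure-capture pitfall in the list above is plain Python late binding, and it shows up when the lambdas are collected and run later (for example, queued for a worker pool). The sketch below reproduces it without any network access; `call` is a hypothetical stand-in for `rc()` that simply invokes the function.

```python
def call(fn):
    """Stand-in for the toolkit's rc(): here it just invokes fn."""
    return fn()


queries = ["alpha", "beta", "gamma"]

# Buggy: each lambda looks up `q` when it runs, after the loop has finished,
# so every deferred call sees the last value.
late = []
for q in queries:
    late.append(lambda: q.upper())
late_results = [call(fn) for fn in late]

# Fixed: a default argument captures the current value of `q` per iteration.
bound = []
for q in queries:
    bound.append(lambda q=q: q.upper())
bound_results = [call(fn) for fn in bound]
```

Here `late_results` repeats the last query three times, while `bound_results` preserves one result per query, which is why the toolkit's advice is to write `rc(lambda q=q, dr=dr: ...)`.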

What this skill will not do

Never hardcode an API key. `BigdataClient` reads `BIGDATA_API_KEY` and fails fast if it is absent, with no plaintext fallback (that is exactly the pattern secret scanners catch).
Only ever reads, never writes or uploads. Every method is a read-only query (`uploads` is `NotImplementedError` in API-key mode anyway), so the toolkit can't mutate your account or push data anywhere.
Never invent an endpoint or a schema. Every signature here is runtime L4-verified or marked L3 (doc-confirmed, not yet run); see `references/verified_api_signatures.md`. For a new endpoint, confirm the path via `docs.bigdata.com/llms.txt` rather than guessing.

File layout

bigdata-skill/
├── SKILL.md                       # this file — routing + setup + quickstart
├── scripts/
│   ├── bigdata_toolkit/           # the verified, cost-guarded package
│   │   ├── client.py              # BigdataClient: SDK (.bd) + REST escape hatch (.http/.conn)
│   │   ├── kg.py                  # EntityResolver: name/ISIN/CUSIP/SEDOL → rp_entity_id
│   │   ├── search.py              # AnnotatedSearcher: chunks + sentiment + entity spans (SDK)
│   │   ├── rest_ext.py            # StructuredDataREST (estimates/financials/prices/dividends/sentiment/co-mentions/screener) + BatchSearch + fields_values_to_records — official REST
│   │   ├── cost.py                # CostTracker + CostModel: chunk billing + budget veto
│   │   └── retry.py               # rc(): SSL/transient-error retry passthrough
│   └── probe_example.py           # runnable end-to-end smoke test
└── references/
    ├── escape_hatch_architecture.md  # WHY the MCP is lossy; bd._api.http mechanism; adding endpoints
    ├── verified_api_signatures.md    # L4/L3-verified signatures + the two data faces, with evidence
    ├── cost_accounting.md            # chunk billing, the 52x trap, CostModel/CostTracker, budgeting
    └── known_pitfalls.md             # every pitfall above, with reproduction + fix

References

| Read when you need to… | File |
| --- | --- |
| Understand why the MCP is insufficient and how the REST escape hatch works (and how to wrap a new `/v1/*` endpoint) | `references/escape_hatch_architecture.md` |
| Look up an exact verified method signature + its verification level | `references/verified_api_signatures.md` |
| Budget a backfill or debug a surprise quota burn | `references/cost_accounting.md` |
| Diagnose an error you hit while pulling data | `references/known_pitfalls.md` |
