bigdata-skill


Bigdata.com SDK + REST Toolkit


Get the structured substrate the Bigdata.com MCP server doesn't hand over. The MCP returns clean prose and pre-synthesized tearsheets, but its search tool gives chunks with no per-chunk sentiment or entity spans, and its tearsheets give aggregate values, not the fiscal-period time series, universe screener, or per-field JSON you'd build a pipeline on. The official `bigdata-client` SDK, plus a thin REST passthrough over the same backend and the same JWT, reaches the official `/v1/*` endpoints that hold it. This skill bundles a toolkit that does exactly that (already debugged, already cost-guarded), so you don't re-pay the discovery cost.

The core problem this solves (read this first)


The Bigdata MCP server answers "what's the sentiment around NVIDIA?" with a readable paragraph or a pre-synthesized tearsheet — genuinely useful for a chat turn. But the moment you need the machine-readable substrate to build a pipeline on, the MCP doesn't hand it over:
  • its search tool returns chunks with text + relevance only — no per-chunk sentiment number, no entity character spans;
  • its tearsheets give aggregate values (a single sentiment score, a summary of estimates) — not a fiscal-period time series you can compute on, a universe screener, or per-field JSON.
The fix is a general pattern, not a Bigdata trick:
When an MCP data source returns only synthesized output but you need the structured fields underneath, drop to the vendor SDK or REST. MCP optimizes for a chat turn, not a pipeline.
Crucially, for Bigdata these structured fields are official, publicly documented REST endpoints (`docs.bigdata.com/api-reference/...`), not a hidden backend. Bigdata is also sunsetting the SDK (EOL 2026-12-31) in favour of this REST API, so the REST layer here is the forward-compatible path, not a hack. The SDK (`bigdata_client.Bigdata`) covers search + knowledge-graph; `bd._api.http` reaches every `/v1/*` endpoint the SDK never wrapped. The bundled `bigdata_toolkit` packages both behind one `BigdataClient`.
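
To make the "entity character spans" point concrete, here is a minimal sketch of what a pipeline can do once spans are available. It assumes a chunk dict shaped like the one described in the Quickstart (`text`, `sentiment`, and `entities` with `key`/`start`/`end`) and assumes `start`/`end` are half-open character offsets into `text`. The helper name `mentions` and the sample chunk are hypothetical, not part of the toolkit.

```python
from typing import Any


def mentions(chunk: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (entity_key, surface_text) pairs for one annotated chunk.

    Assumption: `start`/`end` are half-open character offsets into chunk["text"].
    """
    text = chunk["text"]
    return [(e["key"], text[e["start"]:e["end"]]) for e in chunk["entities"]]


# Hypothetical sample chunk; the sentiment value is made up for illustration.
sample = {
    "text": "NVIDIA raised data center guidance.",
    "sentiment": 0.42,
    "entities": [{"key": "E09E2B", "start": 0, "end": 6}],
}

print(mentions(sample))
```

With only the synthesized MCP output there is nothing to slice: the offsets and the per-chunk number are exactly the fields that are missing.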

When to use this skill


Trigger on any of these, in any language:
  • The user is using Bigdata.com / RavenPack and the MCP result feels thin — "where's the sentiment score?", "I need entity-level data", "the calendar".
  • They want forward / structured financials for a ticker: analyst estimates, earnings or event calendar, earnings surprise, analyst ratings, price targets, a company screener / universe.
  • They want annotated news chunks with numeric sentiment + entity spans, or a sentiment time series / co-mention graph.
  • They mention a `bd_v2_` API key, `rp_entity_id`, `query_unit` / chunk cost, `bigdata-client`, or "the bigdata MCP isn't enough".
  • They're building an investment-research dataset and need a reusable, cost-aware data-pull layer rather than one-off MCP calls.

Setup (one time)


**1. API key (never hardcode it).** The client fail-fasts if it's missing:

```bash
export BIGDATA_API_KEY=bd_v2_xxxxxxxx
```

**2. An isolated Python env with the official SDK.** The bundled toolkit imports `bigdata_client`; install it once:

```bash
uv venv .venv --python 3.12
uv pip install --python .venv/bin/python bigdata-client
```

Behind a slow or blocked PyPI (e.g. mainland China), add a mirror and unset any outbound proxy for the install step so uv reaches the index directly.


**3. Outbound proxy (only if your network needs one to reach `api.bigdata.com`).** Two equivalent options, both accepted by the official SDK: an env var, or `BigdataClient(proxy=...)` in code. The env var is simplest:

```bash
export HTTPS_PROXY=http://<host>:<port>     # plus WSS_PROXY for chat/WebSocket
```

If a proxy does TLS interception (self-signed CA) and you hit SSL handshake errors, the official fix is `BigdataClient(verify_ssl="<proxy-CA>.pem")`, not blind retries.

**4. Make the bundled package importable** by putting this skill's `scripts/` on `PYTHONPATH` (or `sys.path.insert(0, "<this-skill>/scripts")`).

Smoke-test the whole path (entity resolve + quota are free; `--with-search` adds one ~1 query_unit chunk search):

```bash
BIGDATA_API_KEY=bd_v2_xxx PYTHONPATH=scripts .venv/bin/python scripts/probe_example.py
```

Quickstart


```python
import sys
sys.path.insert(0, "<this-skill>/scripts")          # so `import bigdata_toolkit` resolves
from bigdata_toolkit import (
    BigdataClient, EntityResolver, AnnotatedSearcher,
    StructuredDataREST, CostTracker, CostModel, rc,   # rc = SSL-retry wrapper
)

c  = BigdataClient()                                  # SDK + REST escape hatch, one object
er = EntityResolver(c)
nvda = rc(lambda: er.resolve_id("NVIDIA", country="US"))   # -> 'E09E2B'  (rp_entity_id is the gateway key)

# --- Structured financials the MCP does NOT expose (REST escape hatch) ---
rest = StructuredDataREST(c)
est  = rc(lambda: rest.analyst_estimates(nvda, period="quarter", limit=5))   # forward consensus
surp = rc(lambda: rest.latest_surprise(nvda))                                # last EPS/revenue surprise
cal  = rc(lambda: rest.events_calendar(nvda, categories=["earnings-call"],
                                       start_date="2026-06-01", end_date="2026-12-31"))

# --- Annotated chunks the MCP STRIPS: sentiment + entity spans (cost-guarded) ---
s = AnnotatedSearcher(c)
docs = rc(lambda: s.search_entity(nvda, keyword="data center", chunk_limit=10))
# each chunk dict: {"sentiment": float, "entities": [{"key": rp_id, "start", "end"}], "text", ...}

# --- Always know your spend (chunk-billed; see Cost discipline) ---
ct = CostTracker(c); ct.snapshot()
# ... run a batch ...
print(ct.delta())   # {'delta_chunks':..., 'delta_query_units':..., 'usd_fast':...}
```

Wrap **every** network call in `rc(lambda: ...)` — a first-handshake `SSL:
UNEXPECTED_EOF` is common and the SDK's internal retry doesn't cover it.
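
For readers wondering what a wrapper like `rc()` has to do, here is a minimal sketch of a retry passthrough. It is not the toolkit's actual `retry.py`; the name `retry_call`, its parameters, and the backoff schedule are assumptions made for illustration. The idea is simply to catch SSL and connection errors raised by the callable, wait, and call it again.

```python
import ssl
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_call(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on SSL/connection errors with exponential backoff.

    Hypothetical stand-in for rc(); the real wrapper may differ.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except (ssl.SSLError, ConnectionError):
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be >= 1")


# Simulate a call whose first handshake dies with an SSL EOF.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] == 1:
        raise ssl.SSLEOFError("simulated first-handshake EOF")
    return "ok"


result = retry_call(flaky, sleep=lambda _seconds: None)  # no real waiting in the demo
print(result, calls["n"])
```

Passing a zero-argument callable (hence the `lambda:` at every call site) is what lets the wrapper re-run the whole request rather than reuse a half-failed one.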

Routing — which capability answers the question


| The user wants… | Use | Module |
|---|---|---|
| Company name / ISIN / CUSIP / SEDOL → `rp_entity_id` | `EntityResolver.resolve_id` / `.resolve_by_isin` | `kg.py` (SDK) |
| Forward analyst consensus (revenue/EPS by fiscal period) | `StructuredDataREST.analyst_estimates` | `rest_ext.py` |
| Latest earnings surprise (actual vs estimate) | `.latest_surprise` | `rest_ext.py` |
| Upcoming earnings / event calendar (one name or whole market) | `.events_calendar` | `rest_ext.py` |
| Analyst ratings / price-target consensus | `.analyst_ratings` / `.price_target` | `rest_ext.py` |
| Full financial statements (income / balance / cash-flow, multi-year) | `.income_statement` / `.balance_sheet` / `.cash_flow_statement` | `rest_ext.py` |
| TTM valuation metrics & ratios (EV/EBITDA, ROE, P/E, margins) | `.key_metrics_ttm` / `.company_ratios_ttm` | `rest_ext.py` |
| Company profile (CEO, sector, employees, IPO date) | `.company_profile` | `rest_ext.py` |
| Daily OHLC prices / dividend history | `.daily_prices` / `.dividends` | `rest_ext.py` |
| Revenue by geography / product segment | `.revenue_geographic_segments` / `.revenue_product_segments` | `rest_ext.py` |
| Daily entity-sentiment time series (don't self-aggregate from chunks!) | `.entity_sentiment` | `rest_ext.py` |
| Co-mention graph (supply-chain / competitor / customer; ⚠️ chunk-billed) | `.connected_entities` | `rest_ext.py` |
| Build a universe by market-cap / sector / country | `.company_screener` | `rest_ext.py` |
| News/filing/transcript chunks with sentiment + entity spans | `AnnotatedSearcher.search_entity` | `search.py` (SDK) |
| Bulk-pull many searches 50% cheaper (portfolio backfill) | `BatchSearch` (create→upload→poll→download) | `rest_ext.py` |
| Track / forecast quota spend before a backfill | `CostTracker` / `CostModel` | `cost.py` |
| Hit an endpoint the toolkit hasn't wrapped yet | `client.http.post("v1/<resource>/query", body)` | `client.py` |

`income`/`balance`/`cash-flow`/`daily-prices`/`dividends`/`revenue-segments` return `{fields, values}`; wrap them in `fields_values_to_records()` to get `[{field: value}]`. The `*_ttm` / `company_profile` endpoints are already flat. All structured endpoints above are free (0 chunks) except `connected_entities` and `AnnotatedSearcher` (chunk-billed).
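
The `{fields, values}` to `[{field: value}]` conversion can be sketched as below. This is an illustration, not the toolkit's `fields_values_to_records()`: the function name `to_records`, the assumption that `fields` is a list of column names and `values` a list of equally long rows, and the sample numbers are all hypothetical.

```python
from typing import Any


def to_records(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn a column-header + rows payload into one dict per row.

    Assumption: payload["fields"] lists column names and payload["values"]
    is a list of rows, each with one value per field.
    """
    fields = payload["fields"]
    records = []
    for row in payload["values"]:
        if len(row) != len(fields):
            raise ValueError(f"row has {len(row)} values, expected {len(fields)}")
        records.append(dict(zip(fields, row)))
    return records


# Placeholder data, not real financials.
sample = {
    "fields": ["period", "revenue"],
    "values": [["Q1", 100], ["Q2", 120]],
}

print(to_records(sample))
```

The length check is a design choice for the sketch: a ragged row fails loudly instead of silently producing a record with missing fields.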

The two data faces (do NOT say "Bigdata fails for Chinese / A-shares")


This split is the most important non-obvious conclusion. State it precisely:

| Face | Path | A-share / Chinese verdict |
|---|---|---|
| Structured financial (estimates, calendar, surprise, ratings, target, screener, financials, prices, dividends, revenue segments, daily entity-sentiment) | REST (`rest_ext.py`) | Works, via `rp_entity_id` resolved from the English name or ISIN (not the Chinese name). Data is fresh. Minor holes (some A-share price-targets return the entity with no numeric target). The daily `entity_sentiment` series lives here and works for any resolvable entity; it is not the dead end below. |
| Unstructured Chinese NLP (Chinese-news entity detection, per-chunk Chinese sentiment) | SDK search (`search.py`) | Dead end. This is a data-source-level gap, not an SDK bug: Chinese entity detection ≈ 0, per-chunk CJK sentiment is a doc-level inherited value, and `language` mislabels Chinese filings as English. Pair Bigdata with a China-domestic source for Chinese-language chunk content; use Bigdata for the structured face (incl. aggregate `entity_sentiment`) + ISIN/KG crosswalk + English-language chunk sentiment. |
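
Because resolution works from the English name or ISIN but not the Chinese name, a small pre-flight guard can catch the wrong input before a lookup is attempted. The sketch below is an assumption-laden illustration: `contains_cjk` and `resolver_input` are hypothetical helpers, not toolkit functions, the check covers only the CJK Unified Ideographs block, and the ISIN in the usage line is a placeholder.

```python
from typing import Optional


def contains_cjk(name: str) -> bool:
    """True if any character falls in the CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in name)


def resolver_input(name: str, isin: Optional[str] = None) -> tuple[str, str]:
    """Pick the lookup route: ISIN if given, otherwise an English name."""
    if isin:
        return ("isin", isin)
    if contains_cjk(name):
        raise ValueError("Chinese name given: pass the English name or an ISIN instead")
    return ("name", name)


print(resolver_input("Kweichow Moutai"))
print(resolver_input("贵州茅台", isin="XX0000000000"))  # placeholder ISIN, not a real one
```

The returned tuple would then be routed to the matching resolver method (name lookup or ISIN lookup).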

Cost discipline


`1 query_unit = 10 chunks` (official). Only chunk-search is billed: the structured `/v1/*` endpoints (estimates, financials, prices, calendar, surprise, ratings, the sentiment time series, screener…) are free (0 chunks, contract-tested). `connected_entities` (co-mentions) and `AnnotatedSearcher` are chunk-billed.

Three levers when you do pay for chunks:
  1. `ChunkLimit`, never a bare `int`. `Search.run(int)` is a document limit billed by the full chunk page; `ChunkLimit(n)` bills per chunk. `AnnotatedSearcher.search` forces `ChunkLimit` for you. (We observed roughly a 52x gap once: a single measured data point, not stated in the official docs; treat the exact multiple as indicative. The rule "use `ChunkLimit`" holds regardless, because `max_chunks` is the official billing unit.)
  2. Rerank bills only the returned chunks (official): pass a `rerank_threshold` to recall broadly but pay only for the high-relevance hits.
  3. Batch search is 50% cheaper (`$0.0075` vs `$0.015` / qu): use `BatchSearch` for a large multi-query backfill.

Use `CostModel` to veto an over-budget job before running it, and `CostTracker.snapshot()` / `delta()` to measure real spend. Full accounting → `references/cost_accounting.md`.
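
The arithmetic behind a budget veto is short enough to sketch. The rates and the 10-chunks-per-unit ratio come from the text above; everything else (the function names, the absence of rounding to whole query units, and the example backfill size) is an assumption for illustration and not the toolkit's `CostModel`.

```python
CHUNKS_PER_QUERY_UNIT = 10      # from the text: 1 query_unit = 10 chunks
USD_PER_QU_STANDARD = 0.015     # from the text
USD_PER_QU_BATCH = 0.0075       # from the text (50% cheaper)


def estimate_usd(chunks: int, batch: bool = False) -> float:
    """Estimated spend for a number of billed chunks.

    Assumption: fractional query units are billed pro rata (no rounding up).
    """
    rate = USD_PER_QU_BATCH if batch else USD_PER_QU_STANDARD
    return chunks / CHUNKS_PER_QUERY_UNIT * rate


def within_budget(chunks: int, budget_usd: float, batch: bool = False) -> bool:
    """Veto check: True only if the estimated spend fits the budget."""
    return estimate_usd(chunks, batch=batch) <= budget_usd


# Hypothetical backfill: 50 searches x 200 chunks each = 10,000 chunks = 1,000 query units.
planned_chunks = 50 * 200
print(estimate_usd(planned_chunks), estimate_usd(planned_chunks, batch=True))
print(within_budget(planned_chunks, budget_usd=10.0))              # standard pricing
print(within_budget(planned_chunks, budget_usd=10.0, batch=True))  # batch pricing
```

At these rates the same job costs about $15 on the standard path and about $7.50 through batch search, so a $10 budget vetoes the first and admits the second.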

Known pitfalls (already solved — don't re-debug these)


Each cost real debugging time and is fixed or guarded in the toolkit. Full reproductions and fixes in `references/known_pitfalls.md`:
  1. First-handshake `SSL: UNEXPECTED_EOF` → wrap calls in `rc()`; the SDK's urllib3 retry only covers HTTP status, not the SSL EOF.
  2. `All(entity, Keyword(kw))` raises `TypeError` → combine with the `&` operator (`entity & Keyword(kw)`); `All` takes a single iterable. (Fixed in `AnnotatedSearcher.entity_query`.)
  3. The 52x doc-limit billing trap → always `ChunkLimit`, never a bare `int`.
  4. Closure capture in loops → bind loop vars: `rc(lambda q=q, dr=dr: ...)`.
  5. `analyst_estimates(period="quarter")` 400s above `limit≈20`.
  6. `company_screener` filters must nest under `"filters"`: flat top-level keys don't 400, they're silently dropped → unfiltered universe.
  7. `Document.reporting_period` is always `None` (the SDK model drops a field present on the REST wire) → `fetch_reporting_period_raw`.
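
Pitfall 4 is plain Python late binding and is easy to reproduce without any network call. The sketch below uses made-up query strings; it only demonstrates why `lambda q=q: ...` is needed when lambdas are created in a loop and invoked later (for example, if they are queued and run after the loop ends).

```python
queries = ["data center", "gaming", "automotive"]

late, bound = [], []
for q in queries:
    late.append(lambda: q)        # captures the variable q, not its current value
    bound.append(lambda q=q: q)   # default argument freezes the current value

# All calls happen after the loop, when q already holds the last query.
late_results = [f() for f in late]
bound_results = [f() for f in bound]

print(late_results)   # every entry is the last query
print(bound_results)  # one entry per query, in order
```

If a lambda is invoked immediately inside the loop body the bug does not show, which is why it tends to surface only once calls are deferred or batched.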

What this skill will not do


  • Never hardcode an API key. `BigdataClient` reads `BIGDATA_API_KEY` and fail-fasts if absent; there is no plaintext fallback (that is exactly the pattern secret scanners catch).
  • Only ever reads, never writes or uploads. Every method is a read-only query (`uploads` is `NotImplementedError` in API-key mode anyway), so the toolkit can't mutate your account or push data anywhere.
  • Never invent an endpoint or a schema. Every signature here is runtime L4-verified or marked L3 (doc-confirmed, not yet run); see `references/verified_api_signatures.md`. For a new endpoint, confirm the path via `docs.bigdata.com/llms.txt` rather than guessing.
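
The fail-fast, no-fallback behaviour described in the first bullet can be sketched as follows. This is not the code in `client.py`; the function name `load_api_key`, the choice of `RuntimeError`, and the error message are assumptions. Only the `BIGDATA_API_KEY` variable name comes from the text.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Read the key from the environment and refuse to continue without it.

    There is deliberately no default value: a hardcoded fallback is exactly
    what a secret scanner would flag.
    """
    key = env.get("BIGDATA_API_KEY", "").strip()
    if not key:
        raise RuntimeError("BIGDATA_API_KEY is not set; export it before creating the client")
    return key


# Demo with explicit mappings so the example does not depend on the real environment.
print(load_api_key({"BIGDATA_API_KEY": "bd_v2_xxxxxxxx"}))
```

Accepting the environment as a parameter keeps the check testable without touching the process environment.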

File layout


```
bigdata-skill/
├── SKILL.md                       # this file: routing + setup + quickstart
├── scripts/
│   ├── bigdata_toolkit/           # the verified, cost-guarded package
│   │   ├── client.py              # BigdataClient: SDK (.bd) + REST escape hatch (.http/.conn)
│   │   ├── kg.py                  # EntityResolver: name/ISIN/CUSIP/SEDOL → rp_entity_id
│   │   ├── search.py              # AnnotatedSearcher: chunks + sentiment + entity spans (SDK)
│   │   ├── rest_ext.py            # StructuredDataREST (estimates/financials/prices/dividends/sentiment/co-mentions/screener) + BatchSearch + fields_values_to_records; official REST
│   │   ├── cost.py                # CostTracker + CostModel: chunk billing + budget veto
│   │   └── retry.py               # rc(): SSL/transient-error retry passthrough
│   └── probe_example.py           # runnable end-to-end smoke test
└── references/
    ├── escape_hatch_architecture.md  # WHY the MCP is lossy; bd._api.http mechanism; adding endpoints
    ├── verified_api_signatures.md    # L4/L3-verified signatures + the two data faces, with evidence
    ├── cost_accounting.md            # chunk billing, the 52x trap, CostModel/CostTracker, budgeting
    └── known_pitfalls.md             # every pitfall above, with reproduction + fix
```

References


| Read when you need to… | File |
|---|---|
| Understand why the MCP is insufficient and how the REST escape hatch works (and how to wrap a new `/v1/*` endpoint) | `references/escape_hatch_architecture.md` |
| Look up an exact verified method signature + its verification level | `references/verified_api_signatures.md` |
| Budget a backfill or debug a surprise quota burn | `references/cost_accounting.md` |
| Diagnose an error you hit while pulling data | `references/known_pitfalls.md` |