Bigdata.com SDK + REST Toolkit

Get the structured substrate the Bigdata.com MCP server doesn't hand over. The MCP returns clean prose and pre-synthesized tearsheets, but its search tool gives chunks with no per-chunk sentiment or entity spans, and its tearsheets give aggregate values — not the fiscal-period time series, universe screener, or per-field JSON you'd build a pipeline on. The official

bigdata-client

SDK plus a thin REST passthrough over the same backend, same JWT reach the official

/v1/*

endpoints that hold it. This skill bundles a toolkit that does exactly that — already debugged, already cost-guarded — so you don't re-pay the discovery cost.

The core problem this solves (read this first)

The Bigdata MCP server answers "what's the sentiment around NVIDIA?" with a readable paragraph or a pre-synthesized tearsheet — genuinely useful for a chat turn. But the moment you need the machine-readable substrate to build a pipeline on, the MCP doesn't hand it over:

its search tool returns chunks with text + relevance only — no per-chunk sentiment number, no entity character spans;
its tearsheets give aggregate values (a single sentiment score, a summary of estimates) — not a fiscal-period time series you can compute on, a universe screener, or per-field JSON.

The fix is a general pattern, not a Bigdata trick:

When an MCP data source returns only synthesized output but you need the structured fields underneath, drop to the vendor SDK or REST. MCP optimizes for a chat turn, not a pipeline.

Crucially, for Bigdata these structured fields are official, publicly documented REST endpoints (

docs.bigdata.com/api-reference/...

), not a hidden backend — and Bigdata is sunsetting the SDK (EOL 2026-12-31) in favour of this REST API, so the REST layer here is the forward-compatible path, not a hack. The SDK (

bigdata_client.Bigdata

) covers search + knowledge-graph; bd._api.http
reaches every

/v1/*

endpoint the SDK never wrapped. The bundled

bigdata_toolkit

packages both behind one

BigdataClient

When to use this skill

Trigger on any of these, in any language:

The user is using Bigdata.com / RavenPack and the MCP result feels thin — "where's the sentiment score?", "I need entity-level data", "the calendar".
They want forward / structured financials for a ticker: analyst estimates, earnings or event calendar, earnings surprise, analyst ratings, price targets, a company screener / universe.
They want annotated news chunks with numeric sentiment + entity spans, or a sentiment time series / co-mention graph.
They mention a bd_v2_
API key,
```
rp_entity_id
```
,
```
query_unit
```
/ chunk cost,
```
bigdata-client
```
, or "the bigdata MCP isn't enough".
They're building an investment-research dataset and need a reusable, cost-aware data-pull layer rather than one-off MCP calls.

Setup (one time)

1 — API key (never hardcode it). The client fail-fasts if it's missing:

bash

export BIGDATA_API_KEY=bd_v2_xxxxxxxx

2 — An isolated Python env with the official SDK. The bundled toolkit imports

bigdata_client

; install it once:

bash

uv venv .venv --python 3.12
uv pip install --python .venv/bin/python bigdata-client
# Behind a slow/blocked PyPI (e.g. mainland China) add a mirror, and unset any
# outbound proxy for the install step so uv reaches the index directly:
#   --index-url https://pypi.tuna.tsinghua.edu.cn/simple

3 — Outbound proxy (only if your network needs one to reach
api.bigdata.com
). Two equivalent options — the official SDK accepts both: an env var, or

BigdataClient(proxy=...)

in code. The env var is simplest:

bash

export HTTPS_PROXY=http://<host>:<port>     # plus WSS_PROXY for chat/WebSocket

If a proxy does TLS interception (self-signed CA) and you hit SSL handshake errors, the official fix is

BigdataClient(verify_ssl="<proxy-CA>.pem")

— not blind retries.

4 — Make the bundled package importable by putting this skill's

scripts/

PYTHONPATH

(or

sys.path.insert(0, "<this-skill>/scripts")

Smoke-test the whole path (entity resolve + quota are free;

--with-search

adds one ~1 query_unit chunk search):

bash

BIGDATA_API_KEY=bd_v2_xxx PYTHONPATH=scripts .venv/bin/python scripts/probe_example.py

Quickstart

python

import sys
sys.path.insert(0, "<this-skill>/scripts")          # so `import bigdata_toolkit` resolves
from bigdata_toolkit import (
    BigdataClient, EntityResolver, AnnotatedSearcher,
    StructuredDataREST, CostTracker, CostModel, rc,   # rc = SSL-retry wrapper
)

c  = BigdataClient()                                  # SDK + REST escape hatch, one object
er = EntityResolver(c)
nvda = rc(lambda: er.resolve_id("NVIDIA", country="US"))   # -> 'E09E2B'  (rp_entity_id is the gateway key)

# --- Structured financials the MCP does NOT expose (REST escape hatch) ---
rest = StructuredDataREST(c)
est  = rc(lambda: rest.analyst_estimates(nvda, period="quarter", limit=5))  # forward consensus
surp = rc(lambda: rest.latest_surprise(nvda))                               # last EPS/revenue surprise
cal  = rc(lambda: rest.events_calendar(nvda, categories=["earnings-call"],
                                       start_date="2026-06-01", end_date="2026-12-31"))

# --- Annotated chunks the MCP STRIPS: sentiment + entity spans (cost-guarded) ---
s    = AnnotatedSearcher(c)
docs = rc(lambda: s.search_entity(nvda, keyword="data center", chunk_limit=10))
# each chunk dict: {"sentiment": float, "entities": [{"key": rp_id, "start", "end"}], "text", ...}

# --- Always know your spend (chunk-billed; see Cost discipline) ---
ct = CostTracker(c); ct.snapshot()
# ... run a batch ...
print(ct.delta())     # {'delta_chunks':..., 'delta_query_units':..., 'usd_fast':...}

Wrap every network call in

rc(lambda: ...)

— a first-handshake

SSL: UNEXPECTED_EOF

is common and the SDK's internal retry doesn't cover it.

Routing — which capability answers the question

The user wants…	Use	Module
Company name / ISIN / CUSIP / SEDOL → `rp_entity_id`	`EntityResolver.resolve_id` / `.resolve_by_isin`	`kg.py` (SDK)
Forward analyst consensus (revenue/EPS by fiscal period)	`StructuredDataREST.analyst_estimates`	`rest_ext.py`
Latest earnings surprise (actual vs estimate)	`.latest_surprise`	`rest_ext.py`
Upcoming earnings / event calendar (one name or whole market)	`.events_calendar`	`rest_ext.py`
Analyst ratings / price-target consensus	`.analyst_ratings` / `.price_target`	`rest_ext.py`
Full financial statements (income / balance / cash-flow, multi-year)	`.income_statement` / `.balance_sheet` / `.cash_flow_statement`	`rest_ext.py`
TTM valuation metrics & ratios (EV/EBITDA, ROE, P/E, margins)	`.key_metrics_ttm` / `.company_ratios_ttm`	`rest_ext.py`
Company profile (CEO, sector, employees, IPO date)	`.company_profile`	`rest_ext.py`
Daily OHLC prices / dividend history	`.daily_prices` / `.dividends`	`rest_ext.py`
Revenue by geography / product segment	`.revenue_geographic_segments` / `.revenue_product_segments`	`rest_ext.py`
Daily entity-sentiment time series (don't self-aggregate from chunks!)	`.entity_sentiment`	`rest_ext.py`
Co-mention graph (supply-chain / competitor / customer — ⚠️ chunk-billed)	`.connected_entities`	`rest_ext.py`
Build a universe by market-cap / sector / country	`.company_screener`	`rest_ext.py`
News/filing/transcript chunks with sentiment + entity spans	`AnnotatedSearcher.search_entity`	`search.py` (SDK)
Bulk-pull many searches 50% cheaper (portfolio backfill)	`BatchSearch` (create→upload→poll→download)	`rest_ext.py`
Track / forecast quota spend before a backfill	`CostTracker` / `CostModel`	`cost.py`
Hit an endpoint the toolkit hasn't wrapped yet	`client.http.post("v1/<resource>/query", body)`	`client.py`

income/balance/cash-flow/daily-prices/dividends/revenue-segments
return
{fields, values}
— wrap them in
fields_values_to_records()
to get
[{field: value}]
. The
*_ttm
/
company_profile
endpoints are already flat. All structured endpoints above are free (0 chunks) except
connected_entities
and
AnnotatedSearcher
(chunk-billed).

The two data faces (do NOT say "Bigdata fails for Chinese / A-shares")

This split is the most important non-obvious conclusion — state it precisely:

Face	Path	A-share / Chinese verdict
Structured financial (estimates, calendar, surprise, ratings, target, screener, financials, prices, dividends, revenue segments, daily entity-sentiment)	REST ( `rest_ext.py` )	Works — via `rp_entity_id` resolved from the English name or ISIN (not the Chinese name). Data is fresh. Minor holes (some A-share price-targets return the entity with no numeric target). The daily `entity_sentiment` series lives here and works for any resolvable entity — it is not the dead end below.
Unstructured Chinese NLP (Chinese-news entity detection, per-chunk Chinese sentiment)	SDK search ( `search.py` )	Dead end — a data-source-level gap, not an SDK bug: Chinese entity detection ≈ 0, per-chunk CJK sentiment is a doc-level inherited value, and `language` mislabels Chinese filings as English. Pair Bigdata with a China-domestic source for Chinese-language chunk content; use Bigdata for the structured face (incl. aggregate `entity_sentiment` ) + ISIN/KG crosswalk + English-language chunk sentiment.

Cost discipline

1 query_unit = 10 chunks

(official). Only chunk-search is billed — the structured

/v1/*

endpoints (estimates, financials, prices, calendar, surprise, ratings, the sentiment time series, screener…) are free (0 chunks, contract-tested).

connected_entities

(co-mentions) and

AnnotatedSearcher

are chunk-billed.

Three levers when you do pay for chunks:

ChunkLimit
, never a bare
int
.
```
Search.run(int)
```
is a document limit billed by the full chunk page;
```
ChunkLimit(n)
```
bills per chunk.
```
AnnotatedSearcher.search
```
forces
```
ChunkLimit
```
for you. (We observed roughly a 52x gap once — a single measured data point, not stated in the official docs; treat the exact multiple as indicative. The rule "use
```
ChunkLimit
```
" holds regardless, because
```
max_chunks
```
is the official billing unit.)
Rerank bills only the returned chunks (official) — pass a
```
rerank_threshold
```
to recall broadly but pay only for the high-relevance hits.
Batch search is 50% cheaper (
```
$0.0075
```
vs
```
$0.015
```
/ qu) — use
```
BatchSearch
```
for a large multi-query backfill.

Use

CostModel

to veto an over-budget job before running it, and

CostTracker.snapshot()

delta()

to measure real spend. Full accounting →

references/cost_accounting.md

Known pitfalls (already solved — don't re-debug these)

Each cost real debugging time and is fixed or guarded in the toolkit. Full reproductions and fixes in references/known_pitfalls.md
:

First-handshake
SSL: UNEXPECTED_EOF
→ wrap calls in
```
rc()
```
; the SDK's urllib3 retry only covers HTTP status, not the SSL EOF.

All(entity, Keyword(kw))
raises
TypeError
→ combine with the

operator (

entity & Keyword(kw)

);

All

takes a single iterable. (Fixed in

AnnotatedSearcher.entity_query

The 52x doc-limit billing trap → always
```
ChunkLimit
```
, never a bare
```
int
```
.
Closure capture in loops → bind loop vars:
```
rc(lambda q=q, dr=dr: ...)
```
.

analyst_estimates(period="quarter")
400s above
limit≈20
.

company_screener
filters must nest under
"filters"
— flat top-level keys don't 400, they're silently dropped → unfiltered universe.
Document.reporting_period
is always
None
(the SDK model drops a field present on the REST wire) →
```
fetch_reporting_period_raw
```
.

What this skill will not do

Never hardcode an API key.
```
BigdataClient
```
reads
```
BIGDATA_API_KEY
```
and fail-fasts if absent — no plaintext fallback (that is exactly the pattern secret scanners catch).
Only ever reads — never writes or uploads. Every method is a read-only query (
```
uploads
```
is
```
NotImplementedError
```
in API-key mode anyway), so the toolkit can't mutate your account or push data anywhere.
Never invent an endpoint or a schema. Every signature here is runtime L4-verified or marked L3 (doc-confirmed, not yet run); see
```
references/verified_api_signatures.md
```
. For a new endpoint, confirm the path via
```
docs.bigdata.com/llms.txt
```
rather than guessing.

File layout

bigdata-skill/
├── SKILL.md                       # this file — routing + setup + quickstart
├── scripts/
│   ├── bigdata_toolkit/           # the verified, cost-guarded package
│   │   ├── client.py              # BigdataClient: SDK (.bd) + REST escape hatch (.http/.conn)
│   │   ├── kg.py                  # EntityResolver: name/ISIN/CUSIP/SEDOL → rp_entity_id
│   │   ├── search.py              # AnnotatedSearcher: chunks + sentiment + entity spans (SDK)
│   │   ├── rest_ext.py            # StructuredDataREST (estimates/financials/prices/dividends/sentiment/co-mentions/screener) + BatchSearch + fields_values_to_records — official REST
│   │   ├── cost.py                # CostTracker + CostModel: chunk billing + budget veto
│   │   └── retry.py               # rc(): SSL/transient-error retry passthrough
│   └── probe_example.py           # runnable end-to-end smoke test
└── references/
    ├── escape_hatch_architecture.md  # WHY the MCP is lossy; bd._api.http mechanism; adding endpoints
    ├── verified_api_signatures.md    # L4/L3-verified signatures + the two data faces, with evidence
    ├── cost_accounting.md            # chunk billing, the 52x trap, CostModel/CostTracker, budgeting
    └── known_pitfalls.md             # every pitfall above, with reproduction + fix

References

Read when you need to…	File
Understand why the MCP is insufficient and how the REST escape hatch works (and how to wrap a new `/v1/*` endpoint)	`references/escape_hatch_architecture.md`
Look up an exact verified method signature + its verification level	`references/verified_api_signatures.md`
Budget a backfill or debug a surprise quota burn	`references/cost_accounting.md`
Diagnose an error you hit while pulling data	`references/known_pitfalls.md`

bigdata-skill

NPX Install

Tags

SKILL.md Content

Bigdata.com SDK + REST Toolkit

The core problem this solves (read this first)

When to use this skill

Setup (one time)

Quickstart

Routing — which capability answers the question

The two data faces (do NOT say "Bigdata fails for Chinese / A-shares")

Cost discipline

Known pitfalls (already solved — don't re-debug these)

What this skill will not do

File layout

References