sdmx-explorer
Guided, interactive exploration of statistical data via SDMX providers (Eurostat, OECD, ECB, World Bank, ISTAT, and others) using the opensdmx CLI. Use this skill whenever the user asks ANY question about statistics or data that could be answered with SDMX data — even if they don't mention SDMX, Eurostat, or any provider by name. Topics include demographics, economy, employment, births, deaths, population, prices, trade, health, agriculture, GDP, inflation, unemployment, fertility rates, migration, energy, education, poverty, housing, and any other statistical topic. Also use it when the user mentions a specific dataflow ID they want to explore. Trigger this skill even for implicit questions like "how many births were there in Italy last year?" or "I need EU unemployment data by age group" — these clearly need SDMX data even if the user doesn't say so. The skill guides the user step by step: discovers relevant datasets, proposes the most meaningful candidates, explores the schema using real constraints (not codelists), explains the dataset structure, and invites the user to make informed filter choices before fetching any data.
NPX Install
npx skill4agent add aborruso/opensdmx sdmx-explorerTags
Translated version includes tags in frontmatterSKILL.md Content
View Translation Comparison →SDMX Explorer — Guided Dataset Discovery
opensdmx--helpopensdmx --help # list all commands
opensdmx search --help # options and examples for search
opensdmx constraints --help # options and examples for constraints
opensdmx get --help # options and examples for get
opensdmx run --help # options and examples for run
# ... same for info, values, constraints, embed, blacklist, plotPhase 1 — Discovery: find candidate dataflows
Step 1a — Extract keywords AND expected dimensions
-
Topic keywords (2–4 terms) for thecall. Example: "unemployment", "labour force"
opensdmx search -
Expected dimensions — the analytical angles the user wants to slice by. These are often NOT in the dataset title or description, but must appear as dimensions in the dataflow structure. Extract them explicitly:
User says Expected dimension "by age group" / "per fascia di età" age"by sex" / "per sesso" sex"by country" / "per paese" geo"by region" (NUTS level)geo"by education level" or similarisced11"quarterly" / "monthly" freqExample: "unemployment for EU countries, by age group and sex" → topic keywords:; expected dimensions:unemployment,age,sex.geo
Step 1b — Search and pre-filter candidates
- Eurostat (default provider — no flag needed):
--provideropensdmx search "<keyword>" - ISTAT:
opensdmx search "<keyword>" --provider istat - Other providers: (available:
opensdmx search "<keyword>" --provider <name>,oecd,ecb,worldbank,insee,bundesbank)abs
opensdmx providersopensdmx info <id>opensdmx search "unemployment" --page 2 # results 21-40
opensdmx search "unemployment" --page 3 # results 41-6021-40 of 114--all"I didn't find much with a keyword search. I can try a semantic search instead — it matches by meaning, not exact words, so it can find datasets even when the terminology differs. It requires Ollama to be running and is slower (10–30 s). Want me to try?"
opensdmx search --semantic "<query>"# Example: verify age and sex are present
opensdmx info UNE_RT_A # ✓ has age, sex, geo → keep
opensdmx info TIPSUN20 # ✗ no age, no sex → discardStep 1c — Present verified candidates
I found these datasets that could answer your question:
1. **UNE_RT_A** — Unemployment by sex and age – annual data (Eurostat) ⭐ recommended
Has all three dimensions you need: age (7 ranges), sex (F/M/total), geo (38 countries).
Annual data from 2003 to 2025. Clean structure, no extra mandatory filters.
2. **LFSA_URGAED** — Unemployment rates by educational attainment level (Eurostat)
Also has age (29 ranges!) and sex, but adds a mandatory education-level dimension
(ISCED11). More granular age breakdown, but requires choosing an education filter.
Best if you also want to break down by education.
3. **MET_LFU3RT** — Unemployment rates by sex, age and metropolitan region (Eurostat)
Has age and sex, but geo is at metropolitan region level — not country level.
Not suitable for country comparisons.
Which one would you like to explore? You can also say "the first one" or
describe more precisely what you need.Phase 2 — Schema: explore the chosen dataflow
Default flow (Eurostat, OECD, ECB, etc.)
opensdmx constraints PRC_HICP_MANR
# shows all dimensions with count and sample of codes
opensdmx constraints PRC_HICP_MANR coicop
# shows full list of codes present in that dimension, with labelsopensdmx constraintsavailableconstraintopensdmx info PRC_HICP_MANR
# (no --provider needed for Eurostat, it's the default)opensdmx valuesopensdmx constraintsopensdmx valuesISTAT fast flow (recommended)
availableconstraintopensdmx info <dataflow_id> --provider istatopensdmx values <dataflow_id> REF_AREA --provider istat
opensdmx values <dataflow_id> DATA_TYPE --provider istatvaluesgrep -igetopensdmx get <dataflow_id> --provider istat --REF_AREA <code> --last-n 1opensdmx constraintsExtract from both flows
- Dimension list in order (position matters for URL construction later)
- Available codes for each dimension, with descriptions
- Time range (StartPeriod / EndPeriod)
- Dimensions with more than one available value (these are the meaningful filters)
Phase 3 — Presentation: explain the dataset to the user
What the dataset contains
Granularity
- Geographic: national only? Regions? EU countries? Global?
- Temporal: what years are available? Annual, monthly, quarterly?
Key dimensions to filter
- **Country** (GEO): IT (Italy), DE (Germany), FR (France)… 35 countries
- **Indicator** (INDIC_DE): live births (GBIRTHS), deaths (DEATH),
crude birth rate (CNBIRTHS), total fertility rate (TOTFERRT)…
- **Period**: 1960 to 2024, annual frequencyEstimated size
Invitation to choose
How would you like to proceed?
- Do you want data for a specific country or a European comparison?
- Which time period are you interested in?
- Are there any dimensions you want to filter?
Tell me what you want and I'll build the query.Phase 4 — Data retrieval: after the user decides
Building the query — critical rules
- Dimension order must match the output exactly. Never guess the order.
opensdmx info - Use only codes confirmed by , never codes from
opensdmx constraintsor other sources. Providers often return 404 or empty results for invalid codes.opensdmx values - For dimensions with a single available value, include that value — don't skip them.
- For unfiltered dimensions (user wants all values), use as wildcard.
.
--geo--coicop--freq--REF_AREA--DATA_TYPE--FREQStep 1 — Verify with a preview (last observation)
--last-n 1opensdmx get PRC_HICP_MANR --coicop CP00 --geo IT+DE+FR --start-period 2020 --end-period 2023 --last-n 1--last-n 1--last-n 1--first-n NStep 2 — Provide the download URL
https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/{dataflow_id}/{dim1.dim2...}/ALL/?startPeriod={start}&endPeriod={end}&format=SDMX-CSVopensdmx info.+curl "https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/APRO_CPNH1/A.I2200.AR./ALL/?startPeriod=2014&endPeriod=2023&format=SDMX-CSV"https://esploradati.istat.it/SDMXWS/rest/data/{dataflow_id}/{dim1.dim2...}?startPeriod={start}&endPeriod={end}Step 3 — Ask the user what to do next
Would you like me to download and save the full dataset locally for analysis?
If yes, tell me where to save it (e.g. /tmp/data.csv) or I'll use a default path.
I can then run a quick analysis: row count, top values, time range, flagged records.--out <path>Step 4 — Offer to save the query as a reusable template
This query worked well. Would you like to save it as a YAML template so you can
re-run it later without remembering all the parameters?
opensdmx get <id> [filters] --out data.csv --query-file my_query.yaml
The YAML captures provider, dataset, filters with human-readable descriptions,
and time range. To re-run it later:
opensdmx run my_query.yaml
opensdmx run my_query.yaml --out fresh_data.csv
The file is also useful for version control and sharing with colleagues.unemployment_eu_2020_2024.yamlgdp_annual_eurostat.yamlStep 5 — Offer metadata and README
Two optional extras:
1. **Full metadata**: do you want the complete list of codes and labels for each
dimension (e.g. all country names, all flag meanings)? I can extract them from
the opensdmx cache and save them as a companion file (e.g. `metadata.csv`).
2. **README**: do you want a `README.md` that documents the dataset schema —
columns, dimension codes with labels, flag meanings, units, and source URL?
Useful if you plan to share the data or revisit it later.- Run for each dimension with more than one value to get the full code → label mapping.
opensdmx constraints <dataflow_id> <dim> - Combine all dimensions into a single metadata file with columns:
,
dimension,code.label - Save it alongside the data file (e.g. ).
tomato_production_metadata.csv
- Generate a in the same folder as the data file.
README.md - The goal is to make the output verifiable (check values against the source), evaluable (judge quality and scope), and repeatable (reproduce from scratch).
- Follow the full template in references/readme-template.md.
In summary, include:
- Files produced (table)
- One section per source dataflow: ID, provider, filters with labels, unit, last update date, and the exact download URL
- Derivations: join keys, filters applied after download, computed columns with explicit formulas (not prose)
- Column schema: name, type, description, unit for every column in the output
- Flag legend: only flags actually present in the data, with row counts
- Coverage table when geographic or categorical gaps exist
- Caveats: scope limitations, reporting lags, known biases
Step 6 — Visualization
opensdmx plot--geom- (default): line chart with points — best for time series
line - : vertical bar chart — best for comparing values across categories over time; with
barproduces stacked bars--color - : horizontal bar chart — best for rankings; bars are automatically sorted by value (lowest at bottom, highest at top)
barh - : scatter plot — best for correlations between two numeric variables
point
- Always prepare data with DuckDB before plotting (separate units, limit series, remove aggregates, use year strings for annual data)
- After generating a chart, read the image and evaluate it — if it's not good, fix it yourself before showing the user
- Multiple focused charts are better than one overloaded chart
Key principles
opensdmx constraintsopensdmx values- → "mother's citizenship"
CITIZENSHIP_MOTHER - → "live births (absolute count)"
DATA_TYPE: LBIRTH - → "Italy"
GEO: IT - → "live births"
INDIC_DE: GBIRTHS - → "crude birth rate (per 1,000 inhabitants)" Never show raw codes without an explanation.
INDIC_DE: CNBIRTHS
- or
OBS_FLAG: quality/availability flags — look up what each code means in the context of that provider (OBS_STATUS= break in series,b= estimated,e= not significant,n= unreliable,u= provisional, etc.)p - : multiplier applied to the value (e.g.
UNIT_MULTmeans values are in thousands)3 - : confidentiality status
CONF_STATUS - : free-text annotations attached to specific dimensions
NOTE_*
| Provider | Notes |
|---|---|
| ISTAT | Use |
| Eurostat | Default provider (no |
| OECD | Use |
| ECB | Use |
| World Bank | Use |
WDI# Step 1 — find the indicator code (replaces opensdmx search)
opensdmx values WDI SERIES --provider worldbank 2>&1 | grep -i "gdp per capita"
# → NY_GDP_PCAP_KD (constant USD), NY_GDP_PCAP_PP_KD (PPP), etc.
# Step 2 — get structure
opensdmx info WDI --provider worldbank
# → 3 dimensions: FREQ · SERIES · REF_AREA
# Step 3 — find country codes (alpha-3, not alpha-2)
opensdmx values WDI REF_AREA --provider worldbank 2>&1 | grep -i "italy\|germany"
# → ITA, DEU (not IT, DE)
# Step 4 — build query (skip constraints — endpoint returns 400)
opensdmx get WDI --provider worldbank --SERIES NY_GDP_PCAP_KD \
--REF_AREA ITA+DEU+FRA --start-period 2000ITDEFRESEU27_2020EA20opensdmx constraintsITG12SLL_*opensdmx values <dataflow_id> REF_AREA --provider istatgrep -iopensdmx values <dataflow_id> REF_AREA --provider istat 2>&1 | grep -i "palermo\|matera"opensdmx constraints <dataflow_id> REF_AREA --provider istatavailableconstraintvaluesgrepget