datapackage
Frictionless Data Package Guide
This skill covers any dataset described by a Frictionless Data Package descriptor file (`datapackage.json`). It is intentionally generic: it works for any conforming datapackage, regardless of who published it or what the data contains.

For PUDL-specific knowledge (S3 bucket paths, table tier conventions, data source context, usage warnings), also use the `pudl` skill on top of this one.
What is a datapackage.json?
A `datapackage.json` is a JSON file that describes a collection of tabular data resources. Each resource represents one table (or file) and includes:

- `name`: machine-readable identifier
- `description`: human-readable description, often including processing notes, primary keys, and usage warnings
- `path`: filename or URL of the actual data file
- `schema.fields`: list of columns, each with a `name` and `description`

The file can be large (hundreds of resources, megabytes of JSON). Always query it selectively; never load it whole into context.
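
As a minimal sketch of selective querying, the snippet below runs jq against a tiny hypothetical descriptor written to a temporary file. The package, resource, and column names (`example-package`, `plants`, `plant_id`, and so on) and the helper functions `list_resources` and `describe_resource` are invented for illustration; only the `resources`, `name`, `description`, `path`, and `schema.fields` keys come from the structure described above.

```bash
# Write a tiny hypothetical descriptor to a temp file (illustrative data only).
DESCRIPTOR="$(mktemp)"
cat > "$DESCRIPTOR" <<'EOF'
{
  "name": "example-package",
  "resources": [
    {
      "name": "plants",
      "description": "One row per plant. Primary key: plant_id.",
      "path": "plants.parquet",
      "schema": {
        "fields": [
          {"name": "plant_id", "description": "Unique plant identifier."},
          {"name": "capacity_mw", "description": "Nameplate capacity in MW."}
        ]
      }
    },
    {
      "name": "generators",
      "description": "One row per generator.",
      "path": "generators.parquet",
      "schema": {"fields": [{"name": "generator_id", "description": "Generator identifier."}]}
    }
  ]
}
EOF

# List resource names only; nothing else from the file is printed.
list_resources() {
  jq -r '.resources[].name' "$1"
}

# Show the path and the columns of a single named resource.
describe_resource() {
  jq -r --arg name "$2" '
    .resources[]
    | select(.name == $name)
    | "path: \(.path)", (.schema.fields[] | "  \(.name): \(.description)")
  ' "$1"
}

list_resources "$DESCRIPTOR"
describe_resource "$DESCRIPTOR" plants
```

Filtering by resource name first keeps the output proportional to what was asked for, even when the descriptor holds hundreds of resources.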
Dependency check
Before querying metadata, verify `jq` is available:

```bash
command -v jq
```

If not found, tell the user how to install it:

- macOS: `brew install jq`
- Linux (apt): `sudo apt install jq`
- Linux (conda): `conda install jq`
- Windows: `winget install jqlang.jq`

For data loading and SQL queries, the `attach-db` and `query` skills from `duckdb-skills` must be installed. Install them from `duckdb/duckdb-skills`.
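
The check and the install hints can be combined into one helper. This is a rough sketch: the function names `jq_install_hint` and `check_jq` are hypothetical, and the platform detection via `uname -s` and `command -v` is an assumed heuristic rather than part of the skill itself.

```bash
# Print an install hint for jq based on a best-effort platform guess.
jq_install_hint() {
  case "$(uname -s)" in
    Darwin) echo "brew install jq" ;;
    Linux)
      if command -v apt >/dev/null 2>&1; then
        echo "sudo apt install jq"
      elif command -v conda >/dev/null 2>&1; then
        echo "conda install jq"
      else
        echo "install jq with your distribution's package manager"
      fi
      ;;
    MINGW*|MSYS*|CYGWIN*) echo "winget install jqlang.jq" ;;
    *) echo "see the jq project page for install options" ;;
  esac
}

# Report whether jq is on PATH; return non-zero and print a hint if not.
check_jq() {
  if command -v jq >/dev/null 2>&1; then
    echo "jq found: $(command -v jq)"
  else
    echo "jq not found; try: $(jq_install_hint)"
    return 1
  fi
}

check_jq || true
```

The helper only prints a suggestion; it leaves the decision to install with the user.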
Workflow overview
- Locate the descriptor: find or download `datapackage.json` (see below).
- Query metadata selectively: use jq or DuckDB to extract only what you need. See Metadata Querying.
- Surface warnings: always check for usage warnings before presenting a resource.
- Validate (optional): if the user wants to know whether the data actually matches the descriptor, or if you're diagnosing a suspicious package, use `frictionless validate`. See Frictionless Validate.
- Load the data (optional): only if the user explicitly wants to query or explore the actual data. Data files can be large and remote access can be slow or costly. Don't initiate data loading as a follow-on to a metadata lookup without confirming the user wants it. See Storage Backends.
Reference index
- Metadata Querying: locate the descriptor, query it selectively with jq or DuckDB, surface usage warnings
- Storage Backends: load data from Parquet, DuckDB, SQLite, or CSV files referenced by the descriptor
- Frictionless Validate: use the `frictionless` CLI to validate packages, check data quality, infer schemas, and diagnose unfamiliar descriptors; read when the user wants to validate a descriptor, check if data matches its schema, or understand what the `frictionless` tool can tell them about a package
Community patterns and recipes
The datapackage standard is permissive: publishers frequently add non-standard fields. Two conventions are worth knowing immediately:

- Custom fields: non-standard keys added by publishers are common and valid. The `_` prefix convention marks system-generated or platform-specific keys (e.g. `_cache`, `_platformVersion`). Some publishers add custom keys without the prefix (e.g. PUDL adds `duckdb_table`, `sqlite_table` on database-backed resources). Treat unknown fields as informational metadata, not errors.
- Compressed resources: a resource with a `.gz` or `.zip` path may have an explicit `"compression": "gz"` field. The `bytes` and `hash` fields apply to the compressed file, not the uncompressed original.

For other patterns (catalogs, versioning, external foreign keys, translation support, field relationships, etc.), fetch the relevant page on demand:

- v1 patterns: https://specs.frictionlessdata.io/patterns/
- v2 recipes: https://datapackage.org/recipes/caching-of-resources/ (navigate via sidebar or next/previous links; no index page exists)

Both pages cover largely the same set of community conventions; consult whichever matches the descriptor version you're working with.
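
One way to see which custom keys a publisher has added is to subtract a list of expected keys from each resource's keys. The sketch below assumes a small, deliberately incomplete allowlist (`KNOWN`) and a made-up sample descriptor; the helper names `custom_keys` and `compressed_resources` are hypothetical.

```bash
# Hypothetical sample descriptor with a compressed resource and custom keys.
DESCRIPTOR="$(mktemp)"
cat > "$DESCRIPTOR" <<'EOF'
{
  "name": "example-package",
  "resources": [
    {"name": "plants", "path": "plants.csv.gz", "compression": "gz",
     "bytes": 1024, "hash": "md5:0123", "_cache": "plants.cache"},
    {"name": "generators", "path": "generators.parquet",
     "duckdb_table": "generators"}
  ]
}
EOF

# Keys treated as expected for this sketch (an assumed, incomplete allowlist).
KNOWN='["name","title","description","path","format","mediatype","encoding","bytes","hash","schema","sources","licenses"]'

# For each resource, print the keys that are not in the allowlist.
custom_keys() {
  jq -r --argjson known "$KNOWN" '
    .resources[]
    | (keys - $known | join(", ")) as $extra
    | "\(.name): \($extra)"
  ' "$1"
}

# List resources that declare a compression field, with the compressed size.
compressed_resources() {
  jq -r '
    .resources[]
    | select(.compression != null)
    | "\(.name)\t\(.compression)\t\(.bytes)"
  ' "$1"
}

custom_keys "$DESCRIPTOR"
compressed_resources "$DESCRIPTOR"
```

The output is informational only: an unexpected key is reported, not rejected, in line with treating unknown fields as metadata rather than errors.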
Companion skills
This skill delegates actual data querying to:

- `/duckdb-skills:attach-db`: attach a `.duckdb` or `.sqlite` database file and set up a persistent session for querying.
- `/duckdb-skills:query`: run SQL or natural language queries against attached databases, ad-hoc files (Parquet, CSV, remote HTTPS/S3), and JSON files including `datapackage.json` itself (via DuckDB's `read_json`).

These skills must be installed. See `skills-lock.json` in the project root.
Key constraints
- **Golden rule: never load the full datapackage.json into context.** It may be megabytes with hundreds of resources. Always query selectively.
- **Read the full description before presenting a resource.** Descriptions often contain important context: processing notes, primary key conventions, data provenance, or caveats about known limitations. Don't skip them.
- **Use `uv` to install Python packages.** Prefer `uv add <package>` over `pip install <package>`. `uv` is faster and installs into a virtual environment rather than globally. Fall back to `pip` only if `uv` is not available (`command -v uv` returns nothing).
- **Do not use Python to query descriptor metadata.** Python is not the right tool here: it loads the full JSON into memory (violating the golden rule above), adds unnecessary dependencies, and can't easily handle remote descriptors. Use jq for metadata-only tasks; use DuckDB when you need to combine metadata queries with data queries. Python is only appropriate for loading data (via pandas or polars) after you already know which table and columns you need.
Schema reference and version detection
Two versions of the Frictionless Data Package standard are in common use. Identify the version from the top-level descriptor before parsing:

| Field present | Version | Example value |
|---|---|---|
| `$schema` | v2.0 | `"https://datapackage.org/profiles/2.0/datapackage.json"` |
| `profile` | v1.0 | `"tabular-data-package"` |
| neither | ambiguous (treat as v1 baseline) | n/a |
Key differences between versions that affect parsing:

- Contributors: v1 has `"role": "author"` (singular string); v2 has `"roles": ["author"]` (array). Both may appear in the wild.
- Name pattern: v1 enforces strictly lowercase `[-a-z0-9._/]`; v2 is unrestricted.
- `version` field: present in v2, absent in v1.

Bundled schemas:

- `assets/datapackage-v1.schema.json`: v1.0 (JSON Schema draft-04). Used by FERC XBRL packages and many older datasets.
- `assets/datapackage-v2.schema.json`: v2.0 (JSON Schema draft-07). The current standard. Canonical version always at: https://datapackage.org/profiles/2.0/datapackage.json

Read the appropriate schema when you need to understand which fields are valid in a descriptor or validate one programmatically.
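
A rough sketch of version detection with jq follows. It assumes, based on the published Data Package specifications rather than anything specific to this skill, that v2 descriptors carry a top-level `$schema` key and v1 descriptors a top-level `profile` key. The sample descriptors, the contributor name, and the helper names `detect_version` and `contributor_roles` are invented for illustration.

```bash
# Three hypothetical descriptors: v2-style, v1-style, and one with neither key.
V2="$(mktemp)"; V1="$(mktemp)"; NONE="$(mktemp)"
cat > "$V2" <<'EOF'
{"$schema": "https://datapackage.org/profiles/2.0/datapackage.json",
 "name": "new-package",
 "contributors": [{"title": "Jane Doe", "roles": ["author"]}],
 "resources": []}
EOF
cat > "$V1" <<'EOF'
{"profile": "tabular-data-package",
 "name": "old-package",
 "contributors": [{"title": "Jane Doe", "role": "author"}],
 "resources": []}
EOF
cat > "$NONE" <<'EOF'
{"name": "bare-package", "resources": []}
EOF

# Classify a descriptor by which top-level key is present (assumed heuristic).
detect_version() {
  jq -r '
    if has("$schema") then "v2.0"
    elif has("profile") then "v1.0"
    else "ambiguous (treat as v1 baseline)"
    end
  ' "$1"
}

# Normalize contributors so both the v1 "role" string and the v2 "roles"
# array come out as a "roles" array.
contributor_roles() {
  jq -c '
    [.contributors[]?
     | {title, roles: (.roles // (if .role then [.role] else [] end))}]
  ' "$1"
}

detect_version "$V2"
detect_version "$V1"
detect_version "$NONE"
contributor_roles "$V1"
```

Only top-level keys and the `contributors` array are read, so the check stays cheap even on a large descriptor.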