ibis-data
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseIbis Data Interface
Ibis 数据接口
Ibis provides a database-agnostic Python DataFrame API. Write queries once in Python; Ibis translates them to optimized SQL for the connected backend (DuckDB, PostgreSQL, SQLite, etc.).
Ibis 提供了一个与数据库无关的Python DataFrame API。只需用Python编写一次查询,Ibis就会将其转换为适配所连接后端(DuckDB、PostgreSQL、SQLite等)的优化SQL。
Why Ibis
为什么选择Ibis
- Portability: Develop with DuckDB, deploy against PostgreSQL -- change only the connection
- Lazy evaluation: Operations build an expression tree; nothing executes until
.execute() - Full SQL power: Window functions, CTEs, joins, aggregations -- all through Python
- No ORM: You get SQL performance without SQL strings
- 可移植性:使用DuckDB开发,部署到PostgreSQL环境——只需修改连接配置即可
- 惰性求值:操作会构建表达式树,直到调用才会执行
.execute() - 完整SQL能力:窗口函数、CTE、连接、聚合——全部通过Python实现
- 无ORM:无需编写SQL字符串即可获得SQL级别的性能
Connecting
连接数据库
python
import ibispython
import ibisDuckDB (default for local/parquet work)
DuckDB(本地/Parquet文件处理的默认选择)
con = ibis.duckdb.connect()
con = ibis.duckdb.connect("my.duckdb")
con = ibis.duckdb.connect()
con = ibis.duckdb.connect("my.duckdb")
PostgreSQL
PostgreSQL
con = ibis.postgres.connect(host="localhost", database="mydb", user="user", password="pass")
con = ibis.postgres.connect(host="localhost", database="mydb", user="user", password="pass")
SQLite
SQLite
con = ibis.sqlite.connect("my.sqlite")
con = ibis.sqlite.connect("my.sqlite")
Read files directly
直接读取文件
table = con.read_parquet("data.parquet")
table = con.read_csv("data.csv")
undefinedtable = con.read_parquet("data.parquet")
table = con.read_csv("data.csv")
undefinedCore Operations
核心操作
python
undefinedpython
undefinedExplore
探索数据
table.schema() # column names and types
table.head(10) # preview rows
table.describe() # summary statistics
table.count().execute() # row count
table.schema() # 列名和类型
table.head(10) # 预览行数据
table.describe() # 统计摘要
table.count().execute() # 行数统计
Select and filter
选择与过滤
selected = table.select("id", "amount", "date")
filtered = table.filter((table.amount > 100) & (table.date >= "2024-01-01"))
sorted_data = table.order_by(table.amount.desc())
selected = table.select("id", "amount", "date")
filtered = table.filter((table.amount > 100) & (table.date >= "2024-01-01"))
sorted_data = table.order_by(table.amount.desc())
Transform
数据转换
enriched = table.mutate(
revenue=table.quantity * table.unit_price,
year=table.date.year(),
size=ibis.case()
.when(table.amount < 100, "small")
.when(table.amount < 1000, "medium")
.else_("large")
.end()
)
enriched = table.mutate(
revenue=table.quantity * table.unit_price,
year=table.date.year(),
size=ibis.case()
.when(table.amount < 100, "small")
.when(table.amount < 1000, "medium")
.else_("large")
.end()
)
Aggregate
聚合操作
summary = (
table.group_by("category")
.aggregate(
total=table.amount.sum(),
avg=table.amount.mean(),
count=table.count()
)
)
summary = (
table.group_by("category")
.aggregate(
total=table.amount.sum(),
avg=table.amount.mean(),
count=table.count()
)
)
Join
表连接
joined = (
orders
.join(customers, orders.customer_id == customers.id, how="left")
.select(orders.order_id, orders.amount, customers.name)
)
joined = (
orders
.join(customers, orders.customer_id == customers.id, how="left")
.select(orders.order_id, orders.amount, customers.name)
)
Window functions
窗口函数
ranked = table.mutate(
rank=table.amount.rank().over(
ibis.window(group_by="category", order_by=table.amount.desc())
)
)
ranked = table.mutate(
rank=table.amount.rank().over(
ibis.window(group_by="category", order_by=table.amount.desc())
)
)
Execute and export
执行与导出
df = summary.execute() # -> pandas DataFrame
con.to_parquet(summary, "out.parquet")
df.to_csv("out.csv", index=False)
undefineddf = summary.execute() # -> pandas DataFrame
con.to_parquet(summary, "out.parquet")
df.to_csv("out.csv", index=False)
undefinedAPI Reference
API 参考
| API | What it covers |
|---|---|
| Table expressions | |
| Selectors | Choose columns by name, type, or regex |
| Generic expressions | |
| Numeric expressions | |
| String expressions | Slicing, regex, case conversion, stripping |
| Temporal expressions | |
| Collection expressions | Array/map operations, unnesting |
| JSON expressions | Path-based extraction from JSON columns |
| API | 涵盖内容 |
|---|---|
| Table expressions | |
| Selectors | 按名称、类型或正则表达式选择列 |
| Generic expressions | |
| Numeric expressions | |
| String expressions | 切片、正则、大小写转换、去除空白 |
| Temporal expressions | |
| Collection expressions | 数组/映射操作、展开嵌套 |
| JSON expressions | 从JSON列中基于路径提取数据 |
Best Practices
最佳实践
- Filter early: Reduce data volume before aggregations
- Stay lazy: Chain operations before calling
.execute() - Use selectors: Apply operations to multiple columns programmatically
- Handle nulls: Check with and handle with
.isnull()explicitly.fillna() - Check SQL: Use to inspect generated queries
ibis.to_sql(expr)
- 尽早过滤:在聚合前减少数据量
- 保持惰性:在调用前链式调用操作
.execute() - 使用选择器:以编程方式对多列应用操作
- 处理空值:使用检查并显式用
.isnull()处理.fillna() - 检查SQL:使用查看生成的查询语句
ibis.to_sql(expr)
Installation
安装
bash
uv add "ibis-framework[duckdb]" # DuckDB backend
uv add "ibis-framework[postgres]" # PostgreSQL backendbash
uv add "ibis-framework[duckdb]" # DuckDB 后端
uv add "ibis-framework[postgres]" # PostgreSQL 后端