chdb-sql

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

chdb SQL — ClickHouse in Your Python Process

chdb SQL — 进程内的Python版ClickHouse

Run ClickHouse SQL directly in Python — no server needed. Query local files, remote databases, and cloud storage with full ClickHouse SQL power.
bash
pip install chdb
无需服务器,直接在Python中运行ClickHouse SQL。借助完整的ClickHouse SQL能力查询本地文件、远程数据库和云存储。
bash
pip install chdb

Decision Tree: Pick the Right API

决策树:选择合适的API

1. One-off query on files or databases → chdb.query()
2. Multi-step analysis with tables      → Session
3. DB-API 2.0 connection                → chdb.connect()
4. Pandas-style DataFrame operations    → Use chdb-datastore skill instead
1. 一次性文件或数据库查询 → chdb.query()
2. 多步骤表分析 → Session
3. DB-API 2.0连接 → chdb.connect()
4. Pandas风格的DataFrame操作 → 请改用chdb-datastore技能

chdb.query() — One Line, Any Data

chdb.query() — 一行代码,处理任意数据

python
import chdb

chdb.query("SELECT * FROM file('data.parquet', Parquet) WHERE price > 100 LIMIT 10")       # local files
chdb.query("SELECT * FROM mysql('db:3306', 'shop', 'orders', 'root', 'pass')")              # databases
chdb.query("SELECT * FROM s3('s3://bucket/data.parquet', NOSIGN) LIMIT 10")                 # cloud storage
chdb.query("SELECT * FROM deltaLake('s3://bucket/delta/table', NOSIGN) LIMIT 10")           # data lakes
python
import chdb

chdb.query("SELECT * FROM file('data.parquet', Parquet) WHERE price > 100 LIMIT 10")       # local files
chdb.query("SELECT * FROM mysql('db:3306', 'shop', 'orders', 'root', 'pass')")              # databases
chdb.query("SELECT * FROM s3('s3://bucket/data.parquet', NOSIGN) LIMIT 10")                 # cloud storage
chdb.query("SELECT * FROM deltaLake('s3://bucket/delta/table', NOSIGN) LIMIT 10")           # data lakes

Cross-source join

Cross-source join

chdb.query(""" SELECT u.name, o.amount FROM mysql('db:3306', 'crm', 'users', 'root', 'pass') AS u JOIN file('orders.parquet', Parquet) AS o ON u.id = o.user_id ORDER BY o.amount DESC """)
data = {"name": ["Alice", "Bob"], "score": [95, 87]} chdb.query("SELECT * FROM Python(data) ORDER BY score DESC") # Python data df = chdb.query("SELECT * FROM numbers(10)", "DataFrame") # output formats chdb.query("SELECT toDate({d:String}) + number FROM numbers({n:UInt64})", "DataFrame", params={"d": "2025-01-01", "n": 30}) # parametrized

Table functions → [table-functions.md](references/table-functions.md) | SQL functions → [sql-functions.md](references/sql-functions.md) | Full API → [api-reference.md](references/api-reference.md)
chdb.query(""" SELECT u.name, o.amount FROM mysql('db:3306', 'crm', 'users', 'root', 'pass') AS u JOIN file('orders.parquet', Parquet) AS o ON u.id = o.user_id ORDER BY o.amount DESC """)
data = {"name": ["Alice", "Bob"], "score": [95, 87]} chdb.query("SELECT * FROM Python(data) ORDER BY score DESC") # Python data df = chdb.query("SELECT * FROM numbers(10)", "DataFrame") # output formats chdb.query("SELECT toDate({d:String}) + number FROM numbers({n:UInt64})", "DataFrame", params={"d": "2025-01-01", "n": 30}) # parametrized

表函数 → [table-functions.md](references/table-functions.md) | SQL函数 → [sql-functions.md](references/sql-functions.md) | 完整API → [api-reference.md](references/api-reference.md)

Session — Stateful Analysis Pipelines

Session — 有状态分析管道

python
from chdb import session as chs
sess = chs.Session("./analytics_db")   # persistent; Session() for in-memory

sess.query("CREATE TABLE users ENGINE=MergeTree() ORDER BY id AS SELECT * FROM mysql('db:3306','crm','users','root','pass')")
sess.query("CREATE TABLE events ENGINE=MergeTree() ORDER BY (ts,user_id) AS SELECT * FROM s3('s3://logs/events/*.parquet',NOSIGN)")
sess.query("""
    SELECT u.country, count() AS cnt, uniqExact(e.user_id) AS users
    FROM events e JOIN users u ON e.user_id = u.id
    WHERE e.ts >= today() - 7 GROUP BY u.country ORDER BY cnt DESC
""", "Pretty").show()
sess.close()
python
from chdb import session as chs
sess = chs.Session("./analytics_db")   # persistent; Session() for in-memory

sess.query("CREATE TABLE users ENGINE=MergeTree() ORDER BY id AS SELECT * FROM mysql('db:3306','crm','users','root','pass')")
sess.query("CREATE TABLE events ENGINE=MergeTree() ORDER BY (ts,user_id) AS SELECT * FROM s3('s3://logs/events/*.parquet',NOSIGN)")
sess.query("""
    SELECT u.country, count() AS cnt, uniqExact(e.user_id) AS users
    FROM events e JOIN users u ON e.user_id = u.id
    WHERE e.ts >= today() - 7 GROUP BY u.country ORDER BY cnt DESC
""", "Pretty").show()
sess.close()

Connection API (DB-API 2.0)

连接API(DB-API 2.0)

python
from chdb import dbapi
conn = dbapi.connect()
cur = conn.cursor()
cur.execute("SELECT * FROM file('data.parquet', Parquet) WHERE value > 100")
print(cur.fetchall())
cur.close()
conn.close()
python
from chdb import dbapi
conn = dbapi.connect()
cur = conn.cursor()
cur.execute("SELECT * FROM file('data.parquet', Parquet) WHERE value > 100")
print(cur.fetchall())
cur.close()
conn.close()

Troubleshooting

故障排查

ProblemFix
ImportError: No module named 'chdb'
pip install chdb
DB::Exception: FILE_NOT_FOUND
Check file path; use absolute path or verify cwd
DB::Exception: Unknown table function
Check function name spelling (e.g.,
deltaLake
not
deltalake
)
Connection refused to remote DBCheck host:port format; ensure remote DB allows connections
Environment checkRun
python scripts/verify_install.py
(from skill directory)
问题解决方法
ImportError: No module named 'chdb'
执行
pip install chdb
DB::Exception: FILE_NOT_FOUND
检查文件路径;使用绝对路径或验证当前工作目录(cwd)
DB::Exception: Unknown table function
检查函数名称拼写(例如
deltaLake
而非
deltalake
无法连接到远程数据库检查host:port格式;确保远程数据库允许连接
环境检查运行
python scripts/verify_install.py
(从技能目录执行)

References

参考资料

  • API Reference — query/Session/connect signatures
  • Table Functions — All ClickHouse table functions
  • SQL Functions — Commonly used SQL functions
  • Examples — 9 runnable examples with expected output
  • Official Docs
Note: This skill teaches how to use chdb SQL. For pandas-style operations, use the
chdb-datastore
skill. For contributing to chdb source code, see CLAUDE.md in the project root.
  • API参考文档 — query/Session/connect方法签名
  • 表函数 — 所有ClickHouse表函数
  • SQL函数 — 常用SQL函数
  • 示例 — 9个可运行示例及预期输出
  • 官方文档
注意:本技能讲解如何使用chdb SQL。 若需进行pandas风格操作,请使用
chdb-datastore
技能。 若要为chdb源代码做贡献,请查看项目根目录下的CLAUDE.md。