ktx-ai-data-agents


Skill by ara.so — MCP Skills collection.
ktx is an executable context layer for data and analytics agents. It allows Claude Code, Codex, Cursor, and other AI agents to query data warehouses accurately through MCP by providing skills, memory, and a semantic layer. ktx automatically learns from your company knowledge (wikis, dbt, Looker, Metabase), maps your data stack, detects joinable columns, resolves fan/chasm traps, and serves approved metric definitions to agents.

Installation


Install ktx globally via npm:
bash
npm install -g @kaelio/ktx
Or use npx for one-off commands:
bash
npx @kaelio/ktx setup

Quick Start


Initialize a ktx project in your analytics directory:
bash
ktx setup
This interactive command:
  • Creates or resumes a local ktx project
  • Configures LLM and embedding providers
  • Connects to your data warehouse(s)
  • Configures context sources (dbt, Looker, Metabase, Notion)
  • Builds initial context
  • Installs agent integration
Check project status:
bash
ktx status
Expected output after successful setup:
text
ktx project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Databases configured: yes (warehouse)
Context sources configured: yes (dbt_main)
ktx context built: yes
Agent integration ready: yes (codex:project)

Project Structure


After running `ktx setup`, your project has the following structure:
text
my-project/
├── ktx.yaml                         # Project configuration
├── semantic-layer/<connection-id>/  # YAML semantic sources
├── wiki/global/                     # Shared business context
├── wiki/user/<user-id>/             # User-scoped notes
├── raw-sources/<connection-id>/     # Ingest artifacts and reports
└── .ktx/                            # Local state and secrets (git-ignored)
Important: Commit `ktx.yaml`, `semantic-layer/`, and `wiki/`. Never commit `.ktx/`.

Core Commands


Context Building


Build context for all configured connections:
bash
ktx ingest
Build context for a specific connection:
bash
ktx ingest --connection-id warehouse
Force rebuild all context:
bash
ktx ingest --force

Searching Context


Search semantic layer sources:
bash
ktx sl "revenue"
ktx sl "monthly active users"
ktx sl "customer churn rate"
Search wiki knowledge:
bash
ktx wiki "refund policy"
ktx wiki "customer segmentation"

MCP Server


Start the MCP server for agent clients:
bash
ktx mcp start
Start with a specific project directory:
bash
ktx mcp start --project-dir /path/to/project

Configuration


ktx.yaml Example


yaml
version: 1
project:
  name: analytics
  description: Company data warehouse and analytics

llm:
  provider: anthropic
  model: claude-sonnet-4-6
  # API key stored in .ktx/secrets.yaml

embeddings:
  provider: openai
  model: text-embedding-3-small
  # API key stored in .ktx/secrets.yaml

connections:
  warehouse:
    type: postgres
    description: Main data warehouse
    # Connection details in .ktx/secrets.yaml

context_sources:
  dbt_main:
    type: dbt
    path: ./dbt
    connection_id: warehouse
  
  looker_models:
    type: lookml
    path: ./looker
    connection_id: warehouse
  
  notion_docs:
    type: notion
    # Notion token in .ktx/secrets.yaml

Database Connection Types


ktx supports:
  • PostgreSQL
  • Snowflake
  • BigQuery
  • ClickHouse
  • MySQL
  • SQL Server
  • SQLite
Example PostgreSQL connection configuration:
yaml
connections:
  warehouse:
    type: postgres
    description: Production warehouse
    host: db.example.com
    port: 5432
    database: analytics
    schema: public
    # username and password in .ktx/secrets.yaml

LLM Provider Configuration


Configure Anthropic API:
yaml
llm:
  provider: anthropic
  model: claude-sonnet-4-6
Configure Google Vertex AI:
yaml
llm:
  provider: vertex
  model: claude-sonnet-4-6
  project: my-gcp-project
  location: us-central1
Configure AI Gateway:
yaml
llm:
  provider: ai-gateway
  endpoint: https://gateway.example.com/v1
  model: claude-sonnet-4-6
Use Claude Code session (no API key needed):
yaml
llm:
  provider: claude-agent-sdk
API keys are stored in `.ktx/secrets.yaml` (never committed):
yaml
llm:
  api_key: ${ANTHROPIC_API_KEY}

embeddings:
  api_key: ${OPENAI_API_KEY}

connections:
  warehouse:
    username: ${DB_USERNAME}
    password: ${DB_PASSWORD}
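The `${VAR}` placeholders above suggest that values are filled in from environment variables. The document does not describe how ktx resolves them, so the following is only a sketch of one plausible expansion rule; `expandPlaceholders` and `PlaceholderEnv` are hypothetical names, not part of ktx.

```typescript
// Hypothetical helper: ktx's real placeholder handling is not documented here.
type PlaceholderEnv = Record<string, string | undefined>;

// Replace every ${NAME} in `value` with the matching entry from `env`.
// An unset variable raises an error rather than silently expanding to "".
function expandPlaceholders(value: string, env: PlaceholderEnv): string {
  return value.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g,
    (_match: string, varName: string): string => {
      const resolved = env[varName];
      if (resolved === undefined) {
        throw new Error("Environment variable " + varName + " is not set");
      }
      return resolved;
    },
  );
}

const exampleEnv: PlaceholderEnv = { DB_USERNAME: "myuser", DB_PASSWORD: "mypassword" };
console.log(expandPlaceholders("${DB_USERNAME}", exampleEnv)); // prints "myuser"
```

Failing loudly on an unset variable is a deliberate choice in this sketch: an empty password tends to surface later as a confusing authentication error.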

Semantic Layer


Creating Semantic Sources


Semantic sources are YAML files in `semantic-layer/<connection-id>/`. ktx auto-generates these during ingestion, but you can also create custom sources.
Example metric source (`semantic-layer/warehouse/revenue_metrics.yaml`):
yaml
version: 1
name: revenue_metrics
description: Core revenue and ARR metrics
connection_id: warehouse

metrics:
  - name: monthly_recurring_revenue
    description: Total MRR from active subscriptions
    type: sum
    sql: |
      SELECT SUM(amount)
      FROM subscriptions
      WHERE status = 'active'
        AND billing_period = 'monthly'
    
  - name: annual_recurring_revenue
    description: ARR (MRR * 12)
    type: derived
    sql: |
      SELECT monthly_recurring_revenue * 12
    dependencies:
      - monthly_recurring_revenue

dimensions:
  - name: subscription_plan
    description: Subscription tier (free, pro, enterprise)
    column: subscriptions.plan_name
    
  - name: customer_segment
    description: Customer business segment
    column: customers.segment
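The `derived` metric above depends on another metric through its `dependencies` list. As a rough illustration of what resolving such a definition involves, the sketch below evaluates metrics in dependency order and rejects cycles. The `MetricDef` shape, `resolveMetric`, and the MRR value of 10000 are assumptions made for this example; the document does not show how ktx evaluates derived metrics.

```typescript
// Hypothetical in-memory model of the metric fields shown in the YAML above.
interface MetricDef {
  dependencies: string[];
  // Computes the metric; `get` returns the value of a declared dependency.
  compute: (get: (dep: string) => number) => number;
}

function resolveMetric(
  metric: string,
  defs: Map<string, MetricDef>,
  cache: Map<string, number> = new Map(),
  visiting: Set<string> = new Set(),
): number {
  const cached = cache.get(metric);
  if (cached !== undefined) return cached;
  const def = defs.get(metric);
  if (def === undefined) throw new Error("Unknown metric: " + metric);
  if (visiting.has(metric)) throw new Error("Circular dependency at: " + metric);

  visiting.add(metric);
  const value = def.compute((dep: string): number => {
    if (!def.dependencies.includes(dep)) {
      throw new Error(metric + " did not declare dependency " + dep);
    }
    return resolveMetric(dep, defs, cache, visiting);
  });
  visiting.delete(metric);

  cache.set(metric, value);
  return value;
}

const metricDefs = new Map<string, MetricDef>([
  // Stand-in for the SUM(amount) query; 10000 is made-up sample data.
  ["monthly_recurring_revenue", { dependencies: [], compute: () => 10000 }],
  [
    "annual_recurring_revenue",
    {
      dependencies: ["monthly_recurring_revenue"],
      compute: (get) => get("monthly_recurring_revenue") * 12,
    },
  ],
]);

console.log(resolveMetric("annual_recurring_revenue", metricDefs)); // prints 120000
```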
Example entity source with join graph:
yaml
version: 1
name: customer_orders
description: Customer and order entities with relationship
connection_id: warehouse

entities:
  - name: customers
    description: Customer accounts
    table: public.customers
    primary_key: customer_id
    
  - name: orders
    description: Order transactions
    table: public.orders
    primary_key: order_id
    foreign_keys:
      - column: customer_id
        references: customers.customer_id

joins:
  - from: orders
    to: customers
    type: many_to_one
    on: orders.customer_id = customers.customer_id
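Declaring the join as `many_to_one` matters because of the fan trap mentioned in the introduction: once customers are joined to their orders, any customer-level value repeats once per order. The sketch below reproduces that effect on in-memory rows. The `seats` column and all sample values are invented for illustration and are not part of the schema above.

```typescript
interface CustomerRow { customer_id: number; segment: string; seats: number } // `seats` is hypothetical
interface OrderRow { order_id: number; customer_id: number; amount: number }

const customers: CustomerRow[] = [
  { customer_id: 1, segment: "enterprise", seats: 100 },
  { customer_id: 2, segment: "smb", seats: 5 },
];
const orders: OrderRow[] = [
  { order_id: 10, customer_id: 1, amount: 500 },
  { order_id: 11, customer_id: 1, amount: 300 },
  { order_id: 12, customer_id: 2, amount: 50 },
];

// Naive: join orders to customers, then sum a customer-level column.
// Customer 1 has two orders, so its seats are counted twice.
function naiveSeatTotal(cs: CustomerRow[], os: OrderRow[]): number {
  const byId = new Map<number, CustomerRow>();
  for (const c of cs) byId.set(c.customer_id, c);
  let total = 0;
  for (const o of os) {
    const c = byId.get(o.customer_id);
    if (c !== undefined) total += c.seats;
  }
  return total;
}

// Safe: aggregate at the customer grain, without going through orders.
function safeSeatTotal(cs: CustomerRow[]): number {
  return cs.reduce((sum, c) => sum + c.seats, 0);
}

console.log(naiveSeatTotal(customers, orders)); // prints 205 (inflated)
console.log(safeSeatTotal(customers)); // prints 105
```

Knowing the cardinality of each join is what lets a tool decide which side must be pre-aggregated before joining.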

Querying Semantic Layer


Agents can query the semantic layer via MCP tools or the CLI:
bash
# Search for revenue-related metrics
ktx sl "revenue"

# Get specific metric definition
ktx sl "monthly_recurring_revenue" --exact

# List all metrics in a source
ktx sl --source revenue_metrics

Wiki Management


Adding Wiki Content


Create global wiki pages in `wiki/global/`:
bash
mkdir -p wiki/global
cat > wiki/global/refund-policy.md <<EOF
# Refund Policy

Our refund policy allows:
- Full refund within 30 days
- Partial refund (50%) within 90 days
- No refunds after 90 days

Process: Customer contacts support → Support approves → Finance processes
EOF

Create user-scoped notes in `wiki/user/<user-id>/`:

```bash
mkdir -p wiki/user/alice
cat > wiki/user/alice/analysis-notes.md <<EOF
# Q1 2026 Revenue Analysis Notes

Key findings:
- Enterprise segment grew 40% QoQ
- Churn increased in SMB segment
- Marketing attribution needs review
EOF
```

Searching Wiki


bash
ktx wiki "refund policy"
ktx wiki "customer segmentation"
ktx wiki "analysis notes" --user alice

Agent Integration


Claude Code / Codex Setup


From your project directory in Claude Code or Codex:
text
Run npx skills add Kaelio/ktx --skill ktx and use the ktx skill to install
and configure ktx in this project.
Or manually add to MCP settings:
json
{
  "mcpServers": {
    "ktx": {
      "command": "ktx",
      "args": ["mcp", "start", "--project-dir", "/path/to/project"]
    }
  }
}
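If you manage several projects, the settings entry can be generated rather than hand-edited. The sketch below builds the same JSON object shown above for a given project directory. `buildKtxMcpSettings` is a hypothetical helper written for this example; only the `command` and `args` values come from the configuration above.

```typescript
interface McpServerEntry { command: string; args: string[] }
interface McpSettings { mcpServers: Record<string, McpServerEntry> }

// Build the MCP settings object for one ktx project directory.
function buildKtxMcpSettings(projectDir: string): McpSettings {
  if (projectDir.trim() === "") {
    throw new Error("projectDir must not be empty");
  }
  return {
    mcpServers: {
      ktx: {
        command: "ktx",
        args: ["mcp", "start", "--project-dir", projectDir],
      },
    },
  };
}

console.log(JSON.stringify(buildKtxMcpSettings("/path/to/project"), null, 2));
```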

Available MCP Tools


Once the MCP server is running, agents have access to:
  • `ktx_search_semantic_layer` - Search metrics, dimensions, entities
  • `ktx_search_wiki` - Search wiki knowledge
  • `ktx_get_metric_definition` - Get canonical metric SQL
  • `ktx_get_join_path` - Get join graph between entities
  • `ktx_validate_query` - Validate SQL against semantic layer

Real-World Usage Patterns


Pattern 1: Agent Queries Approved Metrics


Agent prompt:
text
What was our MRR last month?
Agent workflow (via MCP):
  1. ktx_search_semantic_layer("monthly recurring revenue")
  2. ktx_get_metric_definition("monthly_recurring_revenue")
  3. Execute canonical SQL against warehouse
  4. Return result
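The four steps above can be sketched as a small function. The tool names come from this document, but their argument and result shapes are not specified here, so the `KtxTools` interface, `MetricHit`, `MetricDefinition`, and `RunSql` are assumptions. The stubs stand in for a real MCP client and warehouse, and 42000 is made-up sample data.

```typescript
// Assumed shapes: the document names the tools but not their arguments or results.
interface MetricHit { metricName: string }
interface MetricDefinition { metricName: string; sql: string }

interface KtxTools {
  ktx_search_semantic_layer(query: string): Promise<MetricHit[]>;
  ktx_get_metric_definition(metricName: string): Promise<MetricDefinition>;
}
type RunSql = (sql: string) => Promise<number>;

async function answerMetricQuestion(
  query: string,
  tools: KtxTools,
  runSql: RunSql,
): Promise<number> {
  // Step 1: find candidate metrics.
  const hits = await tools.ktx_search_semantic_layer(query);
  const first = hits[0];
  if (first === undefined) {
    throw new Error("No approved metric matches: " + query);
  }
  // Step 2: fetch the canonical definition instead of writing SQL from scratch.
  const definition = await tools.ktx_get_metric_definition(first.metricName);
  // Steps 3 and 4: execute the canonical SQL and return the result.
  return runSql(definition.sql);
}

// In-memory stand-ins so the sketch runs without a server or warehouse.
const stubTools: KtxTools = {
  async ktx_search_semantic_layer() {
    return [{ metricName: "monthly_recurring_revenue" }];
  },
  async ktx_get_metric_definition(metricName) {
    return { metricName, sql: "SELECT SUM(amount) FROM subscriptions WHERE status = 'active'" };
  },
};
const stubRunSql: RunSql = async () => 42000;

answerMetricQuestion("monthly recurring revenue", stubTools, stubRunSql)
  .then((value) => console.log(value)); // prints 42000
```

The point of the pattern is that the agent never composes the metric SQL itself; it only executes what the semantic layer returns.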

Pattern 2: Agent Joins Entities Correctly


Agent prompt:
text
Show me average order value by customer segment
Agent workflow:
  1. ktx_search_semantic_layer("order value")
  2. ktx_search_semantic_layer("customer segment")
  3. ktx_get_join_path("orders", "customers")
  4. Build query using approved join logic
  5. Execute and return results
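Step 3 of this workflow asks for a join path between two entities. One simple way to compute such a path is a breadth-first search over the declared joins, sketched below. This is an illustration and not necessarily how `ktx_get_join_path` is implemented; `JoinEdge`, `findJoinPath`, and the `order_items` entity are hypothetical.

```typescript
interface JoinEdge { from: string; to: string; on: string }

// Breadth-first search over declared joins; returns the shortest chain of
// joins linking `start` to `goal`, or null when no chain exists.
function findJoinPath(start: string, goal: string, joins: JoinEdge[]): JoinEdge[] | null {
  const queue: Array<{ entity: string; path: JoinEdge[] }> = [{ entity: start, path: [] }];
  const seen = new Set<string>([start]);
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === undefined) break;
    if (current.entity === goal) return current.path;
    for (const edge of joins) {
      // A declared join can be traversed from either side.
      let next: string | null = null;
      if (edge.from === current.entity) next = edge.to;
      else if (edge.to === current.entity) next = edge.from;
      if (next !== null && !seen.has(next)) {
        seen.add(next);
        queue.push({ entity: next, path: [...current.path, edge] });
      }
    }
  }
  return null;
}

const joinGraph: JoinEdge[] = [
  { from: "orders", to: "customers", on: "orders.customer_id = customers.customer_id" },
  // Hypothetical extra entity, added only to make the path two hops long.
  { from: "order_items", to: "orders", on: "order_items.order_id = orders.order_id" },
];

const joinPath = findJoinPath("order_items", "customers", joinGraph);
console.log(joinPath?.map((edge) => edge.on));
```

With the sample graph, the search returns the `order_items` to `orders` join followed by the `orders` to `customers` join.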

Pattern 3: Agent Consults Business Knowledge


Agent prompt:
text
How should I handle refunds in the revenue analysis?
Agent workflow:
  1. ktx_search_wiki("refund policy")
  2. Read company refund policy
  3. ktx_search_semantic_layer("revenue")
  4. Combine knowledge to provide accurate answer

Ingestion Sources


dbt Integration


Configure dbt source:
yaml
context_sources:
  dbt_main:
    type: dbt
    path: ./dbt
    connection_id: warehouse
    manifest_path: ./dbt/target/manifest.json  # Optional
ktx ingests:
  • Model definitions and descriptions
  • Metric definitions (dbt metrics)
  • Column-level documentation
  • Tests and constraints
  • Lineage information

LookML Integration


yaml
context_sources:
  looker_models:
    type: lookml
    path: ./looker
    connection_id: warehouse
ktx ingests:
  • Explores and views
  • Dimensions and measures
  • Join relationships
  • Derived tables

Metabase Integration


yaml
context_sources:
  metabase_instance:
    type: metabase
    url: https://metabase.example.com
    connection_id: warehouse
    # API key in .ktx/secrets.yaml
ktx ingests:
  • Question definitions
  • Metric definitions
  • Dashboard descriptions

Notion Integration


yaml
context_sources:
  notion_docs:
    type: notion
    database_id: abc123...
    # Token in .ktx/secrets.yaml
ktx ingests:
  • Page content and descriptions
  • Database properties
  • Relationships between pages

Troubleshooting


MCP Server Not Starting


Check whether the MCP server needs to be started manually:
bash
ktx status
If the output includes `ktx mcp start --project-dir ...`, run that command before opening your agent client.

Connection Issues


Test database connection:
bash
ktx ingest --connection-id warehouse --dry-run
Verify credentials in `.ktx/secrets.yaml`:
yaml
connections:
  warehouse:
    username: ${DB_USERNAME}
    password: ${DB_PASSWORD}
Ensure environment variables are set:
bash
export DB_USERNAME=myuser
export DB_PASSWORD=mypassword
ktx ingest --connection-id warehouse

LLM Provider Errors


Verify API key configuration:
bash
export ANTHROPIC_API_KEY=sk-ant-...
ktx setup
Check LLM configuration in `ktx.yaml`:
yaml
llm:
  provider: anthropic
  model: claude-sonnet-4-6
Test LLM connection:
bash
ktx ingest --force  # Will test LLM during ingestion

Context Not Building


Force rebuild all context:
bash
ktx ingest --force
Check ingestion logs in `raw-sources/<connection-id>/`:
bash
cat raw-sources/warehouse/ingestion.log
Validate source configuration:
bash
ktx status

Semantic Layer Search Returns Nothing


Ensure context is built:
bash
ktx ingest
Check if semantic sources exist:
bash
ls -la semantic-layer/warehouse/
Search with broader terms:
bash
ktx sl "revenue"  # Instead of "mrr_monthly_q1"

Advanced Usage


Custom Semantic Source Templates


Generate a template for a new source:
bash
ktx sl generate --name customer_health --connection-id warehouse
Then edit the generated file at `semantic-layer/warehouse/customer_health.yaml`.

Multi-Environment Setup


Create separate ktx projects per environment:
text
analytics/
├── production/
│   └── ktx.yaml
├── staging/
│   └── ktx.yaml
└── development/
    └── ktx.yaml
Switch between environments:
bash
ktx ingest --project-dir ./production
ktx ingest --project-dir ./staging

CI/CD Integration


Validate semantic layer in CI:
bash
#!/bin/bash
# Fail the CI job when the dry-run ingest exits with a non-zero status.
if ! ktx ingest --dry-run --connection-id warehouse; then
  echo "ktx validation failed" >&2
  exit 1
fi

Environment Variables


ktx respects:
  • `KTX_PROJECT_DIR` - Default project directory
  • `ANTHROPIC_API_KEY` - Anthropic API key
  • `OPENAI_API_KEY` - OpenAI API key
  • `GOOGLE_APPLICATION_CREDENTIALS` - GCP credentials for Vertex AI
  • Database-specific vars (`DB_USERNAME`, `DB_PASSWORD`, etc.)
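Before running ingestion in a script or CI job, it can help to confirm that the variables your configuration relies on are actually set. The sketch below is a generic preflight check and not a ktx feature. `missingVariables` and `EnvMap` are hypothetical names, and which variables count as required depends on your own `ktx.yaml` and secrets file.

```typescript
type EnvMap = Record<string, string | undefined>;

// Return the names of required variables that are unset or blank.
function missingVariables(required: string[], env: EnvMap): string[] {
  return required.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// Example requirement list for a setup using Anthropic, OpenAI embeddings,
// and a username/password database connection.
const requiredForExample = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY", "DB_USERNAME", "DB_PASSWORD"];
const sampleEnv: EnvMap = { ANTHROPIC_API_KEY: "sk-ant-...", DB_USERNAME: "myuser" };

// Reports OPENAI_API_KEY and DB_PASSWORD as missing.
console.log(missingVariables(requiredForExample, sampleEnv));
```

In a Node.js script you would typically pass `process.env` as the second argument.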

Development & Extension


Local Development Setup


Clone and build ktx:
bash
git clone https://github.com/kaelio/ktx.git
cd ktx
pnpm install
uv sync --all-groups
pnpm run build
Link development CLI:
bash
pnpm run setup:dev
pnpm run link:dev
ktx-dev --help
Run tests:
bash
pnpm run test
uv run pytest -q

Custom Connectors


ktx connector interface (TypeScript):
typescript
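import { Connector } from '@kaelio/ktx/connectors';
// Assumption: these helper types are exported from the same module as
// Connector; adjust the import to match your installed ktx version.
import type { SchemaMetadata, Row, JoinCandidate } from '@kaelio/ktx/connectors';

export class CustomConnector extends Connector {
  async connect(): Promise<void> {
    // Establish connection
  }

  async introspect(): Promise<SchemaMetadata> {
    // Return table/column metadata
    throw new Error('introspect() is not implemented yet');
  }

  async sample(table: string, limit: number): Promise<Row[]> {
    // Return up to `limit` sample rows from `table`
    throw new Error('sample() is not implemented yet');
  }

  async detectJoins(): Promise<JoinCandidate[]> {
    // Detect foreign key relationships
    throw new Error('detectJoins() is not implemented yet');
  }
}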
Register in `ktx.yaml`:
yaml
connectors:
  custom:
    module: ./connectors/custom.ts
    type: custom

Read-Only Guarantee


ktx never writes to your data warehouse. All connections are read-only:
  • PostgreSQL: Uses role without INSERT/UPDATE/DELETE grants
  • Snowflake: Requires SELECT-only role
  • BigQuery: Needs `roles/bigquery.dataViewer`
  • Other: Similar read-only permissions
Verify connection is read-only:
bash
ktx ingest --dry-run --connection-id warehouse

Resources
