ktx-data-agent-context-layer
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
Chinesektx Data Agent Context Layer
ktx Data Agent 上下文层
Skill by ara.so — MCP Skills collection.
ktx is an executable context layer that teaches AI agents how to query data warehouses accurately. It automatically builds and maintains approved metric definitions, discovers joinable columns, ingests business knowledge from wikis and semantic layers, and exposes everything through CLI and MCP tools for agent execution.
由ara.so开发的Skill — MCP Skills 合集。
ktx是一个可执行的上下文层,用于教导AI Agent准确查询数据仓库。它会自动构建并维护已审批的指标定义,可发现可关联的列,从维基和语义层摄取业务知识,并通过CLI和MCP工具将所有内容暴露给Agent执行。
What ktx Does
ktx 的功能
- Learns from company knowledge: Ingests dbt, Looker, Metabase, Notion content and organizes it
- Maps the data stack: Samples tables, detects joinable columns, annotates sources
- Builds semantic layer: Combines raw tables and metrics through a join graph that resolves chasm and fan traps
- Serves agents: Exposes CLI and MCP tools with full-text and semantic search
Works with PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, SQLite.
- 学习企业知识:摄取dbt、Looker、Metabase、Notion内容并进行整理
- 映射数据栈:对表进行采样,检测可关联的列,为数据源添加注释
- 构建语义层:通过可解决裂谷陷阱和扇形陷阱的关联图,结合原始表和指标
- 为Agent提供服务:通过CLI和MCP工具提供全文检索和语义搜索功能
支持PostgreSQL、Snowflake、BigQuery、ClickHouse、MySQL、SQL Server、SQLite。
Installation
安装
bash
npm install -g @kaelio/ktxOr use locally without global install:
bash
npx @kaelio/ktx setupbash
npm install -g @kaelio/ktx或无需全局安装,直接本地使用:
bash
npx @kaelio/ktx setupCore Commands
核心命令
Initial Setup
初始设置
bash
undefinedbash
undefinedInteractive setup - creates or resumes a ktx project
交互式设置 - 创建或恢复ktx项目
ktx setup
ktx setup
Check project readiness
检查项目就绪状态
ktx status
ktx status
Manual ingestion (usually automatic during setup)
手动摄取(通常在设置过程中自动完成)
ktx ingest
ktx ingest
Start MCP server for agent clients
启动供Agent客户端使用的MCP服务器
ktx mcp start
undefinedktx mcp start
undefinedSearch and Query
搜索与查询
bash
undefinedbash
undefinedSearch semantic layer sources
搜索语义层数据源
ktx sl "revenue"
ktx sl "customer churn rate"
ktx sl "revenue"
ktx sl "customer churn rate"
Search wiki pages
搜索维基页面
ktx wiki "refund policy"
ktx wiki "data quality standards"
ktx wiki "refund policy"
ktx wiki "data quality standards"
Explain a semantic source in detail
详细解释语义数据源
ktx sl explain revenue_daily
undefinedktx sl explain revenue_daily
undefinedAdvanced Commands
高级命令
bash
undefinedbash
undefinedValidate semantic layer YAML
验证语义层YAML文件
ktx sl validate
ktx sl validate
Refresh specific connection
刷新特定连接
ktx ingest --connection warehouse
ktx ingest --connection warehouse
Export context for debugging
导出上下文用于调试
ktx export --format json --output context.json
undefinedktx export --format json --output context.json
undefinedConfiguration
配置
Project Structure
项目结构
text
my-project/
├── ktx.yaml # Main configuration
├── semantic-layer/<connection-id>/ # YAML semantic sources
├── wiki/global/ # Shared business context
├── wiki/user/<user-id>/ # User-scoped notes
├── raw-sources/<connection-id>/ # Ingest artifacts
└── .ktx/ # Local state (git-ignored)text
my-project/
├── ktx.yaml # 主配置文件
├── semantic-layer/<connection-id>/ # YAML格式的语义数据源
├── wiki/global/ # 共享业务上下文
├── wiki/user/<user-id>/ # 用户范围的笔记
├── raw-sources/<connection-id>/ # 摄取的工件
└── .ktx/ # 本地状态(已加入git忽略)ktx.yaml Example
ktx.yaml 示例
yaml
version: 1
project:
name: analytics
description: Company analytics warehouse
llm:
provider: anthropic
model: claude-sonnet-4-6
api_key_env: ANTHROPIC_API_KEY
embeddings:
provider: openai
model: text-embedding-3-small
api_key_env: OPENAI_API_KEY
databases:
warehouse:
type: postgres
host: localhost
port: 5432
database: analytics
username: readonly_user
password_env: DB_PASSWORD
schema: public
snowflake_prod:
type: snowflake
account: xy12345.us-east-1
warehouse: COMPUTE_WH
database: PROD
schema: PUBLIC
username: service_account
password_env: SNOWFLAKE_PASSWORD
context_sources:
dbt_main:
type: dbt
profiles_dir: ~/.dbt
project_dir: ./dbt
target: prod
looker_metrics:
type: lookml
project_dir: ./looker
notion_wiki:
type: notion
token_env: NOTION_TOKEN
database_id: a1b2c3d4e5f6yaml
version: 1
project:
name: analytics
description: Company analytics warehouse
llm:
provider: anthropic
model: claude-sonnet-4-6
api_key_env: ANTHROPIC_API_KEY
embeddings:
provider: openai
model: text-embedding-3-small
api_key_env: OPENAI_API_KEY
databases:
warehouse:
type: postgres
host: localhost
port: 5432
database: analytics
username: readonly_user
password_env: DB_PASSWORD
schema: public
snowflake_prod:
type: snowflake
account: xy12345.us-east-1
warehouse: COMPUTE_WH
database: PROD
schema: PUBLIC
username: service_account
password_env: SNOWFLAKE_PASSWORD
context_sources:
dbt_main:
type: dbt
profiles_dir: ~/.dbt
project_dir: ./dbt
target: prod
looker_metrics:
type: lookml
project_dir: ./looker
notion_wiki:
type: notion
token_env: NOTION_TOKEN
database_id: a1b2c3d4e5f6Semantic Layer YAML
语义层YAML
Define a Metric Source
定义指标数据源
yaml
undefinedyaml
undefinedsemantic-layer/warehouse/revenue_daily.yaml
semantic-layer/warehouse/revenue_daily.yaml
version: 1
kind: metric_source
metadata:
name: revenue_daily
display_name: Daily Revenue
description: Daily gross revenue aggregated from transactions
owners: [data-team]
tags: [finance, revenue]
source:
connection: warehouse
type: sql
sql: |
SELECT
DATE(created_at) as date,
product_id,
region_id,
SUM(amount) as revenue,
COUNT(DISTINCT order_id) as order_count
FROM transactions
WHERE status = 'completed'
GROUP BY 1, 2, 3
dimensions:
- name: date type: date primary_key: true
- name: product_id type: string
- name: region_id type: string
metrics:
- name: revenue type: sum sql: revenue format: currency
- name: order_count type: sum sql: order_count format: number
joins:
- to: products type: left on: product_id = products.id
- to: regions type: left on: region_id = regions.id
undefinedversion: 1
kind: metric_source
metadata:
name: revenue_daily
display_name: Daily Revenue
description: Daily gross revenue aggregated from transactions
owners: [data-team]
tags: [finance, revenue]
source:
connection: warehouse
type: sql
sql: |
SELECT
DATE(created_at) as date,
product_id,
region_id,
SUM(amount) as revenue,
COUNT(DISTINCT order_id) as order_count
FROM transactions
WHERE status = 'completed'
GROUP BY 1, 2, 3
dimensions:
- name: date type: date primary_key: true
- name: product_id type: string
- name: region_id type: string
metrics:
- name: revenue type: sum sql: revenue format: currency
- name: order_count type: sum sql: order_count format: number
joins:
- to: products type: left on: product_id = products.id
- to: regions type: left on: region_id = regions.id
undefinedDefine a Dimension Table
定义维度表
yaml
undefinedyaml
undefinedsemantic-layer/warehouse/products.yaml
semantic-layer/warehouse/products.yaml
version: 1
kind: dimension_source
metadata:
name: products
display_name: Products
description: Product catalog with categories
source:
connection: warehouse
type: table
table: products
dimensions:
- name: id type: string primary_key: true
- name: name type: string
- name: category type: string
- name: price type: number
undefinedversion: 1
kind: dimension_source
metadata:
name: products
display_name: Products
description: Product catalog with categories
source:
connection: warehouse
type: table
table: products
dimensions:
- name: id type: string primary_key: true
- name: name type: string
- name: category type: string
- name: price type: number
undefinedTypeScript Integration
TypeScript 集成
Using ktx as a Library
将ktx作为库使用
typescript
import { KtxProject } from '@kaelio/ktx';
// Load existing project
const project = await KtxProject.load('/path/to/project');
// Search semantic layer
const slResults = await project.semanticLayer.search('revenue', {
limit: 5,
threshold: 0.7
});
for (const result of slResults) {
console.log(`${result.metadata.display_name}: ${result.metadata.description}`);
console.log(`Score: ${result.score}`);
}
// Search wiki
const wikiResults = await project.wiki.search('customer retention', {
scope: 'global',
limit: 3
});
// Get source details
const revenueSource = await project.semanticLayer.getSource('revenue_daily');
console.log(revenueSource.metrics);typescript
import { KtxProject } from '@kaelio/ktx';
// 加载现有项目
const project = await KtxProject.load('/path/to/project');
// 搜索语义层
const slResults = await project.semanticLayer.search('revenue', {
limit: 5,
threshold: 0.7
});
for (const result of slResults) {
console.log(`${result.metadata.display_name}: ${result.metadata.description}`);
console.log(`Score: ${result.score}`);
}
// 搜索维基
const wikiResults = await project.wiki.search('customer retention', {
scope: 'global',
limit: 3
});
// 获取数据源详情
const revenueSource = await project.semanticLayer.getSource('revenue_daily');
console.log(revenueSource.metrics);Custom Connector
自定义连接器
typescript
import { DatabaseConnector, ScanResult } from '@kaelio/ktx/connectors';
class CustomConnector implements DatabaseConnector {
async connect(): Promise<void> {
// Connection logic
}
async scan(): Promise<ScanResult> {
const tables = await this.getTables();
const samples = await this.sampleData(tables);
return {
tables,
samples,
relationships: this.detectJoins(tables, samples),
metadata: this.extractMetadata(tables)
};
}
private detectJoins(tables: Table[], samples: Sample[]): Relationship[] {
// Join detection logic
return [];
}
}typescript
import { DatabaseConnector, ScanResult } from '@kaelio/ktx/connectors';
class CustomConnector implements DatabaseConnector {
async connect(): Promise<void> {
// 连接逻辑
}
async scan(): Promise<ScanResult> {
const tables = await this.getTables();
const samples = await this.sampleData(tables);
return {
tables,
samples,
relationships: this.detectJoins(tables, samples),
metadata: this.extractMetadata(tables)
};
}
private detectJoins(tables: Table[], samples: Sample[]): Relationship[] {
// 关联检测逻辑
return [];
}
}Agent Integration
Agent 集成
MCP Server Setup
MCP 服务器设置
bash
undefinedbash
undefinedStart MCP server (usually automatic with agent clients)
启动MCP服务器(通常与Agent客户端自动配合)
ktx mcp start --project-dir /path/to/project
ktx mcp start --project-dir /path/to/project
Start with specific port
使用指定端口启动
ktx mcp start --port 3000
ktx mcp start --port 3000
Check MCP server status
检查MCP服务器状态
ktx mcp status
undefinedktx mcp status
undefinedAgent Configuration
Agent 配置
For Claude Code, Codex, Cursor, or OpenCode:
bash
undefined适用于Claude Code、Codex、Cursor或OpenCode:
bash
undefinedFrom your project directory, ask your agent:
在你的项目目录中,向Agent发出指令:
"Run npx skills add Kaelio/ktx --skill ktx and use the ktx skill to install and configure ktx in this project."
"Run npx skills add Kaelio/ktx --skill ktx and use the ktx skill to install and configure ktx in this project."
Or configure manually in MCP settings:
```json
{
"mcpServers": {
"ktx": {
"command": "ktx",
"args": ["mcp", "start", "--project-dir", "/path/to/project"]
}
}
}
或在MCP设置中手动配置:
```json
{
"mcpServers": {
"ktx": {
"command": "ktx",
"args": ["mcp", "start", "--project-dir", "/path/to/project"]
}
}
}Common Patterns
常见模式
Daily Context Refresh
每日上下文刷新
bash
#!/bin/bashbash
#!/bin/bashrefresh-context.sh
refresh-context.sh
cd /path/to/analytics-project
ktx ingest --connection warehouse
ktx sl validate
undefinedcd /path/to/analytics-project
ktx ingest --connection warehouse
ktx sl validate
undefinedMulti-Database Setup
多数据库设置
yaml
undefinedyaml
undefinedktx.yaml with multiple warehouses
ktx.yaml 配置多个数据仓库
databases:
prod_warehouse:
type: snowflake
account: prod.us-east-1
database: PROD
password_env: PROD_DB_PASSWORD
staging_warehouse:
type: postgres
host: staging-db.company.com
database: staging
password_env: STAGING_DB_PASSWORD
context_sources:
prod_dbt:
type: dbt
target: prod
database: prod_warehouse
staging_dbt:
type: dbt
target: staging
database: staging_warehouse
undefineddatabases:
prod_warehouse:
type: snowflake
account: prod.us-east-1
database: PROD
password_env: PROD_DB_PASSWORD
staging_warehouse:
type: postgres
host: staging-db.company.com
database: staging
password_env: STAGING_DB_PASSWORD
context_sources:
prod_dbt:
type: dbt
target: prod
database: prod_warehouse
staging_dbt:
type: dbt
target: staging
database: staging_warehouse
undefinedWiki Organization
维基组织
text
wiki/
├── global/
│ ├── metrics/
│ │ ├── revenue-definitions.md
│ │ └── customer-metrics.md
│ ├── data-quality/
│ │ └── validation-rules.md
│ └── onboarding/
│ └── data-stack-overview.md
└── user/
└── alice/
└── analysis-notes.mdExample wiki page:
markdown
undefinedtext
wiki/
├── global/
│ ├── metrics/
│ │ ├── revenue-definitions.md
│ │ └── customer-metrics.md
│ ├── data-quality/
│ │ └── validation-rules.md
│ └── onboarding/
│ └── data-stack-overview.md
└── user/
└── alice/
└── analysis-notes.md维基页面示例:
markdown
undefinedRevenue Definitions
收入定义
Gross Revenue
总收入
Total transaction amount before refunds and discounts.
Source:
revenue_daily.revenue退款和折扣前的交易总额。
来源:
revenue_daily.revenueNet Revenue
净收入
Gross revenue minus refunds.
Formula:
revenue_daily.revenue - refunds_daily.amount总收入减去退款金额。
公式:
revenue_daily.revenue - refunds_daily.amountARR (Annual Recurring Revenue)
ARR(年度经常性收入)
Subscription revenue normalized to annual value.
Only applies to subscription products.
undefined标准化为年度价值的订阅收入。
仅适用于订阅产品。
undefinedQuery Pattern with ktx
使用ktx的查询模式
typescript
// Agent workflow using ktx context
// 1. Search for relevant metrics
const metrics = await project.semanticLayer.search('monthly recurring revenue');
// 2. Get the semantic source
const mrrSource = metrics[0];
// 3. Agent generates query using source metadata
const query = `
SELECT
date,
SUM(${mrrSource.metrics.find(m => m.name === 'mrr').sql}) as total_mrr
FROM ${mrrSource.source.sql}
WHERE date >= '2024-01-01'
GROUP BY date
ORDER BY date
`;
// 4. Execute against database connection
const results = await project.executeQuery('warehouse', query);typescript
// 使用ktx上下文的Agent工作流
// 1. 搜索相关指标
const metrics = await project.semanticLayer.search('monthly recurring revenue');
// 2. 获取语义数据源
const mrrSource = metrics[0];
// 3. Agent使用数据源元数据生成查询语句
const query = `
SELECT
date,
SUM(${mrrSource.metrics.find(m => m.name === 'mrr').sql}) as total_mrr
FROM ${mrrSource.source.sql}
WHERE date >= '2024-01-01'
GROUP BY date
ORDER BY date
`;
// 4. 通过数据库连接执行查询
const results = await project.executeQuery('warehouse', query);Troubleshooting
故障排除
Setup Issues
设置问题
Problem: fails with "LLM provider not configured"
ktx setupbash
undefined问题: 失败并提示 "LLM provider not configured"
ktx setupbash
undefinedSet required environment variables
设置所需的环境变量
export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
Or use Claude Code session (no API key needed)
或使用Claude Code会话(无需API密钥)
ktx setup --llm-provider claude-code
**Problem**: Database connection fails
```bashktx setup --llm-provider claude-code
**问题**:数据库连接失败
```bashTest connection separately
单独测试连接
export DB_PASSWORD=your_password
ktx setup --test-connection warehouse
export DB_PASSWORD=your_password
ktx setup --test-connection warehouse
Check credentials in .ktx/secrets.env (git-ignored)
检查.ktx/secrets.env中的凭据(已加入git忽略)
cat .ktx/secrets.env
undefinedcat .ktx/secrets.env
undefinedIngestion Issues
摄取问题
Problem: times out on large database
ktx ingestyaml
undefined问题: 在大型数据库上超时
ktx ingestyaml
undefinedktx.yaml - limit scope
ktx.yaml - 限制范围
databases:
warehouse:
type: postgres
# ... connection details ...
schema: analytics # Only scan this schema
sample_size: 100 # Reduce sample size
max_tables: 50 # Limit tables scanned
**Problem**: Duplicate or conflicting metrics
```bashdatabases:
warehouse:
type: postgres
# ... 连接详情 ...
schema: analytics # 仅扫描该schema
sample_size: 100 # 减小采样大小
max_tables: 50 # 限制扫描的表数量
**问题**:存在重复或冲突的指标
```bashSearch for conflicts
搜索冲突项
ktx sl "revenue" --show-all
ktx sl "revenue" --show-all
Review validation report
查看验证报告
ktx sl validate --verbose
ktx sl validate --verbose
ktx flags contradictions in output - review and consolidate
ktx会在输出中标记矛盾内容 - 查看并合并
undefinedundefinedMCP Server Issues
MCP服务器问题
Problem: Agent can't connect to MCP server
bash
undefined问题:Agent无法连接到MCP服务器
bash
undefinedCheck if server is running
检查服务器是否运行
ktx mcp status
ktx mcp status
Start manually with logging
手动启动并启用日志
ktx mcp start --verbose --project-dir $(pwd)
ktx mcp start --verbose --project-dir $(pwd)
Verify project path in agent MCP config matches ktx project location
验证Agent的MCP配置中的项目路径是否与ktx项目位置匹配
**Problem**: "Project not ready" error
```bash
**问题**:提示 "Project not ready" 错误
```bashCheck project status
检查项目状态
ktx status
ktx status
If context not built:
如果上下文未构建:
ktx ingest
ktx ingest
If semantic layer invalid:
如果语义层无效:
ktx sl validate
undefinedktx sl validate
undefinedPerformance Optimization
性能优化
yaml
undefinedyaml
undefinedktx.yaml - optimize for large projects
ktx.yaml - 针对大型项目优化
project:
cache:
enabled: true
ttl: 3600
embeddings:
batch_size: 50 # Smaller batches for rate limits
databases:
warehouse:
connection_pool_size: 5
query_timeout: 30
undefinedproject:
cache:
enabled: true
ttl: 3600
embeddings:
batch_size: 50 # 更小的批次以避免速率限制
databases:
warehouse:
connection_pool_size: 5
query_timeout: 30
undefinedEnvironment Variables
环境变量
bash
undefinedbash
undefinedRequired for non-Claude-Code LLM
非Claude-Code LLM所需
export ANTHROPIC_API_KEY=sk-ant-...
export ANTHROPIC_API_KEY=sk-ant-...
Required for embeddings
嵌入功能所需
export OPENAI_API_KEY=sk-...
export OPENAI_API_KEY=sk-...
Database credentials (referenced in ktx.yaml)
数据库凭据(在ktx.yaml中引用)
export DB_PASSWORD=...
export SNOWFLAKE_PASSWORD=...
export DB_PASSWORD=...
export SNOWFLAKE_PASSWORD=...
Optional: Notion integration
可选:Notion集成
export NOTION_TOKEN=secret_...
export NOTION_TOKEN=secret_...
Optional: Project location override
可选:覆盖项目位置
export KTX_PROJECT_DIR=/path/to/project
export KTX_PROJECT_DIR=/path/to/project
Optional: Disable telemetry
可选:禁用遥测
export KTX_TELEMETRY_DISABLED=1
undefinedexport KTX_TELEMETRY_DISABLED=1
undefinedValidation
验证
bash
undefinedbash
undefinedValidate all semantic layer YAML
验证所有语义层YAML文件
ktx sl validate
ktx sl validate
Validate specific source
验证特定数据源
ktx sl validate revenue_daily
ktx sl validate revenue_daily
Check for broken joins
检查无效关联
ktx sl validate --check-joins
ktx sl validate --check-joins
Lint wiki markdown
检查维基markdown格式
ktx wiki validate
undefinedktx wiki validate
undefinedBest Practices
最佳实践
- Commit semantic layer and wiki to git, exclude
.ktx/ - Use environment variables for all secrets, never hardcode
- Run regularly (nightly cron) to keep context fresh
ktx ingest - Organize wiki by domain (metrics/, processes/, data-quality/)
- Tag semantic sources for discoverability
- Set owners on metrics for accountability
- Use read-only database credentials - ktx never writes to your warehouse
- Validate before committing changes to semantic layer YAML
- 将语义层和维基提交到git,排除目录
.ktx/ - 所有密钥使用环境变量,切勿硬编码
- 定期运行(每日定时任务)以保持上下文最新
ktx ingest - 按领域组织维基(metrics/、processes/、data-quality/)
- 为语义数据源添加标签以提高可发现性
- 为指标设置负责人明确责任
- 使用只读数据库凭据 - ktx绝不会写入你的数据仓库
- 提交前验证语义层YAML的更改
Example: Complete Setup Flow
示例:完整设置流程
bash
undefinedbash
undefined1. Install ktx
1. 安装ktx
npm install -g @kaelio/ktx
npm install -g @kaelio/ktx
2. Set credentials
2. 设置凭据
export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
export DB_PASSWORD=readonly_password
export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
export DB_PASSWORD=readonly_password
3. Initialize project
3. 初始化项目
cd ~/my-analytics-project
ktx setup
cd ~/my-analytics-project
ktx setup
Follow interactive prompts:
跟随交互式提示:
- Select LLM provider: Anthropic
- 选择LLM提供商:Anthropic
- Select embeddings: OpenAI
- 选择嵌入服务:OpenAI
- Add database connection: PostgreSQL
- 添加数据库连接:PostgreSQL
- Add context source: dbt
- 添加上下文源:dbt
- Build initial context: Yes
- 构建初始上下文:是
4. Verify setup
4. 验证设置
ktx status
ktx status
5. Create first semantic source
5. 创建第一个语义数据源
mkdir -p semantic-layer/warehouse
cat > semantic-layer/warehouse/users.yaml <<EOF
version: 1
kind: dimension_source
metadata:
name: users
display_name: Users
source:
connection: warehouse
type: table
table: users
dimensions:
- name: id type: string primary_key: true
- name: email type: string
- name: created_at type: timestamp EOF
mkdir -p semantic-layer/warehouse
cat > semantic-layer/warehouse/users.yaml <<EOF
version: 1
kind: dimension_source
metadata:
name: users
display_name: Users
source:
connection: warehouse
type: table
table: users
dimensions:
- name: id type: string primary_key: true
- name: email type: string
- name: created_at type: timestamp EOF
6. Validate and re-ingest
6. 验证并重新摄取
ktx sl validate
ktx ingest
ktx sl validate
ktx ingest
7. Test search
7. 测试搜索
ktx sl "users"
ktx wiki "user"
ktx sl "users"
ktx wiki "user"
8. Start MCP for agents
8. 启动供Agent使用的MCP服务器
ktx mcp start
Now your AI agent can query your warehouse using approved metrics and business context from ktx.ktx mcp start
现在你的AI Agent可以借助ktx提供的已审批指标和业务上下文查询你的数据仓库了。