# BigQuery CLI Skill

You are a BigQuery specialist using the `bigquery` CLI tool. This skill provides comprehensive guidance for working with Google BigQuery through a unified Rust-based CLI with query execution, template management, and server modes.

## Core Capabilities

The `bigquery` CLI provides:

1. **Authentication**: Check status and log in with gcloud
2. **Query Execution**: Run SQL queries with cost awareness and confirmation prompts
3. **Dry Run**: Estimate query costs without execution
4. **Dataset Operations**: List datasets in a project
5. **Table Operations**: List, describe, insert, load, and manage external tables
6. **Template System**: Named query templates with parameter substitution
7. **MCP Server**: Semantic search via stdio or HTTP modes
8. **LSP Server**: SQL language server for editor integration

## Authentication

### Check Authentication Status

```bash
# Check if authenticated and verify required scopes
bigquery auth check
```

Will show:

- Authentication status
- Active account
- BigQuery scopes availability

### Login with gcloud

```bash
# Authenticate with gcloud, including all required BigQuery scopes
bigquery auth login
```

This will:

1. Run `gcloud auth login`
2. Ensure all necessary BigQuery scopes are granted
3. Verify authentication succeeded

**Best Practice**: Always run `bigquery auth check` first to verify authentication before operations.

## Query Operations

### Running Queries

```bash
# Basic query execution (interactive cost confirmation)
bigquery query "SELECT * FROM dataset.table LIMIT 10"

# Skip cost confirmation for automation
bigquery query --yes "SELECT COUNT(*) FROM dataset.table"

# JSON output (default)
bigquery query "SELECT * FROM dataset.table LIMIT 5"

# Text/table output
bigquery query --format text "SELECT * FROM dataset.table LIMIT 5"
```

**Cost Awareness**: The query command automatically:

1. Estimates query cost before execution
2. Displays bytes to be processed
3. Prompts for confirmation (unless `--yes` is used)
4. Prevents accidental expensive queries
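The byte estimate maps directly to a dollar figure under on-demand pricing. A minimal sketch, assuming the common on-demand rate of $6.25 per TiB (the rate is an assumption here -- verify current BigQuery pricing for your region and edition):

```shell
# Convert a dry-run byte estimate into an approximate on-demand cost.
# The $6.25/TiB rate is an assumption -- check current BigQuery pricing.
bytes=1099511627776   # 1 TiB, e.g. taken from the cost-awareness output
cost=$(awk -v b="$bytes" 'BEGIN { printf "%.2f", b / (1024^4) * 6.25 }')
echo "Estimated cost: \$${cost}"
```

Note that billed bytes are rounded up (with a per-query minimum), so treat this as a ballpark rather than an invoice.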

### Query Output Formats

```bash
# JSON output (default, machine-readable)
bigquery query "SELECT * FROM dataset.table"
bigquery query --format json "SELECT * FROM dataset.table"

# Text output (human-readable table)
bigquery query --format text "SELECT * FROM dataset.table"
```

### Dry Run (Cost Estimation)

```bash
# Estimate cost without executing
bigquery dry-run "SELECT * FROM large_dataset.table WHERE date >= '2025-01-01'"
```

Returns:

- Bytes that would be processed
- Estimated cost
- No actual data

**Use dry-run to**:

- Estimate costs before running expensive queries
- Validate query syntax
- Check partition pruning effectiveness
- Test queries in CI/CD pipelines
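For the CI/CD case, a dry run can gate the real query behind a byte budget. A sketch of the gating logic, with the estimate stubbed out (in a real pipeline you would parse it from the dry-run output; the exact JSON field name depends on the CLI's output schema):

```shell
# Refuse to run a query in CI when the dry-run estimate exceeds a byte
# budget. `bytes` is stubbed here; in practice, parse it from
# `bigquery dry-run` output.
budget=$((100 * 1024 * 1024 * 1024))   # 100 GiB budget
bytes=$((250 * 1024 * 1024 * 1024))    # stubbed dry-run estimate: 250 GiB
if [ "$bytes" -gt "$budget" ]; then
  echo "over budget: refusing to run query"
  # exit 1 in a real pipeline
else
  echo "within budget: safe to run"
fi
```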

## Dataset Operations

### Listing Datasets

```bash
# List datasets in current project (text format, default)
bigquery datasets list my-project

# JSON output
bigquery datasets list my-project --format json
```

Example output shows:

- Dataset ID
- Location
- Creation time
- Labels (if any)

**Note**: The dataset reference format is `project.dataset`, or just `project` to list all datasets.

## Table Operations

### Listing Tables

```bash
# List tables in a dataset (text format, first 10)
bigquery tables list my-project.my-dataset

# JSON output
bigquery tables list my-project.my-dataset --format json

# Limit results
bigquery tables list my-project.my-dataset --limit 20

# Maximum limit is 100
bigquery tables list my-project.my-dataset --limit 100
```

### Describing Table Schema

```bash
# Show table schema and metadata (text format)
bigquery tables describe my-project.my-dataset.my-table

# JSON output
bigquery tables describe my-project.my-dataset.my-table --format json
```

Output includes:

- Column names and types
- Nullability (NULLABLE, REQUIRED, REPEATED)
- Mode information
- Table metadata

### Inserting Rows (Small Datasets)

Best for <1000 rows. Uses the streaming insert API for immediate availability.

#### JSONL (Newline-Delimited JSON) Format

**From JSONL File:**

```bash
# Create sample JSONL file
cat > users.jsonl <<EOF
{"id": "1", "name": "Alice Johnson", "email": "alice@example.com", "age": 30}
{"id": "2", "name": "Bob Smith", "email": "bob@example.com", "age": 25}
{"id": "3", "name": "Charlie Brown", "email": "charlie@example.com", "age": 35}
EOF

# Insert from JSONL file
bigquery tables insert my-project.dataset.users \
  --data users.jsonl --format json
```

**From JSONL Stream (stdin):**

```bash
# Stream from command output
echo '{"id": "1", "name": "Alice", "email": "alice@example.com"}' | \
  bigquery tables insert my-project.dataset.users --data - --format json

# Stream from multiple sources (heredoc)
cat << EOF | bigquery tables insert my-project.dataset.users --data - --format json
{"id": "1", "name": "Alice", "email": "alice@example.com", "age": 30}
{"id": "2", "name": "Bob", "email": "bob@example.com", "age": 25}
{"id": "3", "name": "Charlie", "email": "charlie@example.com", "age": 35}
EOF

# Stream from application output
my-etl-tool --output jsonl | bigquery tables insert my-project.dataset.events --data -

# Stream from compressed file
gunzip -c logs.jsonl.gz | bigquery tables insert my-project.dataset.logs --data -

# Stream from jq transformation
cat raw_data.json | jq -c '.records[]' | \
  bigquery tables insert my-project.dataset.processed --data -
```

**JSONL Format Requirements:**

- Each line is a separate JSON object
- Empty lines are automatically skipped
- No commas between objects
- Ideal for streaming and large datasets
- Format: `{"field1":"value1","field2":"value2"}\n`
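Since streaming insert is recommended for fewer than 1000 rows, a larger JSONL stream can be cut into compliant pieces with `split`. A sketch with the insert itself stubbed out (in practice each chunk would be piped to `bigquery tables insert my-project.dataset.t --data - --format json`):

```shell
# Chunk a large JSONL file into <1000-row pieces. The insert is stubbed
# with `wc -l`; replace it with the real pipe once authenticated.
rm -f /tmp/chunk_*
seq 1 1200 | awk '{printf "{\"id\":\"%d\"}\n", $1}' > /tmp/big.jsonl
split -l 500 /tmp/big.jsonl /tmp/chunk_
for f in /tmp/chunk_*; do
  wc -l < "$f"   # stand-in for: bigquery tables insert ... --data - < "$f"
done
```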

#### CSV Format

**From CSV File:**

```bash
# Create sample CSV file
cat > users.csv <<EOF
id,name,email,age
1,Alice Johnson,alice@example.com,30
2,Bob Smith,bob@example.com,25
3,"Charlie Brown, Jr.",charlie@example.com,35
EOF

# Insert from CSV file
bigquery tables insert my-project.dataset.users \
  --data users.csv --format csv
```

**From CSV Stream (stdin):**

```bash
# Stream from heredoc
cat << EOF | bigquery tables insert my-project.dataset.users --data - --format csv
id,name,email,age
1,Alice Johnson,alice@example.com,30
2,Bob Smith,bob@example.com,25
3,Charlie Brown,charlie@example.com,35
EOF

# Stream from application output
./generate_report.sh | bigquery tables insert my-project.dataset.reports --data - --format csv

# Stream from compressed CSV
gunzip -c data.csv.gz | bigquery tables insert my-project.dataset.imports --data - --format csv

# Stream from curl/API response
curl -s https://api.example.com/export.csv | \
  bigquery tables insert my-project.dataset.api_data --data - --format csv

# Transform and stream CSV
cat raw.csv | tail -n +2 | awk '{print tolower($0)}' | \
  bigquery tables insert my-project.dataset.cleaned --data - --format csv
```

**CSV Format Requirements:**

- First row must contain column headers matching the BigQuery table schema
- Values are inserted as strings (BigQuery will coerce types)
- Supports quoted fields, escaped quotes, and newlines (RFC 4180 compliant)
- Headers are case-sensitive and must match table column names
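Because headers are case-sensitive and must match exactly, a cheap pre-flight check before streaming avoids a failed insert. A minimal sketch (the expected header string and file path are illustrative):

```shell
# Verify a CSV's header row matches the destination table's columns
# before piping it to `tables insert`. Names here are illustrative.
expected="id,name,email,age"
printf 'id,name,email,age\n1,Alice,alice@example.com,30\n' > /tmp/users.csv
header=$(head -n 1 /tmp/users.csv)
if [ "$header" = "$expected" ]; then
  echo "header ok"
else
  echo "header mismatch: $header" >&2
fi
```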

### Additional Insert Options

```bash
# Insert inline JSON (single object)
bigquery tables insert my-project.dataset.users \
  --json '{"id": "1", "name": "Alice", "email": "alice@example.com"}'

# Insert inline JSON array
bigquery tables insert my-project.dataset.users \
  --json '[{"id": "1", "name": "Alice"}, {"id": "2", "name": "Bob"}]'

# Dry-run validation (no data inserted)
bigquery tables insert my-project.dataset.users \
  --data users.csv --format csv --dry-run

# Skip invalid rows instead of failing
bigquery tables insert my-project.dataset.users \
  --data users.csv --format csv --skip-invalid

# Ignore unknown fields in data
bigquery tables insert my-project.dataset.users \
  --data users.csv --format csv --ignore-unknown

# Combine options for production pipelines
cat production_data.jsonl | \
  bigquery tables insert my-project.dataset.production \
  --data - --format json \
  --skip-invalid \
  --ignore-unknown
```

**Insert Options:**

- `--json <JSON>`: Inline JSON data (object or array)
- `--data <PATH>`: Path to data file, or `-` for stdin
- `--format <FORMAT>`: Data format (json or csv, default: json)
- `--dry-run`: Validate without inserting
- `--skip-invalid`: Skip invalid rows instead of failing
- `--ignore-unknown`: Ignore unknown fields in data
- `--yes`: Skip confirmation prompts

### Loading Data (Large Datasets)

Best for >10MB files or >1000 rows. Uses BigQuery load jobs.

⚠️ **IMPORTANT**: Local file loading requires GCS staging bucket configuration.

- If you get a "The specified bucket does not exist" error, use `tables insert` for datasets <1000 rows instead
- For larger datasets, upload to GCS first, then use `bigquery tables load gs://...`

```bash
# Load from Cloud Storage URI (RECOMMENDED - no bucket config needed)
bigquery tables load my-project.dataset.users \
  gs://my-bucket/data.csv --format csv

# Load from local CSV file (requires GCS staging bucket configured)
bigquery tables load my-project.dataset.users data.csv --format csv

# Load with schema auto-detection
bigquery tables load my-project.dataset.new_table data.csv \
  --format csv --autodetect

# Load with replace write disposition (truncates table first)
bigquery tables load my-project.dataset.users data.csv \
  --format csv --write-disposition replace

# Load JSON file
bigquery tables load my-project.dataset.events events.json \
  --format json

# Supported formats: csv, json, avro, parquet, orc
bigquery tables load my-project.dataset.table data.parquet \
  --format parquet

# Dry-run validation (no data loaded)
bigquery tables load my-project.dataset.users data.csv \
  --format csv --dry-run

# Allow some bad records (skip up to 100 invalid rows)
bigquery tables load my-project.dataset.users data.csv \
  --format csv --max-bad-records 100

# Ignore unknown fields
bigquery tables load my-project.dataset.users data.csv \
  --format csv --ignore-unknown

# Skip confirmation prompts (for automation/CI)
bigquery tables load my-project.dataset.users data.csv \
  --format csv --write-disposition replace --yes
```

**Load Job Features:**

- **GCS staging bucket required**: Local file loading needs GCS bucket configuration
- Real-time progress tracking with exponential backoff
- Automatic cleanup of temporary files after completion
- Write modes: `append` (default) or `replace` (truncate first)
- Safety confirmations for destructive operations
- Configurable error tolerance with `--max-bad-records`

**When to Use:**

- Large datasets (>1000 rows or >10MB)
- Data already in Cloud Storage
- Bulk data migrations

**When NOT to Use:**

- Small datasets (<1000 rows) → use `tables insert` instead (no GCS required)
- No GCS staging bucket configured → use `tables insert` or upload to GCS first

**Load Options:**

- `--format <FORMAT>`: csv, json, avro, parquet, orc (default: csv)
- `--write-disposition <DISPOSITION>`: append or replace (default: append)
- `--autodetect`: Auto-detect schema from source files
- `--dry-run`: Validate without loading
- `--max-bad-records <N>`: Maximum bad records before failing
- `--ignore-unknown`: Ignore unknown fields
- `--yes`: Skip confirmation prompts

### Extracting Data

Export table data to Cloud Storage in various formats:

```bash
# Extract table to Cloud Storage as CSV
bigquery tables extract my-project.dataset.users \
  gs://my-bucket/exports/users.csv --format csv

# Extract as JSON
bigquery tables extract my-project.dataset.events \
  gs://my-bucket/exports/events-*.json --format json

# Extract with compression
bigquery tables extract my-project.dataset.large_table \
  gs://my-bucket/exports/data-*.csv.gz --format csv --compression gzip

# Extract as Avro with Snappy compression
bigquery tables extract my-project.dataset.events \
  gs://my-bucket/exports/events-*.avro --format avro --compression snappy

# Extract as Parquet
bigquery tables extract my-project.dataset.analytics \
  gs://my-bucket/exports/analytics.parquet --format parquet

# CSV with custom delimiter and header
bigquery tables extract my-project.dataset.data \
  gs://my-bucket/data.csv \
  --format csv \
  --field-delimiter "|" \
  --print-header

# Dry-run to validate configuration
bigquery tables extract my-project.dataset.users \
  gs://my-bucket/users.csv --format csv --dry-run

# Skip confirmation prompt
bigquery tables extract my-project.dataset.large \
  gs://my-bucket/export.csv --format csv --yes
```

**Supported Formats:** CSV, JSON (newline-delimited), Avro, Parquet
**Compression:** none, gzip, snappy (Avro/Parquet only)

## External Tables

External tables reference data in Cloud Storage without copying it into BigQuery.

### Creating External Tables

```bash
# Create CSV external table
bigquery tables create-external my-project.dataset.external_table \
  --source-uri gs://bucket/data.csv \
  --format csv \
  --schema "id:INTEGER,name:STRING,created_at:TIMESTAMP"

# Create with auto-detected schema
bigquery tables create-external my-project.dataset.external_table \
  --source-uri gs://bucket/data.csv \
  --format csv \
  --autodetect

# Multiple source URIs (comma-separated)
bigquery tables create-external my-project.dataset.external_table \
  --source-uri "gs://bucket/file1.csv,gs://bucket/file2.csv" \
  --format csv \
  --autodetect

# Multiple source URIs (multiple flags)
bigquery tables create-external my-project.dataset.external_table \
  --source-uri gs://bucket/file1.csv \
  --source-uri gs://bucket/file2.csv \
  --format csv \
  --autodetect

# CSV-specific options
bigquery tables create-external my-project.dataset.external_table \
  --source-uri gs://bucket/data.csv \
  --format csv \
  --schema "id:INTEGER,name:STRING" \
  --field-delimiter "," \
  --skip-leading-rows 1

# Other formats (Parquet, JSON, Avro, ORC)
bigquery tables create-external my-project.dataset.parquet_table \
  --source-uri gs://bucket/data.parquet \
  --format parquet \
  --autodetect

bigquery tables create-external my-project.dataset.json_table \
  --source-uri gs://bucket/data.jsonl \
  --format json \
  --autodetect
```

**External Table Options**:

- `--source-uri <URI>`: Cloud Storage URI(s) - required
- `--format <FORMAT>`: csv, json, avro, parquet, orc - required
- `--schema <SCHEMA>`: Schema definition (`column:type,column:type,...`)
- `--autodetect`: Auto-detect schema from source files
- `--field-delimiter <DELIMITER>`: CSV field delimiter (default: `,`)
- `--skip-leading-rows <N>`: CSV header rows to skip

### Updating External Tables

```bash
# Update source URIs
bigquery tables update-external my-project.dataset.external_table \
  --source-uri gs://bucket/new-data.csv

# Update schema
bigquery tables update-external my-project.dataset.external_table \
  --schema "id:INTEGER,name:STRING,email:STRING"

# Update CSV options
bigquery tables update-external my-project.dataset.external_table \
  --field-delimiter "|" \
  --skip-leading-rows 2

# Update multiple properties
bigquery tables update-external my-project.dataset.external_table \
  --source-uri gs://bucket/new-data.csv \
  --schema "id:INTEGER,name:STRING,updated_at:TIMESTAMP" \
  --skip-leading-rows 1
```

## Template System

Named query templates allow you to save frequently used queries with parameter placeholders.

### Listing Templates

```bash
# List all available templates (text format)
bigquery templates list

# JSON output
bigquery templates list --format json
```

Shows:

- Template name
- Description
- Parameters
- Query preview

### Searching Templates

```bash
# Search by name or description
bigquery templates search "customer"
bigquery templates search "daily metrics"

# JSON output
bigquery templates search "analytics" --format json
```

### Validating Templates

```bash
# Validate template for parameter consistency
bigquery templates validate my-template
```

Checks:

- Parameter definitions match query placeholders
- Required parameters are defined
- Parameter types are valid

### Running Templates

```bash
# Run template with default parameters
bigquery templates run my-template

# Override parameters
bigquery templates run daily-report \
  --param date=2025-01-15 \
  --param region=US

# Multiple parameters
bigquery templates run customer-analysis \
  --param customer_id=CUST123 \
  --param start_date=2025-01-01 \
  --param end_date=2025-01-31

# JSON output
bigquery templates run my-template --format json

# Skip cost confirmation
bigquery templates run expensive-query --yes
```

**Template Run Options**:

- `--param <KEY=VALUE>`: Parameter override (can be used multiple times)
- `--format <FORMAT>`: Output format (json or text, default: json)
- `--yes`: Skip cost confirmation prompt

### Template Workflow Example

```bash
# 1. Search for templates
bigquery templates search "revenue"

# 2. Validate template before running
bigquery templates validate monthly-revenue

# 3. Run with parameters
bigquery templates run monthly-revenue \
  --param month=2025-01 \
  --param min_amount=1000

# 4. Run in automation (skip confirmation)
bigquery templates run monthly-revenue \
  --param month=2025-01 \
  --yes \
  --format json > output.json
```

**Use templates for**:

- Standardized reporting queries
- Common analytics patterns
- Scheduled data pipelines
- Team query sharing
- Reducing query errors
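For scheduled pipelines, the automation form above composes naturally with a loop, e.g. to backfill a dated template over a range. A sketch with illustrative names (`daily-report` and its `date` parameter are assumptions); the loop echoes each command rather than executing it:

```shell
# Backfill sketch: one template run per day. Replace `echo` with the
# real invocation (plus --yes) once the template is validated.
days="2025-01-01 2025-01-02 2025-01-03"
for day in $days; do
  echo bigquery templates run daily-report --param "date=${day}" --yes
done
```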

## MCP Server Integration

The BigQuery MCP server provides semantic search and natural language query capabilities via the Model Context Protocol.

### Starting MCP Server

**STDIO Mode** (for local clients):

```bash
# Start MCP server in stdio mode
bigquery mcp stdio
```

The server will:

- Accept MCP protocol messages on stdin
- Send responses on stdout
- Expose BigQuery tools to MCP clients

**HTTP Mode** (for network clients):

```bash
# Start HTTP MCP server on default port 8080
bigquery mcp http

# Specify custom port
bigquery mcp http --port 3000
```

The server provides:

- HTTP endpoint for MCP protocol
- JSON-RPC over HTTP
- Remote access to BigQuery tools

### MCP Server Capabilities

The MCP server exposes these tools through the Model Context Protocol:

1. `semantic_search`: Search tables using natural language
2. `execute_query`: Run SQL queries with automatic formatting
3. `get_schema`: Retrieve table schemas
4. `list_tables`: List available tables
5. `list_datasets`: List available datasets
6. `explain_query`: Get query execution plan
7. `optimize_query`: Suggest query optimizations
8. `run_template`: Execute named templates with parameters

### MCP Configuration

Configure in Claude Code or other MCP-enabled applications.

**STDIO Mode** (`.claude/mcp.json` or similar):

```json
{
  "mcpServers": {
    "bigquery": {
      "command": "bigquery",
      "args": ["mcp", "stdio"],
      "env": {
        "GOOGLE_CLOUD_PROJECT": "my-project"
      }
    }
  }
}
```

**HTTP Mode:**

```json
{
  "mcpServers": {
    "bigquery": {
      "url": "http://localhost:8080",
      "transport": "http"
    }
  }
}
```

MCP Usage Patterns

MCP使用模式

When using BigQuery MCP through clients:
Semantic Search:
"Find all tables containing customer purchase data from the last 30 days"
→ MCP translates to appropriate SQL query
Schema Discovery:
"What columns are in the analytics.events table?"
→ MCP returns schema information
Natural Language Queries:
"Show me total revenue by region for Q1 2025"
→ MCP generates and executes SQL
Template Execution:
"Run the monthly revenue template for January 2025"
→ MCP executes template with parameters
通过客户端使用BigQuery MCP时的常见场景:
语义搜索:
"查找所有包含过去30天客户购买数据的表"
→ MCP自动转换为对应的SQL查询
Schema发现:
"analytics.events表有哪些列?"
→ MCP返回表Schema信息
自然语言查询:
"显示2025年第一季度各地区的总收入"
→ MCP生成并执行SQL查询
模板执行:
"运行2025年1月的月度收入模板"
→ MCP传入参数执行模板

LSP Integration

LSP集成

The BigQuery LSP provides SQL language features in text editors.
BigQuery LSP为文本编辑器提供SQL语言特性支持。

Starting LSP Server

启动LSP服务器


Start LSP server

启动LSP服务器

bigquery lsp
bigquery lsp

Server provides:

服务器提供:

- Language Server Protocol communication

- Language Server Protocol通信

- SQL syntax validation

- SQL语法验证

- Schema-aware completions

- 基于Schema的自动补全

- Query formatting

- 查询格式化

- Hover documentation

- 悬停文档提示


LSP Features

LSP功能

  • SQL syntax highlighting: Proper tokenization and highlighting
  • Schema completion: Table and column suggestions based on project schema
  • Query validation: Real-time syntax and semantic checks
  • Hover documentation: Table and column info on hover
  • Go to definition: Navigate to table definitions
  • Query formatting: Auto-format SQL queries
  • Diagnostics: Show errors and warnings inline
  • SQL语法高亮:正确的标记化和高亮显示
  • Schema补全:基于项目Schema提供表和列的建议
  • 查询验证:实时语法和语义检查
  • 悬停文档:悬停时显示表和列的信息
  • 跳转到定义:导航到表定义
  • 查询格式化:自动格式化SQL查询
  • 诊断信息:在编辑器内联显示错误和警告

Editor Configuration

编辑器配置

Neovim:
lua
-- In nvim/lua/bigquery-lsp.lua or init.lua
vim.api.nvim_create_autocmd("FileType", {
  pattern = { "sql", "bq", "bigquery" },
  callback = function()
    vim.lsp.start({
      name = "bigquery-lsp",
      cmd = { "bigquery", "lsp" },
      root_dir = vim.fn.getcwd(),
    })
  end,
})
VS Code (in settings.json or language server config):
json
{
  "bigquery-lsp": {
    "command": "bigquery",
    "args": ["lsp"],
    "filetypes": ["sql", "bq", "bigquery"]
  }
}
Helix (in languages.toml):
toml
[[language]]
name = "sql"
language-servers = ["bigquery-lsp"]

[language-server.bigquery-lsp]
command = "bigquery"
args = ["lsp"]
Neovim:
lua
-- 在nvim/lua/bigquery-lsp.lua或init.lua中配置
vim.api.nvim_create_autocmd("FileType", {
  pattern = { "sql", "bq", "bigquery" },
  callback = function()
    vim.lsp.start({
      name = "bigquery-lsp",
      cmd = { "bigquery", "lsp" },
      root_dir = vim.fn.getcwd(),
    })
  end,
})
VS Code(在 settings.json 或语言服务器配置中):
json
{
  "bigquery-lsp": {
    "command": "bigquery",
    "args": ["lsp"],
    "filetypes": ["sql", "bq", "bigquery"]
  }
}
Helix(在 languages.toml 中):
toml
[[language]]
name = "sql"
language-servers = ["bigquery-lsp"]

[language-server.bigquery-lsp]
command = "bigquery"
args = ["lsp"]

Common Workflows

常见工作流

Workflow 1: Exploratory Data Analysis

工作流1:探索性数据分析


1. Verify authentication

1. 验证身份验证状态

bigquery auth check
bigquery auth check

2. List available datasets

2. 列出可用数据集

bigquery datasets list my-project
bigquery datasets list my-project

3. List tables in dataset

3. 列出数据集中的表

bigquery tables list my-project.analytics
bigquery tables list my-project.analytics

4. Check table schema

4. 查看表Schema

bigquery tables describe my-project.analytics.events
bigquery tables describe my-project.analytics.events

5. Preview data (text format for readability)

5. 预览数据(文本格式更易读)

bigquery query --format text "SELECT * FROM my-project.analytics.events LIMIT 10"
bigquery query --format text "SELECT * FROM my-project.analytics.events LIMIT 10"

6. Get row count

6. 获取行数

bigquery query "SELECT COUNT(*) as total FROM my-project.analytics.events"
bigquery query "SELECT COUNT(*) as total FROM my-project.analytics.events"

7. Check data distribution

7. 查看数据分布

bigquery query --format text "SELECT DATE(timestamp) as date, COUNT(*) as events FROM my-project.analytics.events GROUP BY date ORDER BY date DESC LIMIT 30"
bigquery query --format text "SELECT DATE(timestamp) as date, COUNT(*) as events FROM my-project.analytics.events GROUP BY date ORDER BY date DESC LIMIT 30"

Workflow 2: Cost-Aware Query Development

工作流2:成本感知的查询开发


1. Dry run to estimate cost

1. 预运行估算成本

bigquery dry-run "SELECT * FROM my-project.large_dataset.table WHERE date >= '2025-01-01'"
bigquery dry-run "SELECT * FROM my-project.large_dataset.table WHERE date >= '2025-01-01'"

2. If cost is acceptable, run query

2. 如果成本可接受,执行查询

bigquery query "SELECT * FROM my-project.large_dataset.table WHERE date >= '2025-01-01'"
bigquery query "SELECT * FROM my-project.large_dataset.table WHERE date >= '2025-01-01'"

3. For automation, skip confirmation

3. 自动化场景下跳过确认

bigquery query --yes "SELECT * FROM my-project.large_dataset.table WHERE date >= '2025-01-01'" > results.json
bigquery query --yes "SELECT * FROM my-project.large_dataset.table WHERE date >= '2025-01-01'" > results.json

Workflow 3: Template-Based Reporting

工作流3:基于模板的报表


1. Search for relevant templates

1. 搜索相关模板

bigquery templates search "daily"
bigquery templates search "daily"

2. Validate template

2. 验证模板

bigquery templates validate daily-metrics
bigquery templates validate daily-metrics

3. Run template with parameters

3. 传入参数运行模板

bigquery templates run daily-metrics \
  --param date=$(date +%Y-%m-%d) \
  --param region=US \
  --format json > daily-report.json
bigquery templates run daily-metrics \
  --param date=$(date +%Y-%m-%d) \
  --param region=US \
  --format json > daily-report.json

4. Schedule in cron or CI/CD

4. 在cron或CI/CD中定时执行

# note: % must be escaped as \% inside a crontab entry
0 1 * * * bigquery templates run daily-metrics --param date=$(date +\%Y-\%m-\%d) --yes

0 1 * * * bigquery templates run daily-metrics --param date=$(date +\%Y-\%m-\%d) --yes


Workflow 4: External Data Analysis

工作流4:外部数据分析


1. Create external table pointing to GCS

1. 创建指向GCS的外部表

bigquery tables create-external my-project.staging.raw_logs \
  --source-uri "gs://logs-bucket/2025-01-*.json" \
  --format json \
  --autodetect
bigquery tables create-external my-project.staging.raw_logs \
  --source-uri "gs://logs-bucket/2025-01-*.json" \
  --format json \
  --autodetect

2. Query external table

2. 查询外部表

bigquery query " SELECT timestamp, user_id, action FROM my-project.staging.raw_logs WHERE action = 'purchase' LIMIT 100 "
bigquery query " SELECT timestamp, user_id, action FROM my-project.staging.raw_logs WHERE action = 'purchase' LIMIT 100 "

3. Update external table when new files arrive

3. 新文件到达时更新外部表

bigquery tables update-external my-project.staging.raw_logs \
  --source-uri "gs://logs-bucket/2025-02-*.json"
bigquery tables update-external my-project.staging.raw_logs \
  --source-uri "gs://logs-bucket/2025-02-*.json"

Workflow 5: Data Loading Pipeline

工作流5:数据加载流水线


1. Load initial data

1. 加载初始数据

bigquery tables load my-project.dataset.events \
  gs://bucket/events-2025-01-01.csv \
  --format csv \
  --write-disposition replace
bigquery tables load my-project.dataset.events \
  gs://bucket/events-2025-01-01.csv \
  --format csv \
  --write-disposition replace

2. Append incremental data

2. 追加增量数据

bigquery tables load my-project.dataset.events \
  gs://bucket/events-2025-01-02.csv \
  --format csv \
  --write-disposition append
bigquery tables load my-project.dataset.events \
  gs://bucket/events-2025-01-02.csv \
  --format csv \
  --write-disposition append

3. Verify data loaded

3. 验证数据加载完成

bigquery query "SELECT DATE(timestamp) as date, COUNT(*) as count FROM my-project.dataset.events GROUP BY date ORDER BY date"
bigquery query "SELECT DATE(timestamp) as date, COUNT(*) as count FROM my-project.dataset.events GROUP BY date ORDER BY date"

Workflow 6: Real-Time Data Insertion

工作流6:实时数据插入


1. Insert single event (inline JSON)

1. 插入单个事件(内联JSON)

bigquery tables insert my-project.dataset.events \
  --json '{"user_id": "U123", "event": "click", "timestamp": "2025-01-15T10:00:00Z"}'
bigquery tables insert my-project.dataset.events \
  --json '{"user_id": "U123", "event": "click", "timestamp": "2025-01-15T10:00:00Z"}'

2. Stream JSONL from application

2. 从应用程序流式插入JSONL数据

my-app --output jsonl | bigquery tables insert my-project.dataset.events --data - --format json
my-app --output jsonl | bigquery tables insert my-project.dataset.events --data - --format json

3. Insert batch from JSONL file

3. 从JSONL文件批量插入

bigquery tables insert my-project.dataset.events \
  --data events.jsonl --format json
bigquery tables insert my-project.dataset.events \
  --data events.jsonl --format json

4. Stream with transformation and error handling

4. 转换并流式插入,同时处理错误

cat raw_events.json | jq -c '.events[]' |
  bigquery tables insert my-project.dataset.events \
  --data - --format json \
  --skip-invalid \
  --ignore-unknown
cat raw_events.json | jq -c '.events[]' |
  bigquery tables insert my-project.dataset.events \
  --data - --format json \
  --skip-invalid \
  --ignore-unknown

Best Practices

最佳实践

Query Development

查询开发

  1. Always dry-run first: Use
    bigquery dry-run
    to estimate costs
  2. Use templates: Create templates for repeated queries
  3. Validate before running: Check syntax and cost before execution
  4. Use text format for exploration:
    --format text
    for human-readable tables
  5. Use JSON for automation:
    --format json
    for machine processing
  6. Skip confirmations in scripts: Use
    --yes
    flag for automation
  1. 始终先预运行:使用
    bigquery dry-run
    估算成本
  2. 使用模板:为重复查询创建模板
  3. 运行前验证:执行前检查语法和成本
  4. 探索时使用文本格式
    --format text
    适合人类阅读
  5. 自动化时使用JSON格式
    --format json
    适合机器处理
  6. 脚本中跳过确认:自动化场景使用
    --yes
    参数

Cost Management

成本管理

  1. Dry run expensive queries: Always estimate with
    bigquery dry-run
  2. Monitor bytes processed: Check query cost estimates before running
  3. Use partition pruning: Filter on partitioned columns in WHERE clauses
  4. Limit result sets: Use LIMIT for exploratory queries
  5. Use templates: Standardize queries to avoid mistakes
  6. Leverage external tables: Avoid copying data when querying directly from GCS
  1. 高成本查询先预运行:始终使用
    bigquery dry-run
    估算
  2. 监控处理的数据量:执行前查看查询成本估算
  3. 使用分区过滤:在WHERE子句中过滤分区列
  4. 限制结果集:探索性查询使用LIMIT
  5. 使用模板:标准化查询避免错误
  6. 使用外部表:直接查询GCS数据,避免复制

Authentication

身份验证

  1. Check auth first: Run
    bigquery auth check
    before operations
  2. Use service accounts: For automation and CI/CD
  3. Verify scopes: Ensure all required BigQuery scopes are granted
  4. Re-authenticate when needed:
    bigquery auth login
    if check fails
  1. 先检查身份验证:操作前运行
    bigquery auth check
  2. 使用服务账号:自动化和CI/CD场景优先使用
  3. 验证权限范围:确保所有必要的BigQuery权限已授予
  4. 必要时重新验证:如果检查失败,运行
    bigquery auth login

Template Management

模板管理

  1. Use descriptive names: Make templates easy to find
  2. Document parameters: Include parameter descriptions in templates
  3. Validate before use: Run
    bigquery templates validate
    before execution
  4. Search before creating: Check if similar template exists
  5. Version control templates: Store template definitions in git
  1. 使用描述性名称:让模板易于查找
  2. 文档化参数:在模板中包含参数描述
  3. 使用前验证:执行前运行
    bigquery templates validate
  4. 创建前搜索:检查是否已有类似模板
  5. 版本控制模板:将模板定义存储在git中

Data Loading

数据加载

  1. Choose the right method:
    • Use
      insert
      for <1000 rows (streaming insert API, immediate availability)
    • Use
      load
      for >10MB files or >1000 rows (load jobs with GCS upload)
  2. Use JSONL for streaming: Newline-delimited JSON is ideal for streaming pipelines
  3. Stream from stdin: Use
    --data -
    to pipe data from applications or transformations
  4. Validate before loading: Use
    --dry-run
    flag to test configurations
  5. Handle bad records: Set
    --max-bad-records
    for messy data
  6. Choose write disposition:
    replace
    for full refresh,
    append
    for incremental
  7. Use external tables: For data that changes frequently in GCS (no data copying)
  8. Use appropriate formats: CSV for simple data, JSON/JSONL for complex, Parquet/Avro for large datasets
  1. 选择合适的方法:
    • 少于1000行使用
      insert
      (流式插入API,数据立即可用)
    • 大于10MB文件或1000行使用
      load
      (需要上传至GCS的加载作业)
  2. 流式处理使用JSONL:换行分隔JSON是流式流水线的理想格式
  3. 从标准输入流式插入:使用
    --data -
    从应用程序或转换工具管道数据
  4. 加载前验证:使用
    --dry-run
    参数测试配置
  5. 处理错误记录:为杂乱数据设置
    --max-bad-records
  6. 选择合适的写入模式
    replace
    用于全量刷新,
    append
    用于增量加载
  7. 使用外部表:针对GCS中频繁变化的数据(无需复制)
  8. 使用合适的格式:简单数据用CSV,复杂数据用JSON/JSONL,大规模数据集用Parquet/Avro
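The insert-versus-load decision in point 1 can be sketched as a small shell helper. This is a hypothetical wrapper, not part of the CLI: the table name, bucket, and 1000-row threshold come from the guidance above, and it only prints the command it would run rather than executing it:

```shell
#!/bin/sh
# Hypothetical router: stream small JSONL batches via `tables insert`,
# send larger ones through a GCS-backed load job. Table and bucket
# names are placeholders; the chosen command is printed, not executed.
FILE=$(mktemp)
printf '%s\n' '{"id":1}' '{"id":2}' '{"id":3}' > "$FILE"  # sample JSONL batch

ROWS=$(wc -l < "$FILE")
if [ "$ROWS" -lt 1000 ]; then
  CMD="bigquery tables insert my-project.dataset.events --data $FILE --format json"
else
  CMD="gsutil cp $FILE gs://my-bucket/ && bigquery tables load my-project.dataset.events gs://my-bucket/$(basename "$FILE") --format json"
fi
echo "$CMD"
rm -f "$FILE"
```

In a real pipeline the `echo` would be replaced by `eval "$CMD"` or a direct invocation; printing first makes the routing easy to verify.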

MCP Server

MCP服务器

  1. Use stdio for local: Prefer stdio mode for local MCP clients
  2. Use HTTP for remote: Use HTTP mode for networked deployments
  3. Secure HTTP endpoints: Put HTTP server behind authentication/firewall
  4. Monitor server logs: Check for errors and performance issues
  5. Set appropriate port: Choose non-conflicting port for HTTP mode
  1. 本地使用标准输入输出模式:本地MCP客户端优先使用该模式
  2. 远程使用HTTP模式:网络部署使用HTTP模式
  3. 保护HTTP端点:将HTTP服务器放在身份验证/防火墙后
  4. 监控服务器日志:检查错误和性能问题
  5. 选择合适的端口:为HTTP模式选择不冲突的端口

LSP Integration

LSP集成

  1. Configure per-project: Set up LSP for SQL files in your editor
  2. Use schema completion: Leverage auto-complete for table/column names
  3. Check diagnostics: Fix errors and warnings shown inline
  4. Format queries: Use LSP formatting for consistent style
  1. 按项目配置:为编辑器中的SQL文件配置LSP
  2. 使用Schema补全:利用自动补全功能输入表/列名
  3. 查看诊断信息:修复内联显示的错误和警告
  4. 格式化查询:使用LSP格式化保持一致的风格

Configuration

配置

Environment Variables

环境变量


Set default project

设置默认项目

export GOOGLE_CLOUD_PROJECT=my-project
export GOOGLE_CLOUD_PROJECT=my-project

Set credentials (for service accounts)

设置凭据(服务账号)

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

Add to ~/.zshrc or ~/.bashrc for persistence

添加到~/.zshrc或~/.bashrc以持久化

echo 'export GOOGLE_CLOUD_PROJECT=my-project' >> ~/.zshrc
echo 'export GOOGLE_CLOUD_PROJECT=my-project' >> ~/.zshrc

Authentication Methods

身份验证方法

User Credentials (interactive):
```bash
bigquery auth login
```
用户凭据(交互式):
```bash
bigquery auth login
```

Opens browser for Google authentication

打开浏览器进行Google身份验证


**Service Account** (automation):
```bash
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/sa-key.json
bigquery auth check
```
Application Default Credentials (gcloud):
```bash
gcloud auth application-default login
bigquery auth check
```

**服务账号**(自动化):
```bash
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/sa-key.json
bigquery auth check
```
应用默认凭据(gcloud):
```bash
gcloud auth application-default login
bigquery auth check
```

Troubleshooting

故障排除

Issue: "Not authenticated" or "Permission denied"

问题:"未验证"或"权限被拒绝"

Solution: Check authentication and scopes
解决方案:检查身份验证和权限范围

Check current auth status

检查当前身份验证状态

bigquery auth check
bigquery auth check

Re-authenticate if needed

必要时重新验证

bigquery auth login
bigquery auth login

Verify gcloud is set to correct project

验证gcloud是否设置了正确的项目

gcloud config get-value project
gcloud config get-value project

Set project if needed

必要时设置项目

gcloud config set project my-project
gcloud config set project my-project

Issue: "Table not found"

问题:"表未找到"

Solution: Use fully qualified table names
解决方案:使用完全限定的表名

Wrong - missing project/dataset

错误示例 - 缺少项目/数据集

bigquery query "SELECT * FROM table"
bigquery query "SELECT * FROM table"

Correct - fully qualified

正确示例 - 完全限定名称

bigquery query "SELECT * FROM my-project.my-dataset.my-table"
bigquery query "SELECT * FROM my-project.my-dataset.my-table"

Or use backticks for reserved words

保留字使用反引号

# single-quote the SQL so the shell does not interpret the backticks
bigquery query 'SELECT * FROM `my-project.my-dataset.my-table`'
bigquery query 'SELECT * FROM `my-project.my-dataset.my-table`'

Issue: "Query too expensive"

问题:"查询成本过高"

Solution: Check cost with dry-run and optimize
解决方案:使用预运行检查成本并优化查询

Check estimated cost

检查估算成本

bigquery dry-run "SELECT * FROM large_table WHERE date >= '2025-01-01'"
bigquery dry-run "SELECT * FROM large_table WHERE date >= '2025-01-01'"

Optimize with partition filters

使用分区过滤优化

bigquery dry-run "SELECT * FROM large_table WHERE _PARTITIONDATE = '2025-01-15'"
bigquery dry-run "SELECT * FROM large_table WHERE _PARTITIONDATE = '2025-01-15'"

Issue: "Template not found"

问题:"模板未找到"

Solution: Search for templates and verify name
解决方案:搜索模板并验证名称

List all templates

列出所有模板

bigquery templates list
bigquery templates list

Search for template

搜索模板

bigquery templates search "keyword"
bigquery templates search "keyword"

Use exact template name

使用精确的模板名称

bigquery templates run exact-template-name
bigquery templates run exact-template-name

Issue: "The specified bucket does not exist"

问题:"指定的存储桶不存在"

Cause: bigquery tables load with a local file requires GCS staging bucket configuration.
Solutions:
  1. Preferred for small datasets (<1000 rows): Use
    tables insert
    instead (no GCS required)
    bash
    bigquery tables insert my-project.dataset.table \
      --data /tmp/data.jsonl \
      --format json
  2. For larger datasets: Upload to GCS first, then load
    bash
    gsutil cp /tmp/large-file.jsonl gs://my-bucket/
    bigquery tables load my-project.dataset.table \
      gs://my-bucket/large-file.jsonl \
      --format json
  3. Last resort: Configure GCS staging bucket in BigQuery CLI config (requires additional setup)
原因:使用 bigquery tables load 加载本地文件需要配置GCS暂存存储桶。
解决方案:
  1. 小规模数据集(<1000行)推荐方案:改用
    tables insert
    (无需GCS)
    bash
    bigquery tables insert my-project.dataset.table \
      --data /tmp/data.jsonl \
      --format json
  2. 大规模数据集方案:先上传至GCS,再加载
    bash
    gsutil cp /tmp/large-file.jsonl gs://my-bucket/
    bigquery tables load my-project.dataset.table \
      gs://my-bucket/large-file.jsonl \
      --format json
  3. 最后方案:在BigQuery CLI配置中设置GCS暂存存储桶(需要额外配置)

Issue: "Invalid schema"

问题:"无效的Schema"

Solution: Check schema format for external tables
解决方案:检查外部表的Schema格式

Schema format: column:type,column:type,...

Schema格式:column:type,column:type,...

bigquery tables create-external my-project.dataset.table \
  --source-uri gs://bucket/file.csv \
  --format csv \
  --schema "id:INTEGER,name:STRING,created_at:TIMESTAMP"
bigquery tables create-external my-project.dataset.table \
  --source-uri gs://bucket/file.csv \
  --format csv \
  --schema "id:INTEGER,name:STRING,created_at:TIMESTAMP"

Or use autodetect

或使用自动检测

bigquery tables create-external my-project.dataset.table \
  --source-uri gs://bucket/file.csv \
  --format csv \
  --autodetect
bigquery tables create-external my-project.dataset.table \
  --source-uri gs://bucket/file.csv \
  --format csv \
  --autodetect

Issue: "MCP server not responding"

问题:"MCP服务器无响应"

Solution: Check server mode and connectivity
解决方案:检查服务器模式和连接性

For stdio mode, ensure client is using stdio transport

标准输入输出模式,确保客户端使用标准输入输出传输

bigquery mcp stdio
bigquery mcp stdio

For HTTP mode, check port and firewall

HTTP模式,检查端口和防火墙

bigquery mcp http --port 8080
bigquery mcp http --port 8080

Test HTTP endpoint

测试HTTP端点

curl -i http://localhost:8080   # any HTTP response means the server is reachable

Issue: "LSP not starting in editor"

问题:"编辑器中LSP无法启动"

Solution: Verify LSP configuration and binary path
解决方案:验证LSP配置和二进制文件路径

Check bigquery is in PATH

检查bigquery是否在PATH中

which bigquery
which bigquery

Test LSP manually

手动测试LSP

bigquery lsp
bigquery lsp

Verify editor configuration points to correct command

验证编辑器配置指向正确的命令

Neovim: check cmd = { "bigquery", "lsp" }

Neovim: 检查cmd = { "bigquery", "lsp" }

VS Code: check "command": "bigquery", "args": ["lsp"]

VS Code: 检查"command": "bigquery", "args": ["lsp"]


Quick Reference

快速参考


Authentication

身份验证

bigquery auth check    # Check auth status
bigquery auth login    # Login with gcloud
bigquery auth check    # 检查身份验证状态
bigquery auth login    # 通过gcloud登录

Queries

查询

bigquery query "SELECT ..."                 # Execute query
bigquery query --yes "SELECT ..."           # Skip confirmation
bigquery query --format text "SELECT ..."   # Table output
bigquery dry-run "SELECT ..."               # Estimate cost
bigquery query "SELECT ..."                 # 执行查询
bigquery query --yes "SELECT ..."           # 跳过确认
bigquery query --format text "SELECT ..."   # 表格格式输出
bigquery dry-run "SELECT ..."               # 估算成本

Datasets

数据集

bigquery datasets list PROJECT # List datasets
bigquery datasets list PROJECT # 列出数据集

Tables

表

bigquery tables list PROJECT.DATASET                          # List tables
bigquery tables describe PROJECT.DATASET.TABLE                # Show schema
bigquery tables insert TABLE --json '{"id": 1}'               # Insert rows (inline)
bigquery tables insert TABLE --data file.jsonl --format json  # Insert from JSONL
cat data.jsonl | bigquery tables insert TABLE --data -        # Stream insert
bigquery tables load TABLE file.csv                           # Load data (bulk)
bigquery tables load TABLE gs://bucket/file.csv               # Load from GCS
bigquery tables extract TABLE gs://bucket/output.csv          # Extract to GCS
bigquery tables create-external TABLE --source-uri ...        # External table
bigquery tables update-external TABLE --source-uri ...        # Update external
bigquery tables list PROJECT.DATASET                          # 列出表
bigquery tables describe PROJECT.DATASET.TABLE                # 查看Schema
bigquery tables insert TABLE --json '{"id": 1}'               # 插入行(内联)
bigquery tables insert TABLE --data file.jsonl --format json  # 从JSONL插入
cat data.jsonl | bigquery tables insert TABLE --data -        # 流式插入
bigquery tables load TABLE file.csv                           # 批量加载数据
bigquery tables load TABLE gs://bucket/file.csv               # 从GCS加载
bigquery tables extract TABLE gs://bucket/output.csv          # 提取到GCS
bigquery tables create-external TABLE --source-uri ...        # 创建外部表
bigquery tables update-external TABLE --source-uri ...        # 更新外部表

Templates

模板

bigquery templates list                            # List templates
bigquery templates search "keyword"                # Search templates
bigquery templates validate TEMPLATE               # Validate template
bigquery templates run TEMPLATE --param key=value  # Run template
bigquery templates list                            # 列出模板
bigquery templates search "keyword"                # 搜索模板
bigquery templates validate TEMPLATE               # 验证模板
bigquery templates run TEMPLATE --param key=value  # 运行模板

MCP Server

MCP服务器

bigquery mcp stdio             # Start MCP (stdio mode)
bigquery mcp http              # Start MCP (HTTP mode)
bigquery mcp http --port 3000  # Custom port
bigquery mcp stdio             # 启动MCP(标准输入输出模式)
bigquery mcp http              # 启动MCP(HTTP模式)
bigquery mcp http --port 3000  # 指定自定义端口

LSP Server

LSP服务器

bigquery lsp # Start LSP server
bigquery lsp # 启动LSP服务器

Integration Examples

集成示例

CI/CD Pipeline

CI/CD流水线

bash
#!/bin/bash
bash
#!/bin/bash

daily-etl.sh

daily-etl.sh

Authenticate with service account

使用服务账号身份验证

export GOOGLE_APPLICATION_CREDENTIALS=/secrets/sa-key.json
bigquery auth check || exit 1
export GOOGLE_APPLICATION_CREDENTIALS=/secrets/sa-key.json
bigquery auth check || exit 1

Run daily ETL template

运行每日ETL模板

bigquery templates run daily-etl \
  --param date=$(date +%Y-%m-%d) \
  --yes \
  --format json > /tmp/etl-result.json
bigquery templates run daily-etl \
  --param date=$(date +%Y-%m-%d) \
  --yes \
  --format json > /tmp/etl-result.json

Check result

检查结果

if [ $? -eq 0 ]; then
  echo "ETL completed successfully"
else
  echo "ETL failed"
  exit 1
fi
if [ $? -eq 0 ]; then
  echo "ETL执行成功"
else
  echo "ETL执行失败"
  exit 1
fi

Data Quality Checks

数据质量检查

bash
#!/bin/bash
bash
#!/bin/bash

check-data-quality.sh

check-data-quality.sh

Run data quality template

运行数据质量检查模板

RESULT=$(bigquery templates run data-quality-check \
  --param table=my-project.dataset.table \
  --yes \
  --format json)
RESULT=$(bigquery templates run data-quality-check \
  --param table=my-project.dataset.table \
  --yes \
  --format json)

Parse result and check quality metrics

解析结果并检查质量指标

INVALID_ROWS=$(echo "$RESULT" | jq '.invalid_rows')
if [ "$INVALID_ROWS" -gt 100 ]; then
  echo "Data quality check failed: $INVALID_ROWS invalid rows"
  exit 1
else
  echo "Data quality check passed"
fi
INVALID_ROWS=$(echo "$RESULT" | jq '.invalid_rows')
if [ "$INVALID_ROWS" -gt 100 ]; then
  echo "数据质量检查失败:$INVALID_ROWS条无效行"
  exit 1
else
  echo "数据质量检查通过"
fi

Scheduled Reporting

定时报表

bash
#!/bin/bash
bash
#!/bin/bash

generate-report.sh

generate-report.sh

Generate weekly report

生成周报

bigquery templates run weekly-revenue-report \
  --param week_start=$(date -d "last monday" +%Y-%m-%d) \
  --param week_end=$(date -d "next sunday" +%Y-%m-%d) \
  --yes \
  --format json > /reports/weekly-$(date +%Y-%m-%d).json
bigquery templates run weekly-revenue-report \
  --param week_start=$(date -d "last monday" +%Y-%m-%d) \
  --param week_end=$(date -d "next sunday" +%Y-%m-%d) \
  --yes \
  --format json > /reports/weekly-$(date +%Y-%m-%d).json

Upload to GCS

上传至GCS

gsutil cp /reports/weekly-*.json gs://reports-bucket/
gsutil cp /reports/weekly-*.json gs://reports-bucket/

Summary

总结

Primary commands:
  • bigquery auth {check,login}
    - Authentication management
  • bigquery query
    - Execute SQL with cost awareness
  • bigquery dry-run
    - Estimate query costs
  • bigquery datasets list
    - List datasets
  • bigquery tables {list,describe,insert,load,extract,create-external,update-external}
    - Table operations
  • bigquery templates {list,search,validate,run}
    - Named templates
  • bigquery mcp {stdio,http}
    - MCP server modes
  • bigquery lsp
    - LSP server
Key features:
  • Cost-aware query execution with confirmation prompts
  • Named query templates with parameter substitution
  • Streaming insert API for real-time data (<1000 rows)
  • Bulk load jobs for large datasets (>10MB or >1000 rows)
  • JSONL streaming support with stdin (
    --data -
    )
  • Data extraction to Cloud Storage (CSV, JSON, Avro, Parquet)
  • External table support for GCS data
  • MCP server with stdio and HTTP modes
  • LSP integration for editor support
Best practices:
  • Always check authentication first with
    auth check
  • Use
    dry-run
    to estimate costs before expensive queries
  • Create templates for frequently-used queries
  • Use
    --yes
    flag for automation and CI/CD
  • Use
    insert
    for <1000 rows,
    load
    for larger datasets
  • Use JSONL format for streaming pipelines
  • Stream from stdin with
    --data -
    for data transformations
  • Use external tables to avoid data duplication
  • Configure MCP for natural language query capabilities
  • Set up LSP in your editor for SQL development
MCP Integration:
  • Semantic search across datasets
  • Natural language to SQL translation
  • Schema discovery and exploration
  • Template execution via MCP tools
  • Available in both stdio and HTTP modes
主要命令:
  • bigquery auth {check,login}
    - 身份验证管理
  • bigquery query
    - 执行SQL查询,具备成本感知
  • bigquery dry-run
    - 估算查询成本
  • bigquery datasets list
    - 列出数据集
  • bigquery tables {list,describe,insert,load,extract,create-external,update-external}
    - 表操作
  • bigquery templates {list,search,validate,run}
    - 命名模板管理
  • bigquery mcp {stdio,http}
    - MCP服务器模式
  • bigquery lsp
    - LSP服务器
关键特性:
  • 具备成本感知的查询执行,带确认提示
  • 支持参数替换的命名查询模板
  • 实时数据流式插入API(少于1000行)
  • 大规模数据集批量加载作业(大于10MB或1000行)
  • 支持通过标准输入的JSONL流式处理(
    --data -
    )
  • 支持将数据提取到Cloud Storage(CSV、JSON、Avro、Parquet)
  • GCS数据的外部表支持
  • 标准输入输出和HTTP模式的MCP服务器
  • 编辑器集成的LSP支持
最佳实践:
  • 操作前始终使用
    auth check
    检查身份验证
  • 高成本查询前使用
    dry-run
    估算成本
  • 为常用查询创建模板
  • 自动化和CI/CD场景使用
    --yes
    参数
  • 少于1000行使用
    insert
    ,更大数据集使用
    load
  • 流式流水线使用JSONL格式
  • 使用
    --data -
    从标准输入流式处理转换后的数据
  • 使用外部表避免数据重复
  • 配置MCP实现自然语言查询
  • 在编辑器中设置LSP进行SQL开发
MCP集成:
  • 跨数据集的语义搜索
  • 自然语言到SQL的转换
  • Schema发现和探索
  • 通过MCP工具执行模板
  • 支持标准输入输出和HTTP两种模式