ipynb-notebooks


IPYNB Notebook (.ipynb)


Overview

This skill guides you to work with `.ipynb` files and notebook projects in an "engineered" way (not limited to Jupyter; it also applies to environments such as Google Colab and VS Code notebooks):

- Clear file structure: the notebook serves as the interface, while logic moves into reusable `scripts/` and `lib/` code
- Token-efficient workflow: when an AI reads or writes notebooks, it reads structure and code wherever possible rather than large outputs
- Presentable mode: structure and output conventions for demos, team sharing, and documentation
- Reproducible environment: prefer `uv`, or fall back to `venv`, so the notebook can be run repeatably


Applicable Scenarios


Use this skill when:

- Creating a new notebook project or a single notebook
- Reviewing or editing existing `.ipynb` files (especially large files with many outputs and hard-to-read diffs)
- Reorganizing a notebook project by extracting reusable logic from notebooks into modules and scripts
- Preparing notebooks for demos, sharing, or archiving so they run end to end, are reproducible, and can be exported
- Improving the long-term maintainability and version-control experience of notebooks


Core Principles


Notebook is an interface, not a library.

Notebooks are well suited to interactive exploration and narrative presentation; logic that should be reusable, testable, or automatable belongs in:

- `scripts/`: directly runnable scripts (no dependency on the notebook UI)
- `lib/`: reusable modules (imported by both notebooks and scripts)

Benefits of this approach:

- The same logic can be reused across multiple notebooks
- Key logic can be tested without running a notebook
- Automation in CI/CD (e.g., exports, scheduled data processing) is easier
- Diffs are cleaner and version control is friendlier
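To make the split concrete, here is a minimal sketch of a function that could live in a hypothetical `lib/cleaning.py` (the module and function names are illustrative assumptions). Because it is plain Python, a notebook can import it with `from lib.cleaning import normalize_columns`, a script in `scripts/` can reuse it, and a test can check it without starting a notebook.

```python
import re


def normalize_column_name(name: str) -> str:
    """Turn a raw column header into a snake_case identifier.

    Hypothetical helper for lib/cleaning.py: notebooks and scripts
    import it instead of re-implementing the cleanup inline.
    """
    name = name.strip().lower()
    # Replace every run of non-alphanumeric characters with one underscore
    name = re.sub(r"[^0-9a-z]+", "_", name)
    return name.strip("_")


def normalize_columns(names: list) -> list:
    """Apply normalize_column_name to a whole header row."""
    return [normalize_column_name(n) for n in names]


if __name__ == "__main__":
    # A quick check that runs without any notebook UI
    print(normalize_columns([" Alarm Time ", "Site-ID", "value (kW)"]))
```

Keeping the function free of notebook state (no globals, no display calls) is what lets the same code serve the notebook, the scripts, and CI.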


Quick Start


Create a new notebook project (uv recommended)


Initialize the project (uv):

```bash
# Create project directory
mkdir notebook-project && cd notebook-project

# Initialize uv project
uv init

# Add dependencies (pick what you need)
uv add jupyterlab pandas plotly
```

Set up the directory structure:

```bash
mkdir -p scripts lib data/{raw,processed} reports docs .archive
touch data/.gitkeep data/raw/.gitkeep data/processed/.gitkeep reports/.gitkeep
```
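Brace expansion (`data/{raw,processed}`) is a bash/zsh feature. Where it is unavailable (for example in Windows `cmd`), a small `pathlib` sketch can create the same skeleton; `create_skeleton` is a hypothetical helper whose directory list simply mirrors the commands above.

```python
from pathlib import Path

# Directories created by the mkdir command above
DIRS = ["scripts", "lib", "data/raw", "data/processed", "reports", "docs", ".archive"]
# Placeholder files that keep otherwise-empty directories in Git
GITKEEP_DIRS = ["data", "data/raw", "data/processed", "reports"]


def create_skeleton(root: Path) -> None:
    """Create the project directories and .gitkeep files under root (idempotent)."""
    root = Path(root)
    for d in DIRS:
        (root / d).mkdir(parents=True, exist_ok=True)
    for d in GITKEEP_DIRS:
        (root / d / ".gitkeep").touch(exist_ok=True)
```

Run it once from the project root, e.g. `create_skeleton(Path("."))`; running it again is harmless.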

Prepare `.gitignore` (example):

```gitignore
# Virtual environments
.venv/

# Data and outputs (keep .gitkeep)
data/**
!data/**/
!data/**/.gitkeep
reports/**
!reports/**/
!reports/**/.gitkeep

# Jupyter
.ipynb_checkpoints/

# Python
__pycache__/
*.pyc

# Environment
.env
```

Start the notebook environment:

```bash
uv run jupyter lab
```

Load the reference documents when more detailed patterns are needed:

- `references/file-structure.md`: directory structure and project organization
- `references/presentation-patterns.md`: structure and output conventions for demos and sharing
- `references/token-efficiency.md`: token-efficiency strategies for AI reading and writing notebooks


Review or compare an existing notebook (focusing on structure and code wherever possible)


Recommended workflow:

Check the structure first, without reading outputs:

```bash
# Cell types and counts
jq '.cells | group_by(.cell_type) | map({type: .[0].cell_type, count: length})' notebook.ipynb

# Code cells with outputs
jq '[.cells[] | select(.cell_type == "code") | select(.outputs | length > 0)] | length' notebook.ipynb
```
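If `jq` is not installed, the same structure-only queries can be sketched with Python's standard `json` module. This assumes the nbformat 4 layout (`cells`, `cell_type`, `outputs`); `notebook_structure` is a hypothetical name. The whole file is still parsed, but only a small summary is returned, so large outputs never reach the reader or the AI context.

```python
import json
from collections import Counter
from pathlib import Path


def notebook_structure(path: Path) -> dict:
    """Summarize a notebook's cells without returning any output payloads."""
    nb = json.loads(Path(path).read_text(encoding="utf-8"))
    cells = nb.get("cells", [])
    # Equivalent of: group_by(.cell_type) | map({type, count})
    counts = Counter(cell["cell_type"] for cell in cells)
    # Equivalent of: code cells whose outputs list is non-empty
    with_outputs = sum(
        1 for cell in cells
        if cell["cell_type"] == "code" and cell.get("outputs")
    )
    return {"cell_counts": dict(counts), "code_cells_with_outputs": with_outputs}
```

Usage: `print(notebook_structure(Path("notebook.ipynb")))`.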

Compare only code cells:

```bash
# Extract code sources to compare
jq '.cells[] | select(.cell_type == "code") | .source' notebook1.ipynb > /tmp/code1.json
jq '.cells[] | select(.cell_type == "code") | .source' notebook2.ipynb > /tmp/code2.json
diff /tmp/code1.json /tmp/code2.json
```
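The `jq` approach above diffs pretty-printed JSON arrays, which can be noisy. A minimal Python alternative, assuming valid nbformat 4 files, joins each code cell's `source` (stored either as one string or as a list of lines) and builds a unified diff with `difflib`; the function names are illustrative.

```python
import difflib
import json
from pathlib import Path


def code_cells(path: Path) -> list:
    """Return the source text of every code cell, in order."""
    nb = json.loads(Path(path).read_text(encoding="utf-8"))
    sources = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        # nbformat allows source as a single string or a list of lines
        sources.append(src if isinstance(src, str) else "".join(src))
    return sources


def diff_code(path_a: Path, path_b: Path) -> str:
    """Unified diff of code cells only; outputs are never included."""
    def as_lines(path: Path) -> list:
        lines = []
        for i, src in enumerate(code_cells(path)):
            lines.append(f"# --- code cell {i} ---\n")
            lines.extend(line + "\n" for line in src.splitlines())
        return lines

    return "".join(difflib.unified_diff(
        as_lines(path_a), as_lines(path_b),
        fromfile=str(path_a), tofile=str(path_b),
    ))
```

The per-cell marker lines keep hunks anchored to cell positions, so a change reads as "cell 3 changed" rather than as a shifted JSON array.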

Read the notebook body only when necessary:

- Decide which section or cell type you need before reading
- For large notebooks, read in segments by cell range or topic
- Details are in `references/token-efficiency.md`
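Segmented reading can be sketched as a helper that returns only a slice of cells, with outputs dropped and long sources truncated. `read_cells`, the slice bounds, and the `max_chars` cutoff are assumptions to adjust for the notebook at hand.

```python
import json
from pathlib import Path
from typing import Optional


def read_cells(path: Path, start: int, stop: int,
               cell_type: Optional[str] = None,
               max_chars: int = 2000) -> list:
    """Return cells[start:stop] as small dicts (index, type, source), never outputs."""
    nb = json.loads(Path(path).read_text(encoding="utf-8"))
    selected = []
    # enumerate(..., start=start) keeps the notebook's real cell indices
    for index, cell in enumerate(nb.get("cells", [])[start:stop], start=start):
        if cell_type is not None and cell.get("cell_type") != cell_type:
            continue
        src = cell.get("source", "")
        src = src if isinstance(src, str) else "".join(src)
        if len(src) > max_chars:
            src = src[:max_chars] + "\n... [truncated]"
        selected.append({"index": index, "type": cell.get("cell_type"), "source": src})
    return selected
```

For example, `read_cells(Path("notebook.ipynb"), 10, 20, cell_type="code")` returns only the code cells among cells 10 to 19.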


Organize a notebook project (extract logic, control outputs, make it reproducible)


Directory organization guidelines are in `references/file-structure.md`. Here is a minimal migration you can carry out right away:

1. Count the files in the project root: `ls -1 | wc -l`
2. Move scripts to `scripts/`, documents to `docs/`, and old notebooks to `.archive/`
3. Update imports in notebooks: `from lib import module_name`
4. Verify that everything still runs
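`from lib import module_name` only works when the project root is on `sys.path`. Jupyter normally starts the kernel in the notebook's own directory, so a notebook at the project root can import `lib/` directly, but one moved into a subfolder such as `notebooks/` usually cannot. One common workaround, sketched here with hypothetical helper names and assuming `pyproject.toml` (created by `uv init`) marks the root, is to walk up to the root and prepend it to `sys.path`. Installing the project itself as a package is a sturdier alternative.

```python
import sys
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "pyproject.toml") -> Path:
    """Walk up from start until a directory containing marker is found."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No {marker} found in {start} or its parents")


def add_project_root_to_path(start: Optional[Path] = None) -> Path:
    """Prepend the project root to sys.path so `from lib import ...` resolves."""
    root = find_project_root(start if start is not None else Path.cwd())
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root
```

In the first cell of a notebook under `notebooks/`, call `add_project_root_to_path()` before `from lib import module_name`.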


Reproducible Environment (uv / venv)


Why prefer uv?


uv is well suited to:

- Fast, reproducible dependency management
- Running tools (e.g., `jupyter`, `nbconvert`) inside the project's dependency environment
- Keeping the global Python installation clean
- Consistent behavior across platforms


Common command patterns


Add dependencies:

```bash
uv add plotly pandas duckdb
```

Install tools (optional):

```bash
uv tool install jupyterlab
```

Run in the project environment:

```bash
uv run jupyter lab
```

Declare dependencies inside a single-file script (for `uv run`):

```python
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "pandas",
#     "plotly",
# ]
# ///

import pandas as pd
import plotly.express as px

# Script code here
```




Run: `uv run script.py`

If you can't use uv, `python -m venv .venv` plus `pip` also works, but make sure the environment can be recreated with a single command (e.g., a pinned `requirements.txt`, or `pyproject.toml` plus a lockfile).


Token Efficient Workflow (for AI and Version Control)


Default strategy: Clean outputs before committing

When an AI assistant reads or writes `.ipynb` files, the recommended setup is a pre-commit hook in `.pre-commit-config.yaml`:

```yaml
repos:
  - repo: https://github.com/kynan/nbstripout
    rev: 0.6.1
    hooks:
      - id: nbstripout
```


**When outputs must be retained (not recommended):**

```bash
SKIP=nbstripout git commit -m "Add notebook with visualization outputs"
```

A more common practice is to save outputs to `reports/` and keep the notebook in a state where re-running it reproduces the outputs (see `references/token-efficiency.md`).
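For a one-off cleanup without pre-commit, the core of output stripping can be sketched with the standard `json` module: empty each code cell's `outputs` and reset `execution_count`. This is a simplified assumption of the idea, not nbstripout's actual implementation, which also handles metadata and has its own configuration options.

```python
import json
from pathlib import Path


def strip_outputs(path: Path) -> int:
    """Clear code-cell outputs and execution counts in place.

    Returns the number of code cells that actually changed.
    """
    path = Path(path)
    nb = json.loads(path.read_text(encoding="utf-8"))
    changed = 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        if cell.get("outputs") or cell.get("execution_count") is not None:
            changed += 1
        cell["outputs"] = []
        cell["execution_count"] = None
    # indent=1 and sorted keys approximate the layout Jupyter writes, keeping diffs small
    path.write_text(json.dumps(nb, indent=1, sort_keys=True, ensure_ascii=False) + "\n",
                    encoding="utf-8")
    return changed
```

A return value of 0 means the file was already clean, which makes the function usable as a simple check in scripts.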


Query before reading (structure first)


Check structure first:

```bash
jq '.cells | group_by(.cell_type) | map({type: .[0].cell_type, count: length})' notebook.ipynb
```

View only code:

```bash
jq '.cells[] | select(.cell_type == "code") | .source' notebook.ipynb
```


Keep outputs controllable and reproducible


Prefer output summaries over dumping large objects directly:

```python
# Assumes df_alarms is a pandas DataFrame with a "timestamp" column
print(f"[OK] Loaded {len(df_alarms):,} rows")
print(f"Columns: {', '.join(df_alarms.columns)}")
print(f"Date range: {df_alarms['timestamp'].min()} to {df_alarms['timestamp'].max()}")
```

Save large outputs to files:

```python
# Assumes fig is a Plotly figure and report_dir is a pathlib.Path
fig.write_html(report_dir / "visualization.html")
print(f"[OK] Saved visualization to {report_dir}/visualization.html")
```

Complete strategies are in `references/token-efficiency.md`.
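The summary habit can be wrapped in a small helper so every loading cell reports the same few facts. This sketch takes already-computed values rather than a DataFrame, so it does not depend on pandas; `format_load_summary` and its `max_columns` cap are hypothetical.

```python
from typing import Iterable


def format_load_summary(n_rows: int, columns: Iterable, start, end,
                        max_columns: int = 10) -> str:
    """Build a fixed-size summary of a loaded table instead of printing the table."""
    cols = [str(c) for c in columns]
    shown = ", ".join(cols[:max_columns])
    if len(cols) > max_columns:
        # Very wide tables still produce a single short line
        shown += f", ... (+{len(cols) - max_columns} more)"
    return "\n".join([
        f"[OK] Loaded {n_rows:,} rows",
        f"Columns: {shown}",
        f"Date range: {start} to {end}",
    ])
```

With the DataFrame from above: `print(format_load_summary(len(df_alarms), df_alarms.columns, df_alarms["timestamp"].min(), df_alarms["timestamp"].max()))`.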


Demonstration / Sharing Mode


Recommended notebook structure


1. Title & Overview: background and objectives
2. Preparation: imports and configuration
3. Data Loading: with feedback and error handling
4. Summary: high-level statistics
5. Visualization: with explanations and usage tips
6. Conclusion: key findings
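One way to start every demo notebook from this outline is to generate the skeleton. The sketch below writes a minimal nbformat 4 file with the standard `json` module; `write_demo_skeleton` and the choice of which sections get an empty code cell are assumptions. If the `nbformat` package is installed, `nbformat.v4.new_notebook()` with `new_markdown_cell` / `new_code_cell` is the more conventional route.

```python
import json
from pathlib import Path

# (section title, one-line purpose, gets an empty code cell?)
SECTIONS = [
    ("Title & Overview", "Background and objectives", False),
    ("Preparation", "Imports and configuration", True),
    ("Data Loading", "With feedback and error handling", True),
    ("Summary", "High-level statistics", True),
    ("Visualization", "With explanations and usage tips", True),
    ("Conclusion", "Key findings", False),
]


def write_demo_skeleton(path: Path) -> None:
    """Write an empty demo notebook whose markdown cells follow SECTIONS."""
    cells = []
    for title, purpose, needs_code in SECTIONS:
        cells.append({"cell_type": "markdown", "metadata": {},
                      "source": [f"## {title}\n", "\n", f"{purpose}\n"]})
        if needs_code:
            cells.append({"cell_type": "code", "metadata": {}, "source": [],
                          "outputs": [], "execution_count": None})
    # nbformat 4.4 does not require per-cell "id" fields (4.5 and later do)
    nb = {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}
    Path(path).write_text(json.dumps(nb, indent=1) + "\n", encoding="utf-8")
```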


More "professional" output habits

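Use consistent status prefixes:

```python
print("[OK] Success")
print("[WARN] Warning")
print("[ERR] Error")
print("[INFO] Note")
```

Format numbers:

```python
count = 2055
print(f"Total: {count:,}")  # 2,055 instead of 2055
```

Save outputs to `reports/` by date:

```python
from datetime import datetime
from pathlib import Path

today = datetime.now().strftime("%Y-%m-%d")
report_dir = Path("reports") / today
report_dir.mkdir(parents=True, exist_ok=True)

# Assumes fig is a Plotly figure created earlier in the notebook
fig.write_html(report_dir / "chart.html")

# Point reports/latest at today's folder (relative link target)
latest = Path("reports/latest")
if latest.is_symlink() or latest.exists():  # is_symlink() also catches a dangling link
    latest.unlink()
latest.symlink_to(today, target_is_directory=True)
```

Full patterns and templates are in `references/presentation-patterns.md`.

Resource Index

references/file-structure.md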

Includes:

- Recommended directory structure
- File organization rules and naming conventions
- Git-friendly practices (ignore rules, diffs, output cleaning)
- Migration steps for existing projects
- Example structures

Suitable for: loading when creating new projects, refactoring directories, or unifying conventions.


references/token-efficiency.md


Includes:

- Output cleaning and version control strategies
- Structured queries that avoid reading outputs
- Segmented reading and diff approaches for large notebooks
- Common `jq` / CLI patterns
- Cell output management

Suitable for: loading when you need to save tokens, review large notebooks, or run automated processing.


references/presentation-patterns.md


Includes:

- Structure templates for demo notebooks
- Readability and narrative rhythm
- Interactive elements and export strategies
- Error handling and reproducibility checkpoints
- Division of labor between Markdown and code cells
- Notes on exporting to HTML/PDF

Suitable for: loading before building demos, sharing with a team, or publishing documentation.


Best Practices Cheat Sheet


- Structure: the notebook is the interface; logic moves into `scripts/` / `lib/`
- Dependencies: prefer uv so the environment can be recreated with one command
- Version control: clean outputs by default (pre-commit / nbstripout / nbconvert)
- Token saving: query structure before reading; save large outputs to files
- Presentation: clear narrative, restrained outputs, explicit error handling
- Reproducibility: make sure "Restart & Run All" works
- Data flow: raw → processed → reports
- Git-friendly: ignore data and generated artifacts, but keep the directory skeleton (`.gitkeep`)


Example Workflow


```bash
# 1. Create project
mkdir my-analysis && cd my-analysis
uv init
uv add jupyterlab pandas plotly

# 2. Set up structure
mkdir -p scripts lib data/{raw,processed} reports
touch data/.gitkeep data/raw/.gitkeep data/processed/.gitkeep reports/.gitkeep

# 3. Create notebook
uv run jupyter lab

# 4. As you work:
#    - Keep logic in lib/ and scripts/
#    - Save outputs to reports/ with dates
#    - Keep outputs minimal
#    - Strip outputs before committing

# 5. Before presenting:
#    - Run "Restart & Run All" to test
#    - Add context and documentation
#    - Consider exporting to HTML
uv run jupyter nbconvert --to html --execute notebook.ipynb
```

Cheat Sheet

Directory organization:

- Notebooks: project root (or a `notebooks/` folder as the project grows)
- Scripts: `scripts/`
- Modules: `lib/`
- Data: `data/raw/`, `data/processed/`
- Reports: `reports/YYYY-MM-DD/`
- Archive: `.archive/`

Common uv commands:

- `uv init`: initialize a project
- `uv add <package>`: add dependencies
- `uv run <command>`: run a command in the project environment
- `uvx <tool>`: run a tool temporarily (not added to project dependencies)

Token saving:

- Clean outputs: a pre-commit hook, or `jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace notebook.ipynb`
- Query structure: `jq '.cells | group_by(.cell_type) | map({type: .[0].cell_type, count: length})' notebook.ipynb`
- Compare code: `jq '.cells[] | select(.cell_type == "code") | .source' notebook.ipynb`

Presentation:

- Number formatting: `{count:,}`
- Save to files by date: `reports/YYYY-MM-DD/`
- Execution verification: `jupyter nbconvert --to notebook --execute notebook.ipynb` (writes `notebook.nbconvert.ipynb`)
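The "save to files by date" pattern can be wrapped in one reusable function that would fit in `lib/`. A minimal sketch follows; `make_report_dir` is a hypothetical name, and it assumes symlinks are available (on Windows they may need extra privileges, so the `latest` link is skipped if creation fails).

```python
from datetime import date
from pathlib import Path
from typing import Optional


def make_report_dir(base: Path = Path("reports"), day: Optional[date] = None) -> Path:
    """Create base/YYYY-MM-DD and point base/latest at it when symlinks are allowed."""
    day = day or date.today()
    base = Path(base)
    report_dir = base / day.isoformat()
    report_dir.mkdir(parents=True, exist_ok=True)

    latest = base / "latest"
    # is_symlink() also catches a dangling link, which exists() alone would miss
    if latest.is_symlink():
        latest.unlink()
    if not latest.exists():
        try:
            # Relative target, so the link keeps working if reports/ is moved
            latest.symlink_to(day.isoformat(), target_is_directory=True)
        except OSError:
            pass  # e.g. no symlink privilege on Windows; the dated folder still works
    return report_dir
```

Usage, assuming `fig` is a Plotly figure: `fig.write_html(make_report_dir() / "chart.html")`.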

