ipynb-notebooks
IPYNB Notebook (.ipynb)
Overview
This skill guides you in working with `.ipynb` files and notebook projects in an engineering-minded way (not limited to Jupyter; it also applies to environments such as Google Colab and VS Code notebooks):
- Clear file structure: the notebook serves as the interface, with logic pushed down into reusable `scripts/` and `lib/`
- Token-efficient workflow: when an AI reads or writes notebooks, read only structure and code where possible, not large outputs
- Presentation mode: structure and output conventions for demos, team sharing, and documentation
- Reproducible environment: prefer `uv`, or fall back to `venv`, so that runs are repeatable
Applicable Scenarios
Use this skill when:
- Creating a new notebook project or a single notebook
- Reviewing or editing existing `.ipynb` files (especially large files with many outputs and hard-to-read diffs)
- Reorganizing a notebook project by extracting reusable logic from notebooks into modules and scripts
- Preparing notebooks for demos, sharing, or archiving so that they run end to end, are reproducible, and can be exported
- Improving the long-term maintainability and version-control experience of notebooks
Core Principles
A notebook is an interface, not a library.
Notebooks are suited to interactive exploration and narrative presentation; reusable, testable, automatable logic belongs in:
- `scripts/`: directly runnable scripts (no dependency on the notebook UI)
- `lib/`: reusable modules (imported by both notebooks and scripts)

Benefits of this approach:
- The same logic is reused across multiple notebooks
- Key logic can be tested without running a notebook
- Automation in CI/CD is easier (e.g., exports, scheduled data runs)
- Diffs are cleaner and version control is friendlier
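As a rough illustration of this split, the sketch below keeps a small piece of logic in a plain function of the kind that would live under `lib/`, and then calls it the way a notebook cell or a script would. The module path `lib/metrics.py` and the function name `summarize_counts` are hypothetical and only serve the example.

```python
# Hypothetical contents of lib/metrics.py: pure logic, no notebook dependency.
from collections import Counter


def summarize_counts(labels):
    """Return (label, count) pairs sorted by descending count, then by label."""
    counts = Counter(labels)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# A notebook cell or a script under scripts/ would import the function
# (from lib.metrics import summarize_counts) and call it like this:
summary = summarize_counts(["warn", "ok", "ok", "err", "ok", "warn"])
for label, count in summary:
    print(f"{label}: {count:,}")
```

Because the function takes and returns plain Python values, it can be unit-tested without starting a notebook kernel.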
Quick Start
Create a new notebook project (uv recommended)
- Initialize the project (uv)
```bash
# Create project directory
mkdir notebook-project && cd notebook-project
# Initialize uv project
uv init
# Add dependencies (pick what you need)
uv add jupyterlab pandas plotly
```
- Set up the directory structure
```bash
mkdir -p scripts lib data/{raw,processed} reports docs .archive
touch data/.gitkeep data/raw/.gitkeep data/processed/.gitkeep reports/.gitkeep
```
- Prepare `.gitignore` (example)
```gitignore
# Virtual environments
.venv/
# Data and outputs (keep .gitkeep)
data/**
!data/**/
!data/**/.gitkeep
reports/**
!reports/**/
!reports/**/.gitkeep
# Jupyter
.ipynb_checkpoints/
# Python
__pycache__/
*.pyc
# Environment
.env
```
- Start the notebook environment
```bash
uv run jupyter lab
```
Load the reference documents when more detailed patterns are needed:
- `references/file-structure.md`: directory structure and project organization
- `references/presentation-patterns.md`: demo/sharing structure and output conventions
- `references/token-efficiency.md`: token-efficient strategies for AI reading and writing of notebooks
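On systems where brace expansion or `touch` is unavailable, the directory step can be approximated in Python. This is a minimal sketch using only `pathlib`; the helper name `create_skeleton` is an assumption made for this example, and it runs inside a temporary directory so that no real project is modified.

```python
from pathlib import Path
import tempfile

# Directories from the quick start; .gitkeep files keep empty ones in Git.
DIRECTORIES = ["scripts", "lib", "data/raw", "data/processed", "reports", "docs", ".archive"]
GITKEEP_DIRS = ["data", "data/raw", "data/processed", "reports"]


def create_skeleton(root):
    """Create the project directories and .gitkeep placeholders under root."""
    root = Path(root)
    for name in DIRECTORIES:
        (root / name).mkdir(parents=True, exist_ok=True)
    for name in GITKEEP_DIRS:
        (root / name / ".gitkeep").touch()
    # Return every created path, relative to root, for inspection.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


# Demonstrate in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    created = create_skeleton(tmp)
    print("\n".join(created))
```

Calling `create_skeleton(".")` from the project root would create the same layout in place; the function is safe to re-run because existing directories and files are left alone.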
Review / compare an existing notebook (focus on structure and code where possible)
Recommended workflow:
- Check the structure first; don't read outputs
```bash
# Cell types and counts
jq '.cells | group_by(.cell_type) | map({type: .[0].cell_type, count: length})' notebook.ipynb
# Code cells with outputs
jq '[.cells[] | select(.cell_type == "code") | select(.outputs | length > 0)] | length' notebook.ipynb
```
- Compare only the code cells
```bash
# Extract code sources to compare
jq '.cells[] | select(.cell_type == "code") | .source' notebook1.ipynb > /tmp/code1.json
jq '.cells[] | select(.cell_type == "code") | .source' notebook2.ipynb > /tmp/code2.json
diff /tmp/code1.json /tmp/code2.json
```
- Read the notebook body only when necessary
  - Decide which section or cell type you need before reading
  - For large notebooks, read in segments by cell range or topic
  - Details are in `references/token-efficiency.md`
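Where `jq` is not installed, the same code-only comparison can be sketched with the Python standard library. The helper names `code_sources` and `diff_code` are assumptions for this example, and the two small dictionaries stand in for notebooks that would normally be loaded with `json.load`.

```python
import difflib


def code_sources(notebook):
    """Return the source of each code cell as a string, ignoring outputs."""
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            src = cell.get("source", "")
            # The notebook format stores source as a string or a list of lines.
            sources.append("".join(src) if isinstance(src, list) else src)
    return sources


def diff_code(nb_a, nb_b):
    """Unified diff of the code cells of two parsed notebooks."""
    a = "\n".join(code_sources(nb_a)).splitlines()
    b = "\n".join(code_sources(nb_b)).splitlines()
    return list(difflib.unified_diff(a, b, fromfile="notebook1", tofile="notebook2", lineterm=""))


# In-memory stand-ins for two notebooks; outputs and markdown are ignored.
nb1 = {"cells": [
    {"cell_type": "code", "source": ["x = 1\n", "print(x)"], "outputs": [{"text": "1"}]},
    {"cell_type": "markdown", "source": ["# Title"]},
]}
nb2 = {"cells": [
    {"cell_type": "code", "source": ["x = 2\n", "print(x)"], "outputs": []},
]}
changes = diff_code(nb1, nb2)
print("\n".join(changes))
```

Only the cell sources are ever joined and compared, so large outputs never enter the diff.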
Organize a notebook project (extract logic, control outputs, make it reproducible)
Directory organization guidance is in `references/file-structure.md`. Here is a minimal, executable migration sequence:
- Count the files in the root directory: `ls -1 | wc -l`
- Move scripts to `scripts/`, documents to `docs/`, and old notebooks to `.archive/`
- Update imports in the notebooks: `from lib import module_name`
- Verify that everything still runs
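Before moving anything, it can help to print the planned moves and review them. The sketch below is a dry run only and never touches the filesystem; the extension-to-directory mapping, the `keep` list, and the file names are assumptions chosen for illustration, and which notebooks count as "old" has to be stated explicitly.

```python
from pathlib import PurePosixPath

# Assumed mapping from file extension to target directory; adjust per project.
TARGETS = {".py": "scripts", ".sh": "scripts", ".md": "docs"}


def plan_moves(root_files, old_notebooks=(), keep=("README.md",)):
    """Return (source, destination) pairs without touching the filesystem."""
    moves = []
    for name in sorted(root_files):
        if name in keep:
            continue
        suffix = PurePosixPath(name).suffix
        if suffix == ".ipynb":
            # Only notebooks explicitly marked as old are archived.
            if name in old_notebooks:
                moves.append((name, f".archive/{name}"))
        elif suffix in TARGETS:
            moves.append((name, f"{TARGETS[suffix]}/{name}"))
    return moves


# Hypothetical root directory listing.
files = ["README.md", "analysis.ipynb", "analysis_v1.ipynb", "clean_data.py", "notes.md"]
plan = plan_moves(files, old_notebooks={"analysis_v1.ipynb"})
for src, dst in plan:
    print(f"{src} -> {dst}")
```

Once the plan looks right, the pairs can be fed to `git mv` or `shutil.move`, followed by the import update and a full re-run of the notebooks.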
Reproducible Environment (uv / venv)
Why prefer uv?
uv is well suited to:
- Fast, reproducible dependency management
- Running tools inside the project's dependency environment (e.g., `jupyter`, `nbconvert`)
- Keeping the global Python installation clean
- Better cross-platform consistency
Common command patterns
Add dependencies:
```bash
uv add plotly pandas duckdb
```
Install tools (optional):
```bash
uv tool install jupyterlab
```
Run in the project environment:
```bash
uv run jupyter lab
```
Declare dependencies in a single-file script (for `uv run`):
```python
# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "pandas",
#   "plotly",
# ]
# ///
import pandas as pd
import plotly.express as px

# Script code here
```
Run: `uv run script.py`
If you can't use uv, `python -m venv .venv` plus `pip` also works, but make sure the environment can be recreated in one step (a `requirements.txt`, or `pyproject.toml` plus a lockfile, is recommended).
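The commented header in a single-file script is inline script metadata: TOML wrapped in `#` comments between `# /// script` and `# ///`. To make that structure concrete, the sketch below extracts the TOML text with a regular expression modeled on the reference pattern in PEP 723. It is an illustration of the format only, not a description of how uv parses scripts, and the helper name `read_script_metadata` is an assumption.

```python
import re

# Pattern for inline script metadata blocks, modeled on the PEP 723 reference.
BLOCK = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)


def read_script_metadata(script_text):
    """Return the TOML text of the `script` block, or None if absent."""
    for match in BLOCK.finditer(script_text):
        if match.group("type") == "script":
            lines = match.group("content").splitlines(keepends=True)
            # Drop the leading "# " (or a bare "#") from each comment line.
            return "".join(
                line[2:] if line.startswith("# ") else line[1:] for line in lines
            )
    return None


example = '''# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "pandas",
#   "plotly",
# ]
# ///
import pandas as pd
'''
toml_text = read_script_metadata(example)
print(toml_text)
```

The recovered text is ordinary TOML, so on Python 3.11 or later it could be passed to `tomllib.loads` to obtain the dependency list.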
Token-Efficient Workflow (for AI and Version Control)
When reading or writing `.ipynb` files through an AI assistant:
Default strategy: clean outputs before committing
Recommended pre-commit:
```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/kynan/nbstripout
    rev: 0.6.1
    hooks:
      - id: nbstripout
```
**When outputs must be retained (not recommended):**
```bash
SKIP=nbstripout git commit -m "Add notebook with visualization outputs"
```
A more common practice is to save outputs to `reports/` and keep notebooks in a state where re-running them reproduces the outputs (see `references/token-efficiency.md`).
Query before reading (structure first)
Check the structure first:
```bash
jq '.cells | group_by(.cell_type) | map({type: .[0].cell_type, count: length})' notebook.ipynb
```
View only the code:
```bash
jq '.cells[] | select(.cell_type == "code") | .source' notebook.ipynb
```
Outputs should be controllable and reproducible
Prefer output summaries over dumping large objects:
```python
print(f"[OK] Loaded {len(df_alarms):,} rows")
print(f"Columns: {', '.join(df_alarms.columns)}")
print(f"Date range: {df_alarms['timestamp'].min()} to {df_alarms['timestamp'].max()}")
```
Save large outputs to files:
```python
fig.write_html(report_dir / "visualization.html")
print(f"[OK] Saved visualization to {report_dir}/visualization.html")
```
Complete strategies are in `references/token-efficiency.md`.
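To show what cleaning outputs means at the file level, here is a minimal sketch that clears code-cell outputs from a parsed notebook. It is a simplified stand-in for tools such as nbstripout, not their implementation, and it assumes the usual notebook JSON layout with `cells`, `outputs`, and `execution_count` keys. The helper name `strip_outputs` is hypothetical.

```python
import copy


def strip_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs removed."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# In-memory stand-in for a notebook loaded with json.load.
nb = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Report"]},
        {
            "cell_type": "code",
            "source": ["print('hello')"],
            "execution_count": 3,
            "outputs": [{"output_type": "stream", "text": ["hello\n"]}],
        },
    ],
    "nbformat": 4,
}
clean = strip_outputs(nb)
print(clean["cells"][1]["outputs"], clean["cells"][1]["execution_count"])
```

The source of every cell is preserved, so the cleaned notebook still reproduces its outputs when re-run.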
Demonstration / Sharing Mode
Recommended notebook structure
- Title & Overview - Background and objectives
- Preparation - Imports and configuration
- Data Loading - With feedback and error handling
- Summary - High-level statistics
- Visualization - With explanations and usage tips
- Conclusion - Key findings
More "professional" output habits
Unified status output:
```python
print("[OK] Success")
print("[WARN] Warning")
print("[ERR] Error")
print("[INFO] Note")
```
Number formatting:
```python
print(f"Total: {count:,}")  # 2,055 instead of 2055
```
Save to reports by date:
```python
from datetime import datetime
from pathlib import Path

today = datetime.now().strftime('%Y-%m-%d')
report_dir = Path("reports") / today
report_dir.mkdir(parents=True, exist_ok=True)
fig.write_html(report_dir / "chart.html")

# Point reports/latest at today's directory; is_symlink() also catches a dangling link
latest = Path("reports/latest")
if latest.is_symlink() or latest.exists():
    latest.unlink()
latest.symlink_to(today, target_is_directory=True)
```
Complete patterns and templates are in `references/presentation-patterns.md`.
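The status tags and number formatting can be combined in one small helper so that every notebook prints messages the same way. This is one possible sketch; the function name `status` and the lowercase level keys are assumptions, not part of the skill.

```python
def status(level, message):
    """Format a message with one of the bracketed status tags."""
    tags = {"ok": "[OK]", "warn": "[WARN]", "err": "[ERR]", "info": "[INFO]"}
    if level not in tags:
        raise ValueError(f"unknown status level: {level!r}")
    return f"{tags[level]} {message}"


row_count = 2055
line = status("ok", f"Loaded {row_count:,} rows")
print(line)  # [OK] Loaded 2,055 rows
```

A helper like this would naturally live in `lib/` so that notebooks and scripts share it.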
Resource Index
references/file-structure.md
Includes:
- Recommended directory structure
- File organization rules and naming conventions
- Git-friendly practices (ignore rules, diffs, output cleaning)
- Migration steps for existing projects
- Example structures

Suitable for: loading when creating new projects, restructuring directories, or unifying conventions.
references/token-efficiency.md
Includes:
- Output cleaning and version control strategies
- Structured query methods that avoid reading outputs
- Segmented reading and diff approaches for large notebooks
- Common `jq` / CLI patterns
- Cell output management

Suitable for: loading when tokens need to be saved, when reviewing large notebooks, or when doing automated processing.
references/presentation-patterns.md
Includes:
- Structure templates for demonstration notebooks
- Readability and narrative rhythm
- Interactive elements and export strategies
- Error handling and reproducibility checkpoints
- Division of labor between Markdown and code cells
- Notes on exporting to HTML/PDF

Suitable for: loading before building demos, sharing with a team, or publishing documentation.
Best Practices Cheat Sheet
- Structure: notebook as interface, logic pushed down into `scripts/` / `lib/`
- Dependencies: prefer uv so the environment can be recreated in one step
- Version Control: clean outputs by default (pre-commit/nbstripout/nbconvert)
- Token Saving: query structure before reading; save large outputs to files
- Presentation: clear narrative, restrained outputs, explicit error handling
- Reproducibility: make sure "Restart & Run All" works
- Data Flow: raw → processed → reports
- Git-friendly: ignore data and generated artifacts, keep the directory skeleton (`.gitkeep`)
Example Workflow
```bash
# 1. Create project
mkdir my-analysis && cd my-analysis
uv init
uv add jupyterlab pandas plotly

# 2. Set up structure
mkdir -p scripts lib data/{raw,processed} reports
touch data/.gitkeep data/raw/.gitkeep data/processed/.gitkeep reports/.gitkeep

# 3. Create notebook
uv run jupyter lab

# 4. As you work:
# - Keep logic in lib/ and scripts/
# - Save outputs to reports/ with dates
# - Keep outputs minimal
# - Strip outputs before committing

# 5. Before presenting:
# - Run "Restart & Run All" to test
# - Add context and documentation
# - Consider exporting to HTML
jupyter nbconvert --to html --execute notebook.ipynb
```
Cheat Sheet
Directory Organization:
- Notebooks: project root (or split into `notebooks/` as the project grows)
- Scripts: `scripts/`
- Modules: `lib/`
- Data: `data/raw/`, `data/processed/`
- Reports: `reports/YYYY-MM-DD/`
- Archive: `.archive/`

Common uv Commands:
- `uv init`: Initialize project
- `uv add <package>`: Add dependencies
- `uv run <command>`: Run a command in the project environment
- `uvx <tool>`: Run a temporary tool (not written to project dependencies)

Token Saving:
- Clean outputs: pre-commit hook, or `jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace notebook.ipynb`
- Query structure: `jq '.cells | group_by(.cell_type)'`
- Compare code: `jq '.cells[] | select(.cell_type == "code") | .source'`

Presentation:
- Number formatting: `{count:,}`
- Save to files by date: `reports/YYYY-MM-DD/`
- Execution verification: `jupyter nbconvert --execute`