# Analysis Artifacts
## When to Use
- When asked to do a "deep dive" or "analysis" on a question with a non-obvious answer
- When the analysis requires exploratory querying in BigQuery
- When the output should be reproducible and shareable (not just a one-off answer)
## Workflow
### 1. Scaffold the analysis directory
At the start of every analysis:

- Create a new directory in the `analyses` folder, named according to the existing pattern there
- Create subdirectories: `/assets/queries` and `/assets/visualizations`
- Create a `README.md` at the root of the new directory; this is the main readable document for the analysis
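The scaffolding step above can be sketched in a few lines of Python (the directory name here is illustrative, following the dated-slug pattern shown in the Examples section):

```python
from pathlib import Path

# Scaffold a new analysis directory (name is illustrative)
analysis_dir = Path("analyses") / "2024-02-example-analysis"

# Create the two asset subdirectories in one pass each
(analysis_dir / "assets" / "queries").mkdir(parents=True, exist_ok=True)
(analysis_dir / "assets" / "visualizations").mkdir(parents=True, exist_ok=True)

# Empty README at the root; filled in as the analysis progresses
(analysis_dir / "README.md").touch()
```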
### 2. Plan the analysis
Always create a plan before starting, whether or not the user asked for one. Steps in the plan should map to the logical sub-questions or sub-areas you've deemed important to explore. Present the plan and wait for a go-ahead before proceeding.
### 3. Set up the README
Once the plan is approved:

- Add a title, author, and date to the top of the README
- Add a Problem Statement section summarizing the analysis question and the sub-pieces you'll explore
- Add a Cohorts Definition section. This must be extremely explicit about the groups being compared. If comparing two groups (e.g., free vs. paid, new vs. old, before vs. after a milestone), define cohorts in a way that controls for confounding factors. Consider:
  - Signup/activation time (as defined by your product, e.g., first login or first meaningful action); this relates to user tenure
  - Plan type or subscription tier (e.g., free vs. paid)
  - Controlling for observation time window length across cohorts
  - Product-specific usage propensity metrics relevant to the analysis question

Once defined, respect these cohort definitions in all queries throughout the analysis.
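As a sketch of what "extremely explicit" looks like in practice, a cohort definition that controls for tenure and observation window length might be kept as a shared SQL snippet; all table and column names below are hypothetical:

```python
# Illustrative cohort definition (dataset, table, and column names are
# hypothetical). Both cohorts get the same signup month and the same 28-day
# observation window starting at activation, controlling for tenure and
# window length across cohorts.
COHORT_QUERY = """
WITH users_with_activation AS (
  SELECT user_id, plan_type, first_login_at AS activated_at
  FROM analytics.users
  WHERE first_login_at BETWEEN '2024-01-01' AND '2024-01-31'
)
SELECT
  user_id,
  IF(plan_type = 'paid', 'paid', 'free') AS cohort,
  activated_at,
  TIMESTAMP_ADD(activated_at, INTERVAL 28 DAY) AS observation_end
FROM users_with_activation
"""
```

Reusing one definition like this across every query in the analysis is how the "respect these cohort definitions in all queries" rule stays enforceable.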
### 4. Create artifacts as you go
For every material step in the analysis:

- SQL query artifact: For any BigQuery query that powers a visualization, summary, or key insight, save a `.sql` file in `/assets/queries/` with a descriptive name and a comment block explaining the query's purpose. Only create the file after you're satisfied with the results. Skip trivial or one-off lookup queries.
- Visualization or table artifact: For each key insight, assess whether it's best conveyed through a chart or a table. Lean toward visualizations. If a visualization, write a Python script to generate it and save both the script and the output image to `/assets/visualizations/` with descriptive names. If a table, save it as a `.csv` in `/assets/visualizations/`.
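A minimal visualization script in this spirit, assuming matplotlib is available; the hard-coded sample data stands in for real query results, which the real script would load from an exported CSV:

```python
# Minimal sketch of a visualization artifact script.
# The data below is illustrative; a real script would read the results of
# the corresponding query in /assets/queries/.
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

weeks = [0, 1, 2, 3, 4]
retention = [1.00, 0.62, 0.48, 0.41, 0.37]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(weeks, retention, marker="o")
ax.set_xlabel("Weeks since activation")
ax.set_ylabel("Retention rate")
ax.set_title("Cohort retention by week")
fig.savefig("retention_curve.png", dpi=150, bbox_inches="tight")
```

Saving the script alongside the `.png` is what makes the image reproducible: rerunning the script against refreshed query results regenerates the artifact.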
### 5. Overwriting artifacts
If you need to redo part of the analysis (due to a methodology correction or user feedback), overwrite all associated artifacts:

- Replace the `.sql` query file
- Replace the visualization script and regenerate the image
- Replace the `.csv` table file

Note the change to the user when you do this.
### 6. Summarize the analysis
When the analysis is complete (either at the end of the plan or when the user asks), write the full README:

- Summarize each step and sub-question in logical document sections
- Be crisp and concise; avoid unnecessary verbosity
- Embed saved viz images from `/assets/visualizations/` where appropriate
- Generate markdown tables from `.csv` files in `/assets/visualizations/`
- Include a small reference hyperlink to the associated query file in each section
- Add a TL;DR section near the top (after Problem Statement, before Cohorts Definition)
- Add a Key Takeaways section at the end
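Turning a saved `.csv` artifact into a markdown table for the README needs only the standard library; a minimal sketch (the commented-out file path is illustrative):

```python
import csv

def csv_to_markdown(rows):
    """Render a list of CSV rows (header first) as a Markdown table."""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)

# Typical usage against a saved artifact (path is illustrative):
# with open("assets/visualizations/plan_type_summary.csv") as f:
#     print(csv_to_markdown(list(csv.reader(f))))
```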
## Examples
```bash
analyses/
└── 2024-01-user-retention/
    ├── README.md
    └── assets/
        ├── queries/
        │   ├── cohort_retention_by_week.sql
        │   └── retention_by_plan_type.sql
        └── visualizations/
            ├── retention_curve.py
            ├── retention_curve.png
            └── plan_type_summary.csv
```