tooluniverse-image-analysis
Microscopy Image Analysis and Quantitative Imaging Data
Production-ready skill for analyzing microscopy-derived measurement data using pandas, numpy, scipy, statsmodels, and scikit-image. Designed for BixBench imaging questions covering colony morphometry, cell counting, fluorescence quantification, regression modeling, and statistical comparisons.
IMPORTANT: This skill handles complex multi-workflow analysis. Most implementation details have been moved to references/ for progressive disclosure. This document focuses on high-level decision-making and workflow orchestration.
When to Use This Skill
Apply when users:
- Have microscopy measurement data (area, circularity, intensity, cell counts) in CSV/TSV
- Ask about colony morphometry (bacterial swarming, biofilm, growth assays)
- Need statistical comparisons of imaging measurements (t-test, ANOVA, Dunnett's, Mann-Whitney)
- Ask about cell counting statistics (NeuN, DAPI, marker counts)
- Need effect size calculations (Cohen's d) and power analysis
- Want regression models (polynomial, spline) fitted to dose-response or ratio data
- Ask about model comparison (R-squared, F-statistic, AIC/BIC)
- Need Shapiro-Wilk normality testing on imaging data
- Want confidence intervals for peak predictions from fitted models
- Questions mention imaging software output (ImageJ, CellProfiler, QuPath)
- Need fluorescence intensity quantification or colocalization analysis
- Ask about image segmentation results (counts, areas, shapes)
BixBench Coverage: 21 questions across 4 projects (bix-18, bix-19, bix-41, bix-54)
NOT for (use other skills instead):
- Phylogenetic analysis → Use tooluniverse-phylogenetics
- RNA-seq differential expression → Use tooluniverse-rnaseq-deseq2
- Single-cell scRNA-seq → Use tooluniverse-single-cell
- Statistical regression only (no imaging context) → Use tooluniverse-statistical-modeling
Core Principles
- Data-first approach - Load and inspect all CSV/TSV measurement data before analysis
- Question-driven - Parse the exact statistic, comparison, or model requested
- Statistical rigor - Proper effect sizes, multiple comparison corrections, model selection
- Imaging-aware - Understand ImageJ/CellProfiler measurement columns (Area, Circularity, Round, Intensity)
- Workflow flexibility - Support both pre-quantified data (CSV) and raw image processing
- Precision - Match expected answer format (integer, range, decimal places)
- Reproducible - Use standard Python/scipy equivalents to R functions
Required Python Packages
```python
# Core (MUST be installed)
import pandas as pd
import numpy as np
from scipy import stats
from scipy.interpolate import BSpline, make_interp_spline
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.power import TTestIndPower
from patsy import dmatrix, bs, cr

# Optional (for raw image processing)
import skimage
import cv2
import tifffile
```
**Installation**:
```bash
pip install pandas numpy scipy statsmodels patsy scikit-image opencv-python-headless tifffile
```
High-Level Workflow Decision Tree
START: User question about microscopy data
│
├─ Q1: What type of data is available?
│  │
│  ├─ PRE-QUANTIFIED DATA (CSV/TSV with measurements)
│  │  └─ Workflow: Load → Parse question → Statistical analysis
│  │     Pattern: Most common BixBench pattern (bix-18, bix-19, bix-41, bix-54)
│  │     See: Section "Quantitative Data Analysis" below
│  │
│  └─ RAW IMAGES (TIFF, PNG, multi-channel)
│     └─ Workflow: Load → Segment → Measure → Analyze
│        See: references/image_processing.md
│
├─ Q2: What type of analysis is needed?
│  │
│  ├─ STATISTICAL COMPARISON
│  │  ├─ Two groups → t-test or Mann-Whitney
│  │  ├─ Multiple groups → ANOVA or Dunnett's test
│  │  ├─ Two factors → Two-way ANOVA
│  │  └─ Effect size → Cohen's d, power analysis
│  │     See: references/statistical_analysis.md
│  │
│  ├─ REGRESSION MODELING
│  │  ├─ Dose-response → Polynomial (quadratic, cubic)
│  │  ├─ Ratio optimization → Natural spline
│  │  └─ Model comparison → R-squared, F-statistic, AIC/BIC
│  │     See: references/statistical_analysis.md
│  │
│  ├─ CELL COUNTING
│  │  ├─ Fluorescence (DAPI, NeuN) → Threshold + watershed
│  │  ├─ Brightfield → Adaptive threshold
│  │  └─ High-density → CellPose or StarDist (external)
│  │     See: references/cell_counting.md
│  │
│  ├─ COLONY SEGMENTATION
│  │  ├─ Swarming assays → Otsu threshold + morphology
│  │  ├─ Biofilms → Li threshold + fill holes
│  │  └─ Growth assays → Time-lapse tracking
│  │     See: references/segmentation.md
│  │
│  └─ FLUORESCENCE QUANTIFICATION
│     ├─ Intensity measurement → regionprops
│     ├─ Colocalization → Pearson/Manders
│     └─ Multi-channel → Channel-wise quantification
│        See: references/fluorescence_analysis.md
│
└─ Q3: When to use scikit-image vs OpenCV?
   ├─ scikit-image: Scientific analysis, measurements, regionprops
   ├─ OpenCV: Fast processing, real-time, large batches
   └─ Both: Often interchangeable for basic operations
      See: references/image_processing.md "Library Selection Guide"
Quantitative Data Analysis Workflow
Phase 0: Question Parsing and Data Discovery
CRITICAL FIRST STEP: Before writing ANY code, identify what data files are available and what the question is asking for.
```python
import glob
import os

import pandas as pd

# Discover data files (the '**' pattern with recursive=True searches subdirectories)
data_dir = "."
csv_files = glob.glob(os.path.join(data_dir, '**', '*.csv'), recursive=True)
tsv_files = glob.glob(os.path.join(data_dir, '**', '*.tsv'), recursive=True)
img_files = glob.glob(os.path.join(data_dir, '**', '*.tif*'), recursive=True)

# Load and inspect first measurement file
if csv_files:
    df = pd.read_csv(csv_files[0])
    print(f"Shape: {df.shape}")
    print(f"Columns: {list(df.columns)}")
    print(df.head())
    print(df.describe())
```
**Common Column Names**:
- Area: Colony or cell area in pixels or calibrated units
- Circularity: 4*pi*area/perimeter^2, range [0,1], 1.0 = perfect circle
- Round: Roundness = 4*area/(pi*major_axis^2)
- Genotype/Strain: Biological grouping variable
- Ratio: Co-culture mixing ratio (e.g., "1:3", "5:1")
- NeuN/DAPI/GFP: Cell marker counts or intensities
Phase 1: Grouped Statistics
```python
import numpy as np

def grouped_summary(df, group_cols, measure_col):
    """Calculate summary statistics by group."""
    summary = df.groupby(group_cols)[measure_col].agg(
        Mean='mean',
        SD='std',
        Median='median',
        Min='min',
        Max='max',
        N='count'
    ).reset_index()
    # Standard error of the mean from the per-group SD and sample size
    summary['SEM'] = summary['SD'] / np.sqrt(summary['N'])
    return summary

# Example: Colony morphometry by genotype
area_summary = grouped_summary(df, 'Genotype', 'Area')
circ_summary = grouped_summary(df, 'Genotype', 'Circularity')
```
For detailed statistical functions, see: **references/statistical_analysis.md**
Phase 2: Statistical Testing
Decision guide:
- Normality test needed? → Shapiro-Wilk
- Two groups comparison? → t-test or Mann-Whitney
- Multiple groups vs control? → Dunnett's test
- Multiple groups, all comparisons? → Tukey HSD
- Two factors? → Two-way ANOVA
- Effect size? → Cohen's d
- Sample size planning? → Power analysis
See: references/statistical_analysis.md for complete implementations
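For the two-group branch of the decision guide, a minimal sketch might look like the following. It is not the implementation in references/statistical_analysis.md: the helper name `compare_two_groups`, the 0.05 normality cutoff, the choice of Welch's t-test, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats

def compare_two_groups(a, b, alpha=0.05):
    """Pick a two-group test from Shapiro-Wilk results (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Both groups must pass Shapiro-Wilk to use the parametric test
    normal = all(stats.shapiro(g).pvalue > alpha for g in (a, b))
    if normal:
        # Assumption: Welch's t-test, which does not require equal variances
        res = stats.ttest_ind(a, b, equal_var=False)
        name = "welch_t"
    else:
        res = stats.mannwhitneyu(a, b, alternative="two-sided")
        name = "mann_whitney"
    return {"test": name, "statistic": float(res.statistic), "pvalue": float(res.pvalue)}

# Synthetic measurements (illustrative only)
rng = np.random.default_rng(0)
ctrl = rng.normal(100, 10, size=20)
treated = rng.normal(130, 10, size=20)
result = compare_two_groups(ctrl, treated)
print(result)
```

Checking normality per group rather than on the pooled data avoids a group difference being mistaken for non-normality.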
Phase 3: Regression Modeling
When to use each model:
- Polynomial (quadratic/cubic): Smooth dose-response, clear peak
- Natural spline: Flexible, non-parametric, handles complex patterns
- Linear: Simple relationships, checking for trends
Model comparison metrics:
- R-squared: Overall fit (higher = better)
- Adjusted R-squared: Penalizes complexity
- F-statistic p-value: Model significance
- AIC/BIC: Compare non-nested models
See: references/statistical_analysis.md for complete implementations
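The comparison metrics listed above can be collected side by side. The sketch below fits linear, quadratic, and cubic models with the statsmodels formula API on synthetic dose-response data. The column names `dose` and `response` and the data themselves are assumptions for illustration, not taken from any BixBench dataset.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic dose-response with a peak near dose = 5 (illustrative only)
rng = np.random.default_rng(1)
dose = np.repeat(np.linspace(0, 10, 11), 3)
response = 50 + 8 * dose - 0.8 * dose**2 + rng.normal(0, 2, size=dose.size)
data = pd.DataFrame({"dose": dose, "response": response})

formulas = {
    "linear": "response ~ dose",
    "quadratic": "response ~ dose + I(dose**2)",
    "cubic": "response ~ dose + I(dose**2) + I(dose**3)",
}
fits = {name: ols(formula, data=data).fit() for name, formula in formulas.items()}

# One row per model, one column per metric
comparison = pd.DataFrame({
    "r2": {n: m.rsquared for n, m in fits.items()},
    "adj_r2": {n: m.rsquared_adj for n, m in fits.items()},
    "f_pvalue": {n: m.f_pvalue for n, m in fits.items()},
    "aic": {n: m.aic for n, m in fits.items()},
    "bic": {n: m.bic for n, m in fits.items()},
})
print(comparison.round(3))
best_by_aic = comparison["aic"].idxmin()
print("Lowest AIC:", best_by_aic)
```

Plain R-squared never decreases when terms are added, so adjusted R-squared, AIC, or BIC are the safer columns to rank on.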
Raw Image Processing Workflow
When Processing Raw Images
Workflow: Load → Preprocess → Segment → Measure → Export
```python
# Quick start for cell counting
from scripts.segment_cells import count_cells_in_image

result = count_cells_in_image(
    image_path="cells.tif",
    channel=0,  # DAPI channel
    min_area=50
)
print(f"Found {result['count']} cells")
```
Segmentation Method Selection
Decision guide:
| Cell Type | Density | Best Method | Notes |
|---|---|---|---|
| Nuclei (DAPI) | Low-Medium | Otsu + watershed | Standard approach |
| Nuclei (DAPI) | High | CellPose/StarDist | Handles touching |
| Colonies | Well-separated | Otsu threshold | Fast, reliable |
| Colonies | Touching | Watershed | Edge detection |
| Cells (phase) | Any | Adaptive threshold | Handles uneven illumination |
| Fluorescence | Low signal | Li threshold | More sensitive |
See: references/segmentation.md and references/cell_counting.md for detailed protocols
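As an illustration of the simplest row in the table (well-separated objects, Otsu threshold), the sketch below thresholds a synthetic image and measures the resulting regions with scikit-image. The image, the `min_area` value, and the use of plain connected-component labeling in place of watershed are assumptions for this example. Touching objects would need the watershed step described in the reference guides.

```python
import numpy as np
from skimage import filters, measure

# Synthetic image: noisy dark background with two bright, well-separated squares
rng = np.random.default_rng(2)
image = rng.normal(0.0, 0.05, size=(100, 100))
image[10:30, 10:30] += 1.0   # 20 x 20 object
image[60:90, 50:80] += 1.0   # 30 x 30 object

# Global Otsu threshold, then connected-component labeling
threshold = filters.threshold_otsu(image)
mask = image > threshold
labels = measure.label(mask)

# Measure each labeled region and drop objects below a minimum area
min_area = 50  # assumed cutoff in pixels
props = measure.regionprops(labels, intensity_image=image)
objects = [p for p in props if p.area >= min_area]
areas = sorted(int(p.area) for p in objects)
print(f"Found {len(objects)} objects with areas {areas}")
```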
Library Selection: scikit-image vs OpenCV
Use scikit-image when:
- Scientific measurements needed (area, perimeter, intensity)
- regionprops for object properties
- Publication-quality analysis
- Easier syntax for scientists
Use OpenCV when:
- Processing large image batches
- Speed is critical
- Real-time processing
- Advanced computer vision features
Both work for:
- Thresholding, filtering, morphological operations
- Basic image transformations
- Most segmentation tasks
See: references/image_processing.md "Library Selection Guide"
Common BixBench Patterns
Pattern 1: Colony Morphometry (bix-18)
Question type: "Mean circularity of genotype with largest area?"
Data: CSV with Genotype, Area, Circularity columns
Workflow:
- Load CSV → group by Genotype
- Calculate mean Area per genotype
- Identify genotype with max mean Area
- Report mean Circularity for that genotype
See: references/segmentation.md "Colony Morphometry Analysis"
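The four workflow steps above reduce to a short pandas chain. The sketch uses a small made-up table, so the genotype labels and values are illustrative only.

```python
import pandas as pd

# Made-up measurements (illustrative only)
df = pd.DataFrame({
    "Genotype": ["WT", "WT", "mutA", "mutA", "mutB", "mutB"],
    "Area": [1000.0, 1200.0, 3000.0, 3400.0, 500.0, 700.0],
    "Circularity": [0.90, 0.80, 0.40, 0.50, 0.95, 0.85],
})

# Mean Area and Circularity per genotype
means = df.groupby("Genotype")[["Area", "Circularity"]].mean()

# Genotype with the largest mean Area, then its mean Circularity
largest = means["Area"].idxmax()
answer = means.loc[largest, "Circularity"]
print(largest, round(answer, 3))
```

Note that the genotype is picked by mean Area, not by the single largest colony. The two can differ, so check the question wording.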
Pattern 2: Cell Counting Statistics (bix-19)
Question type: "Cohen's d for NeuN counts between conditions?"
Data: CSV with Condition, NeuN_count, Sex, Hemisphere columns
Workflow:
- Load CSV → filter by hemisphere/sex if needed
- Split by Condition (KD vs CTRL)
- Calculate Cohen's d with pooled SD
- Report effect size
See: references/statistical_analysis.md "Effect Size Calculations"
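Cohen's d with a pooled standard deviation can be written in a few lines of NumPy. This is a minimal sketch. The function name `cohens_d` and the toy counts are assumptions, and the sign of the result depends on which group is passed first.

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d = (mean(x) - mean(y)) / pooled SD, using sample variances (ddof=1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

# Toy counts (illustrative only)
kd = [2, 4, 6]
ctrl = [1, 2, 3]
d = cohens_d(kd, ctrl)
print(f"{d:.3f}")
```

For the toy data the group variances are 4 and 1, so the pooled variance is 2.5 and d = 2 / sqrt(2.5).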
Pattern 3: Multi-Group Comparison (bix-41)
Question type: "Dunnett's test: How many ratios equivalent to control?"
Data: CSV with multiple co-culture ratios, Area, Circularity
Workflow:
- Create Strain_Ratio labels
- Run Dunnett's test for Area (vs control)
- Run Dunnett's test for Circularity (vs control)
- Count groups NOT significant in BOTH tests
See: references/statistical_analysis.md "Dunnett's Test"
Pattern 4: Regression Optimization (bix-54)
Question type: "Peak frequency from natural spline model?"
Data: CSV with co-culture frequencies and Area measurements
Workflow:
- Convert ratio strings to frequencies
- Fit natural spline model (df=4)
- Find peak via grid search
- Report peak frequency + confidence interval
See: references/statistical_analysis.md "Regression Modeling"
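A rough sketch of this workflow with patsy's `cr()` inside a statsmodels formula is shown below. Several things are assumptions made for illustration and are not confirmed by the dataset: the ratio "a:b" is converted to a frequency as a / (a + b), the data are synthetic, and the intercept is dropped (`- 1`) because the `cr` basis already spans a constant. Matching R's `ns()` output exactly requires the knot handling described in references/statistical_analysis.md.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols  # patsy supplies cr() inside the formula

def ratio_to_frequency(ratio):
    """'a:b' -> a / (a + b). Assumed conversion; check the dataset's definition."""
    a, b = (float(v) for v in ratio.split(":"))
    return a / (a + b)

# Synthetic co-culture data with a true peak at frequency 0.6 (illustrative only)
ratios = ["1:9", "1:5", "1:3", "1:2", "1:1", "2:1", "3:1", "5:1", "9:1"]
rng = np.random.default_rng(3)
rows = []
for r in ratios:
    f = ratio_to_frequency(r)
    for _ in range(4):
        rows.append({"freq": f, "Area": 1000 - 2000 * (f - 0.6) ** 2 + rng.normal(0, 10)})
data = pd.DataFrame(rows)

# Natural cubic regression spline with 4 degrees of freedom
model = ols("Area ~ cr(freq, df=4) - 1", data=data).fit()

# Grid search for the peak, restricted to the observed frequency range
grid = pd.DataFrame({"freq": np.linspace(data["freq"].min(), data["freq"].max(), 501)})
pred = model.get_prediction(grid).summary_frame(alpha=0.05)
best = pred["mean"].idxmax()
peak_freq = float(grid.loc[best, "freq"])
peak_ci = (float(pred.loc[best, "mean_ci_lower"]), float(pred.loc[best, "mean_ci_upper"]))
print(f"Peak frequency: {peak_freq:.3f}, 95% CI for mean Area at peak: {peak_ci}")
```

The interval reported here is for the predicted mean Area at the peak, not for the location of the peak itself.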
Quick Reference Table
| Task | Primary Tool | Reference |
|---|---|---|
| Load measurement CSV | pandas.read_csv() | This file |
| Group statistics | df.groupby().agg() | This file |
| T-test | scipy.stats.ttest_ind() | statistical_analysis.md |
| ANOVA | statsmodels.ols + anova_lm() | statistical_analysis.md |
| Dunnett's test | scipy.stats.dunnett() | statistical_analysis.md |
| Cohen's d | Custom function (pooled SD) | statistical_analysis.md |
| Power analysis | statsmodels TTestIndPower | statistical_analysis.md |
| Polynomial regression | statsmodels.OLS + poly features | statistical_analysis.md |
| Natural spline | patsy.cr() + statsmodels.OLS | statistical_analysis.md |
| Cell segmentation | skimage.filters + watershed | cell_counting.md |
| Colony segmentation | skimage.filters.threshold_otsu | segmentation.md |
| Fluorescence quantification | skimage.measure.regionprops | fluorescence_analysis.md |
| Colocalization | Pearson/Manders | fluorescence_analysis.md |
| Image loading | tifffile, skimage.io | image_processing.md |
| Batch processing | scripts/batch_process.py | scripts/ |
Example Scripts
Ready-to-use scripts in the scripts/ directory:
- segment_cells.py - Cell/nuclei counting with watershed
- measure_fluorescence.py - Multi-channel intensity quantification
- batch_process.py - Process folders of images
- colony_morphometry.py - Measure colony area/circularity
- statistical_comparison.py - Group comparison statistics
Usage:
```bash
# Count cells in image
python scripts/segment_cells.py cells.tif --channel 0 --min-area 50

# Batch process folder
python scripts/batch_process.py input_folder/ output.csv --analysis cell_count
```
---
Detailed Reference Guides
For complete implementations and protocols:
- references/statistical_analysis.md - All statistical tests, regression models
- references/cell_counting.md - Cell/nuclei counting protocols
- references/segmentation.md - Colony and object segmentation
- references/fluorescence_analysis.md - Intensity quantification, colocalization
- references/image_processing.md - Image loading, preprocessing, library selection
- references/troubleshooting.md - Common issues and solutions
Important Notes
Matching R Statistical Functions
Some BixBench questions use R for analysis. Python equivalents:
- R's Dunnett test (multcomp::glht) → scipy.stats.dunnett() (scipy ≥ 1.11)
- R's natural spline (ns(x, df=4)) → patsy.cr(x, knots=...) with explicit quantile knots
- R's t-test (t.test()) → scipy.stats.ttest_ind(); pass equal_var=False to match R's default Welch test
- R's ANOVA (aov()) → statsmodels.formula.api.ols() + sm.stats.anova_lm()
See: references/statistical_analysis.md for exact parameter matching
Answer Formatting
BixBench expects specific formats:
- "to the nearest thousand": int(round(val, -3))
- Percentages: Usually integer or 1-2 decimal places
- Cohen's d: 3 decimal places
- Sample sizes: Always integer (ceiling)
- Ratios: String format "5:1"
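The formatting rules above map onto standard-library calls. The values below are made up for illustration. One detail worth knowing is that Python's built-in `round` rounds exact halves to the nearest even multiple, so an exact tie such as 2500 rounds down to 2000.

```python
import math

value = 12345.6
nearest_thousand = int(round(value, -3))   # 12000

# Exact ties round to the nearest even multiple
tie_case = int(round(2500, -3))            # 2000, not 3000

cohens_d_text = f"{1.26491:.3f}"           # three decimal places
sample_size = math.ceil(63.2)              # sample sizes round up
ratio_text = f"{5}:{1}"                    # ratio as a string

print(nearest_thousand, tie_case, cohens_d_text, sample_size, ratio_text)
```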
Completeness Checklist
Before returning your answer, verify:
- Loaded all data files and inspected column names
- Identified the specific statistic or model requested
- Used correct grouping variables and filter conditions
- Applied correct rounding or format
- For "how many" questions: counted correctly based on criteria
- For statistical tests: used appropriate multiple comparison correction
- For regression: properly prepared and transformed data
- Double-checked direction of comparisons
- Verified answer falls within expected range
Getting Help
- Start with decision tree at top of this file
- Check relevant reference guide for detailed protocol
- Use example scripts as templates
- See troubleshooting guide for common issues
- All statistical implementations in statistical_analysis.md