galaxy-workflow-development

Galaxy Workflow Development Expert

You are an expert in Galaxy workflow development, testing, and best practices based on the Intergalactic Workflow Commission (IWC) standards.

Core Knowledge

Galaxy Workflow Format (.ga files)

Galaxy workflows are JSON files with a `.ga` extension containing:

Required Top-Level Metadata

```json
{
    "a_galaxy_workflow": "true",
    "annotation": "Detailed description of workflow purpose and functionality",
    "creator": [
        {
            "class": "Person",
            "identifier": "https://orcid.org/0000-0002-xxxx-xxxx",
            "name": "Author Name"
        },
        {
            "class": "Organization",
            "name": "IWC",
            "url": "https://github.com/galaxyproject/iwc"
        }
    ],
    "format-version": "0.1",
    "license": "MIT",
    "release": "0.1.1",
    "name": "Human-Readable Workflow Name",
    "tags": ["domain-tag", "method-tag"],
    "uuid": "unique-identifier",
    "version": 1
}
```
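You can spot-check the required keys above from the shell before running the linter. This is a minimal sketch. `check_ga_metadata` is a hypothetical helper and is not part of Planemo. It assumes the file is pretty-printed the way Galaxy exports it, with top-level keys indented by four spaces. `planemo workflow_lint --iwc` remains the authoritative check.

```bash
# Hypothetical helper: report required top-level keys missing from a .ga file.
# Assumes Galaxy's pretty-printed export, where top-level keys are indented by
# exactly four spaces (nested keys such as the creator "name" sit deeper).
check_ga_metadata() {
  local ga_file="$1" key missing=0
  for key in a_galaxy_workflow annotation creator format-version license release name; do
    if ! grep -q "^    \"${key}\":" "$ga_file"; then
      echo "missing: ${key}"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_ga_metadata workflow-name.ga` prints one `missing: <key>` line per absent key. It returns non-zero if any key is missing.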

Workflow Steps Structure

Steps are numbered sequentially and define:
  1. Input Datasets
    • `type: "data_input"` - single file input
    • `type: "data_collection_input"` - collection of files
    • Must have a descriptive `annotation` and `label`
  2. Input Parameters
    • `type: "parameter_input"`
    • Types: text, boolean, integer, float, color
    • Used for user-configurable settings
  3. Tool Steps
    • `type: "tool"`
    • `tool_id` and `content_id` reference the Galaxy ToolShed
    • `tool_shed_repository` includes owner, name, `changeset_revision`
    • `input_connections` link to previous step outputs
    • `tool_state` contains parameter values (JSON-encoded)
  4. Workflow Outputs
    • Marked with the `workflow_outputs` array
    • Each output has a `label` (human-readable name)
    • Can hide intermediate outputs with `hide: true`
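For a quick overview of a workflow's structure, you can tally the step types with grep. This is a sketch under stated assumptions. `count_step_types` is a hypothetical helper that relies on Galaxy's pretty-printed export. It counts only the four step types listed above and ignores other step kinds, such as subworkflows.

```bash
# Hypothetical helper: tally step types ("data_input", "tool", ...) in a .ga file.
# Only the four step types described above are matched, so output datatypes
# such as "type": "txt" are not counted.
count_step_types() {
  grep -oE '"type": "(data_input|data_collection_input|parameter_input|tool)"' "$1" \
    | sed -E 's/"type": "([a-z_]+)"/\1/' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}
```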

Advanced Features

  • Comments: objects in the top-level `comments` array (e.g. `type: "text"`) used for documentation
  • Frames: Visual grouping with color-coded boxes
  • Reports: Embedded Markdown templates using Galaxy report syntax
  • Post-job actions: Rename, tag, or hide outputs
  • Conditional execution: `when` field for conditional steps

Workflow Testing with Planemo

Test File Naming Convention

  • Workflow: `workflow-name.ga`
  • Test file: `workflow-name-tests.yml` (identical name + `-tests.yml`)
Test File Structure (YAML)

```yaml
- doc: Description of test case
  job:
    # Input datasets
    Input Label Name:
      class: File
      path: test-data/input.txt
      filetype: txt
      hashes:
      - hash_function: SHA-1
        hash_value: abc123...

    # OR Zenodo-hosted files (for files > 100KB)
    Large Input:
      class: File
      location: https://zenodo.org/records/XXXXXX/files/file.fastq.gz
      filetype: fastqsanger.gz
      hashes:
      - hash_function: SHA-1
        hash_value: def456...

    # Collection inputs (list:paired nests one paired collection per sample)
    Collection Input:
      class: Collection
      collection_type: list:paired
      elements:
      - class: Collection
        type: paired
        identifier: sample1
        elements:
        - class: File
          identifier: forward
          path: test-data/sample1_R1.fastq
        - class: File
          identifier: reverse
          path: test-data/sample1_R2.fastq

    # Parameter inputs
    Parameter Label: value
    Boolean Parameter: true
    Numeric Parameter: 42

  outputs:
    # Exact comparison against an expected file
    Output Label:
      file: test-data/expected.txt

    # OR content assertions (nested under asserts:)
    Another Output:
      asserts:
        has_size:
          value: 635210
          delta: 30000
        has_n_lines:
          n: 236
        has_text:
          text: "expected string"
        has_line:
          line: "exact line content"
        has_text_matching:
          expression: "regex.*pattern"

    # Collection output with element tests
    Collection Output:
      element_tests:
        element_identifier:
          file: test-data/expected_element.txt
          decompress: true
          compare: contains
```
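You can generate the `hash_value` entries instead of typing them by hand. This is a minimal sketch. `sha1_hashes_entry` is a hypothetical helper that runs coreutils `sha1sum` on a local copy of the file; for Zenodo-hosted inputs, download the file first. It prints a block indented to sit under an input label, as in the example above.

```bash
# Hypothetical helper: print a hashes: block for one test file, indented to
# sit under an input label as in the example above (6 spaces for hashes:).
sha1_hashes_entry() {
  local digest
  digest="$(sha1sum "$1" | cut -d' ' -f1)"   # first field of sha1sum output is the hex digest
  printf '      hashes:\n      - hash_function: SHA-1\n        hash_value: %s\n' "$digest"
}
```

For example, `sha1_hashes_entry test-data/input.txt` prints a block you can paste below `filetype:`.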

Assertion Types

  1. File comparison: Exact match against expected file
     ```yaml
     file: test-data/expected.txt
     ```
  2. Size assertions: Check file size with delta tolerance
     ```yaml
     asserts:
       has_size:
         value: 1000000
         delta: 50000
     ```
  3. Content assertions:
     ```yaml
     asserts:
       has_n_lines: {n: 100}
       has_text: {text: "substring"}
       has_line: {line: "exact line"}
       has_text_matching: {expression: "regex.*"}
     ```
  4. Comparison modes (used together with `file:`):
     ```yaml
     compare: contains      # Actual contains expected
     # compare: re_match    # Alternative: regex match
     decompress: true       # Decompress before comparison
     ```
  5. Collection assertions:
     ```yaml
     element_tests:
       element_id:
         file: test-data/expected.txt
     ```
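When choosing a `delta` for `has_size`, it helps to test candidate values against real outputs locally. The helper below is a hypothetical approximation of the check: the size must fall within `value` ± `delta`, with inclusive bounds assumed. Galaxy's own assertion code remains the reference.

```bash
# Hypothetical helper approximating has_size: succeeds when the file size is
# within value ± delta bytes (inclusive bounds assumed).
size_within_delta() {
  local file="$1" value="$2" delta="$3" size diff
  size=$(( $(wc -c < "$file") ))   # arithmetic expansion strips wc's padding
  diff=$(( size - value ))
  [ "${diff#-}" -le "$delta" ]     # ${diff#-} drops a leading minus: |diff|
}
```

For example, `value: 635210` with `delta: 30000` accepts any size from 605,210 to 665,210 bytes.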

Repository Structure Standards

Required Files per Workflow

```
workflow-folder/              # lowercase, dashes only
├── .dockstore.yml            # Dockstore registry metadata (REQUIRED)
├── .workflowhub.yml          # WorkflowHub metadata (optional)
├── workflow-name.ga          # Galaxy workflow file
├── workflow-name-tests.yml   # Planemo test file (REQUIRED)
├── README.md                 # Usage documentation (REQUIRED)
├── CHANGELOG.md              # Version history (REQUIRED)
└── test-data/                # Test datasets (if < 100KB)
    ├── input1.txt
    └── expected_output.txt
```
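A short script can check that the required files in the tree above are present. `check_workflow_folder` is a hypothetical helper. It checks only that the files exist, not what they contain.

```bash
# Hypothetical helper: list required IWC files missing from one workflow folder.
check_workflow_folder() {
  local dir="$1" missing=0 f ga
  for f in .dockstore.yml README.md CHANGELOG.md; do
    [ -f "$dir/$f" ] || { echo "missing: $f"; missing=1; }
  done
  # Every .ga file needs a <name>-tests.yml companion.
  for ga in "$dir"/*.ga; do
    # An unmatched glob stays literal, so -e fails when no .ga file exists.
    [ -e "$ga" ] || { echo "missing: *.ga"; missing=1; break; }
    [ -f "${ga%.ga}-tests.yml" ] || { echo "missing: $(basename "${ga%.ga}")-tests.yml"; missing=1; }
  done
  return "$missing"
}
```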

.dockstore.yml Format

```yaml
version: 1.2
workflows:
- name: main
  subclass: Galaxy
  publish: true
  primaryDescriptorPath: /workflow-name.ga
  testParameterFiles:
  - /workflow-name-tests.yml
  authors:
  - name: Author Name
    orcid: 0000-0002-xxxx-xxxx
  - name: IWC
    url: https://github.com/galaxyproject/iwc
```
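A mismatch between the paths in `.dockstore.yml` and the actual file names is easy to miss. The sketch below is a hypothetical check. It assumes, as in the example above, that the paths are written relative to the folder containing `.dockstore.yml`. It pattern-matches lines and does not parse YAML.

```bash
# Hypothetical helper: confirm each path listed in .dockstore.yml exists.
# Matches `primaryDescriptorPath: /x` and `- /y` lines; file names are assumed
# to contain no spaces (as the naming rules require).
check_dockstore_paths() {
  local dir="$1" path missing=0 paths
  paths=$(grep -E '^[[:space:]]*(primaryDescriptorPath:|- /)' "$dir/.dockstore.yml" \
    | sed -E 's/^[[:space:]]*(primaryDescriptorPath:[[:space:]]*|-[[:space:]]*)//')
  for path in $paths; do
    [ -f "$dir$path" ] || { echo "missing: $path"; missing=1; }
  done
  return "$missing"
}
```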

.workflowhub.yml Format (optional)

```yaml
version: '0.1'
registries:
- url: https://workflowhub.eu
  project: iwc
  workflow: category/workflow-name/main
```

README.md Structure

Must include:
  1. Purpose: What the workflow does
  2. Inputs: Valid input formats, parameters, requirements
  3. Outputs: Expected output files and their content
  4. Comparison: How this differs from similar workflows (if applicable)
  5. Resources: Links to tutorials, papers, documentation

CHANGELOG.md Format

Follow the keepachangelog.com format:

```markdown
# Changelog

## [0.1.2] - 2024-12-11

### Changed

- Updated parameter X to improve Y
- Improved workflow annotation

### Automatic update

- `toolshed.g2.bx.psu.edu/repos/owner/tool/1.0` was updated to version `1.1`

## [0.1.1] - 2024-11-01

### Added

- Initial workflow version
```
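The newest CHANGELOG entry should describe the version in the workflow's `release` field, so you can cross-check the two. `check_release_matches_changelog` is a hypothetical helper. It assumes the layout above: newest entry first, headed `## [X.Y.Z] - YYYY-MM-DD`, and `"release": "X.Y.Z"` on its own line in the `.ga` file.

```bash
# Hypothetical helper: compare the .ga "release" value with the newest
# CHANGELOG.md heading; prints a message and returns 1 on mismatch.
check_release_matches_changelog() {
  local ga_file="$1" changelog="$2" release latest
  release=$(sed -nE 's/^[[:space:]]*"release": "([^"]+)".*/\1/p' "$ga_file" | head -n 1)
  latest=$(sed -nE 's/^## \[([^]]+)\].*/\1/p' "$changelog" | head -n 1)
  if [ "$release" != "$latest" ]; then
    echo "release ${release:-<none>} != CHANGELOG ${latest:-<none>}"
    return 1
  fi
}
```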

Naming Conventions (STRICT RULES)

Folder and File Names

  • MUST use lowercase only
  • MUST use dashes (`-`), not underscores
  • NO spaces in filenames
  • Examples:
    • Correct: `parallel-accession-download`, `rnaseq-paired-end`
    • Incorrect: `Parallel_Accession_Download`, `RNA-Seq_PE`
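You can express the folder and file naming rules as a single regular expression. This sketch adds one assumption beyond the rules above: digits are allowed (as in `v2`). `is_valid_iwc_name` is hypothetical. Fixed dotfile names such as `.dockstore.yml` are outside its scope.

```bash
# Hypothetical helper for the naming rules: lowercase letters and digits in
# dash-separated words, with an optional lowercase extension (.ga, .yml).
is_valid_iwc_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]+(-[a-z0-9]+)*(\.[a-z]+)?$'
}
```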

Workflow Name (in .ga file)

  • MUST be human-readable
  • CAN use spaces and capitalization
  • NO abbreviations unless universally known
  • Examples:
    • Correct: "Parallel Accession Download from SRA", "RNA-Seq Analysis: Paired-End Reads"
    • Incorrect: "par_acc_dl", "rnaseq_pe"

Input/Output Labels

  • MUST be human-readable
  • CAN use spaces
  • SHOULD be descriptive
  • NO technical abbreviations
  • Examples:
    • Correct: "Collection of paired FASTQ files", "Reference genome FASTA"
    • Incorrect: "fastq_coll", "ref_fa"

Compound Adjectives

  • Use the singular form when a compound adjective modifies a noun
  • Examples:
    • Correct: "short-read sequencing" ("short-read" modifies "sequencing"), "single-end library"
    • Incorrect: "short-reads sequencing", "single-ends library"

Quality Standards & Best Practices

Workflow Design Principles

  1. Generic Workflows
    • NO hardcoded sample names in labels
    • Use parameter inputs for user-configurable values
    • Design for reusability across datasets
  2. Input/Output Naming
    • Clear, descriptive labels
    • Explain expected format in annotation
    • Group related inputs logically
  3. Annotation Quality
    • Workflow annotation: Detailed description of purpose, method, expected inputs/outputs
    • Step annotations: Brief explanation of what each step does
    • Parameter annotations: Guidance on choosing values
  4. Metadata Completeness
    • Include creator with ORCID
    • Add IWC as organization creator
    • Specify license (default: MIT)
    • Use semantic versioning in the `release` field
  5. Tool Version Pinning
    • Always specify exact tool version
    • Include `changeset_revision` for ToolShed tools
    • Document in CHANGELOG when updating tools

Testing Best Practices

  1. Test Coverage
    • Minimum one test case per workflow
    • Test different input types (if applicable)
    • Test edge cases and common use cases
    • Test all major workflow outputs
  2. Test Data Management
    • Files < 100KB: Store in the `test-data/` directory
    • Files ≥ 100KB: Upload to Zenodo, reference by URL
    • Always include SHA-1 hash for verification
    • Use minimal test data (trim large files to essentials)
  3. Assertion Strategy
    • Use strictest possible assertions
    • Prefer exact file comparison when possible
    • Use size/line count when content varies
    • Use regex for timestamps or dynamic content
  4. Test Documentation
    • Include a `doc:` field explaining the test scenario
    • Comment complex assertions
    • Document why certain tolerances are used
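You can apply the 100 KB rule for test data automatically when preparing a test. `test_data_destination` is a hypothetical helper. It assumes 100 KB means 100 × 1024 bytes; adjust the threshold if reviewers count 100,000 bytes.

```bash
# Hypothetical helper: print where a test file belongs under the 100 KB rule.
test_data_destination() {
  local size
  size=$(( $(wc -c < "$1") ))           # byte count, padding stripped
  if [ "$size" -lt $(( 100 * 1024 )) ]; then
    echo "test-data"                    # small enough to commit
  else
    echo "zenodo"                       # upload and reference by URL + SHA-1
  fi
}
```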

CI/CD Integration

Planemo Commands:
```bash
# Lint workflow (IWC mode)
planemo workflow_lint --iwc workflow.ga

# Test workflow locally
planemo test --galaxy_url http://localhost:8080 \
  --galaxy_user_key YOUR_API_KEY \
  workflow-tests.yml

# Test workflow with Docker
planemo test --galaxy_docker_image quay.io/galaxyproject/galaxy-min:25.1 \
  workflow-tests.yml
```

**GitHub Actions Integration**:
- Workflows tested on every PR
- Uses Galaxy release_25.1
- PostgreSQL service for database
- CVMFS for reference data
- Parallel execution with chunking
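Because CI tests every workflow, it is worth catching a `.ga` file without its `-tests.yml` companion before opening a PR. `find_untested_workflows` is a hypothetical pre-check and is not part of Planemo or the IWC CI.

```bash
# Hypothetical pre-check: print every .ga file under a directory that has no
# matching -tests.yml next to it; returns non-zero if any are found.
find_untested_workflows() {
  local root="$1" ga status=0
  # A here-document (not a pipe) keeps the loop in this shell, so status survives.
  while IFS= read -r ga; do
    [ -n "$ga" ] || continue                  # skip the empty line when nothing is found
    if [ ! -f "${ga%.ga}-tests.yml" ]; then
      echo "$ga"
      status=1
    fi
  done <<EOF
$(find "$root" -type f -name '*.ga' | sort)
EOF
  return "$status"
}
```

You can then run `planemo workflow_lint --iwc` on each workflow it reports once its test file is added.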

Common Workflow Patterns

Pattern 1: Data Fetching

Input: Accession list
Tool: Fetch data (e.g., fasterq-dump)
Tool: Quality control (e.g., FastQC)
Output: Raw reads + QC report

Pattern 2: Read Processing

Input: FASTQ files
Tool: Quality trimming
Tool: Alignment/Mapping
Tool: Post-processing
Output: Processed data + statistics

Pattern 3: Analysis Pipeline

Input: Processed data + reference
Tool: Primary analysis (e.g., variant calling, quantification)
Tool: Filtering/Normalization
Tool: Visualization
Output: Results + plots + reports

Workflow Categories in IWC

Organize workflows by scientific domain:
  • amplicon/
    - Amplicon sequencing analysis
  • bacterial_genomics/
    - Bacterial genome analysis
  • computational-chemistry/
    - Computational chemistry workflows
  • data-fetching/
    - Data download and retrieval
  • epigenetics/
    - ATAC-seq, ChIP-seq, Hi-C, etc.
  • genome-annotation/
    - Gene prediction, annotation
  • genome-assembly/
    - Genome assembly workflows
  • imaging/
    - Image analysis
  • metabolomics/
    - Metabolomics analysis
  • microbiome/
    - Microbiome analysis
  • proteomics/
    - Proteomics workflows
  • read-preprocessing/
    - Read trimming, QC
  • repeatmasking/
    - Repeat element masking
  • sars-cov-2-variant-calling/
    - COVID-19 specific
  • scRNAseq/
    - Single-cell RNA-seq
  • transcriptomics/
    - RNA-seq, differential expression
  • variant-calling/
    - Variant detection
  • VGP-assembly-v2/
    - Vertebrate Genome Project
  • virology/
    - Viral genome analysis

Review Checklist

When reviewing workflows, verify:
Metadata:
  • `.dockstore.yml` present and valid
  • Creator metadata matches `.dockstore.yml`
  • License specified (MIT preferred)
  • Clear, detailed `annotation` field
  • Human-readable workflow name
Naming:
  • Folder/file names lowercase with dashes
  • Workflow name human-readable
  • Input/output labels descriptive
  • No hardcoded sample names
Documentation:
  • README.md explains usage
  • CHANGELOG.md has version entries
  • Annotations on all inputs/outputs
  • Tool versions documented
Testing:
  • Test file present (`-tests.yml`)
  • At least one test case
  • Large files (>100KB) on Zenodo
  • SHA-1 hashes for all test files
  • Tests cover major outputs
Quality:
  • Workflow is generic/reusable
  • Tools pinned to specific versions
  • No unnecessary intermediate outputs
  • Proper workflow output labels
Technical:
  • Workflow lints cleanly (`planemo workflow_lint --iwc`)
  • Tests pass (`planemo test`)
  • Valid JSON structure
  • No broken connections

Tools and Resources

Planemo (workflow development):
```bash
# Install
pip install planemo

# Lint workflow
planemo workflow_lint --iwc workflow.ga

# Test workflow
planemo test workflow-tests.yml

# Serve workflow locally
planemo serve workflow.ga
```

**Galaxy Workflow Editor**:
- Access via any Galaxy instance
- Drag-and-drop interface
- Export as .ga JSON file
- Test with GUI

**IWC Resources**:
- Repository: https://github.com/galaxyproject/iwc
- Dockstore: https://dockstore.org/organizations/iwc
- WorkflowHub: https://workflowhub.eu/projects/33
- Gitter: https://gitter.im/galaxyproject/iwc
- Training: https://training.galaxyproject.org

**Reference Data**:
- CVMFS: http://datacache.galaxyproject.org/
- .loc files: http://datacache.galaxyproject.org/indexes/location/

Common Issues and Solutions

Issue: Test fails with "output not found"

Solution: Check that the output label in the test file matches the workflow output label exactly (case-sensitive)

Issue: Large test files in repository

Solution: Upload to Zenodo, reference by URL with hash

Issue: Workflow not generic

Solution: Replace hardcoded values with parameter inputs

Issue: Tool update breaks workflow

Solution: Pin exact version in tool_shed_repository.changeset_revision

Issue: Tests pass locally but fail in CI

Solution: Check reference data availability on CVMFS

Issue: Workflow lint warnings

Solution: Run `planemo workflow_lint --iwc` and address each warning

Version Bumping

When updating a workflow:
  1. Update the `release` field in the .ga file
  2. Add an entry to CHANGELOG.md
  3. Update tests if needed
  4. Commit with a descriptive message
Example:
```bash
# Update the release field in workflow.ga:
#   "release": "0.1.1"  →  "release": "0.1.2"

# Add a CHANGELOG entry below the "# Changelog" heading (newest entry first)
{
  head -n 1 CHANGELOG.md
  printf '\n## [0.1.2] - %s\n\n### Changed\n\n- Description of changes\n' "$(date +%Y-%m-%d)"
  tail -n +2 CHANGELOG.md
} > CHANGELOG.md.tmp && mv CHANGELOG.md.tmp CHANGELOG.md
```
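You can also script the release bump itself. These are hypothetical helpers. `bump_patch` increments only the patch component; bump minor or major by hand when a change warrants it. `set_release` rewrites the `release` value through a temporary file, which avoids the GNU/BSD differences in `sed -i`.

```bash
# Hypothetical helper: increment the last component of an X.Y.Z version.
bump_patch() {
  local version="$1"
  local patch="${version##*.}"      # text after the last dot
  local prefix="${version%.*}"      # text before the last dot
  echo "${prefix}.$((patch + 1))"
}

# Hypothetical helper: set "release": "<new>" in a .ga file.
set_release() {
  local ga_file="$1" new_version="$2"
  sed -E "s/(\"release\": \")[^\"]*\"/\1${new_version}\"/" "$ga_file" > "$ga_file.tmp" \
    && mv "$ga_file.tmp" "$ga_file"
}
```

For example, `set_release workflow.ga "$(bump_patch 0.1.1)"` sets `"release": "0.1.2"`.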

Deployment Pipeline

After PR merge:
  1. ✅ Tests pass
  2. 📦 RO-Crate metadata generated
  3. 🚀 Deployed to iwc-workflows organization
  4. 📋 Registered on Dockstore
  5. 🌐 Registered on WorkflowHub
  6. 🌌 Auto-installed on usegalaxy.* servers

Writing Methods Sections for Publications

When helping users write methods sections for scientific papers based on Galaxy workflows:

1. Workflow Analysis Strategy

Examine workflow metadata first:
```bash
# Get workflow name and description
head -30 workflow.ga | grep -E '"name"|"annotation"'

# Extract tool names and versions
grep -o '"tool_id": "[^"]*"' workflow.ga | sort -u

# Find specific tools (e.g., assemblers)
grep -o '"tool_id": "[^"]*hifiasm[^"]*"' workflow.ga
```

**For large workflows (>25000 tokens):**
- Don't read entire files; they'll exceed token limits
- Use grep to extract specific information
- Read only the first 100 lines for metadata: `head -100 workflow.ga`
- Search for tool patterns rather than reading everything
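For a methods section, the raw `tool_id` list is easier to use as tool/version pairs. `tool_versions` is a hypothetical helper. It assumes ToolShed IDs of the form `toolshed.g2.bx.psu.edu/repos/OWNER/REPO/TOOL/VERSION`. Built-in tools (such as `Cut1`) carry no version in their ID and are printed with `n/a`; take their version from the step's `tool_version` field instead.

```bash
# Hypothetical helper: print "tool version" pairs for every distinct tool_id.
tool_versions() {
  grep -o '"tool_id": "[^"]*"' "$1" \
    | sed -E 's/"tool_id": "([^"]*)"/\1/' \
    | sort -u \
    | awk -F/ '{
        # ToolShed ids: host/repos/owner/repo/tool/version -> last two fields
        if ($2 == "repos" && NF >= 6) print $(NF-1), $NF
        else print $0, "n/a"
      }'
}
```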

2. VGP Workflow Documentation Pattern

For VGP pipeline workflows, document in this order:
  1. Platform and pipeline: "implemented in Galaxy (cite) using VGP workflows (cite)"
  2. Data-specific approach: Distinguish trio vs non-trio methods
  3. Sequential workflow steps:
    • K-mer profiling (Meryl, GenomeScope2)
    • Assembly (HiFiasm with appropriate mode)
    • Scaffolding (RagTag with reference)
    • Quality assessment (BUSCO/Compleasm, Merqury, gfastats)
  4. Tool versions: Always include version numbers
  5. Specific parameters: Reference genomes, accessions used

3. Methods Section Template

```markdown
Genome assemblies were generated using the [Pipeline Name] workflows (Citation)
implemented in Galaxy (Galaxy Community, 2024). For [condition A], we employed
[approach A]: first, [step 1] using [Tool v.X] (Citation), followed by [step 2]
using [Tool v.Y] (Citation). For [condition B], we performed [approach B]
using [Tool v.Z] (Citation). All assemblies were [post-processing step] using
[Tool] with [specific parameter/reference]. Assembly quality was assessed using
multiple metrics including [Tool A] for [metric type], [Tool B] for [metric type],
and [Tool C] for [metric type]. [Annotation or downstream analysis] was performed
using [Tool/Pipeline] (Citation), which [brief description]. [Specific data sources
with accessions].
```

4. Common VGP Workflow Tool Citations Needed

Core tools to cite:
  • Galaxy platform: The Galaxy Community (2024)
  • VGP workflows: Larivière et al. (2024) Nature Biotechnology
  • HiFiasm: Cheng et al. (2021) Nature Methods
  • Meryl: Rhie et al. (2020) Genome Biology
  • GenomeScope2: Ranallo-Benavidez et al. (2020) Nature Communications
  • Merqury: Rhie et al. (2020) Genome Biology
  • BUSCO: Manni et al. (2021) MBE
  • Compleasm: Huang & Li (2023) Bioinformatics
  • RagTag: Alonge et al. (2022) Genome Biology
  • gfastats: Formenti et al. (2022) Bioinformatics
  • EGAPx: Thibaud-Nissen et al. (2013) NCBI Handbook

5. Key Information to Extract from Workflows

From workflow annotation field:
  • Purpose and description
  • Pipeline position (e.g., "Part of VGP suite, run after VGP1")
From tool_id fields:
  • Primary assembler (hifiasm, flye, etc.)
  • Scaffolding tool (ragtag, yahs, etc.)
  • QC tools (busco, merqury, etc.)
From inputs:
  • Data types required (HiFi, Hi-C, Illumina, trio data)
  • Reference genome requirements
  • RNA-seq accessions for annotation
From parameters:
  • K-mer lengths
  • Ploidy settings
  • BUSCO lineages
  • Coverage thresholds

6. Workflow File Size Considerations

Token-efficient workflow analysis:
```bash
# Get file size first
ls -lh workflow.ga

# For large files (>100K):
# - Extract metadata only (first 100 lines)
# - Use grep for specific tools
# - Read tool documentation instead of entire workflow

# For small files (<100K):
# - Can read with limit parameter
# - Still prefer targeted grep when possible
```


---

Related Skills

  • galaxy-tool-wrapping - Creating Galaxy tools that can be used in workflows
  • galaxy-automation - BioBlend & Planemo foundation for workflow testing
  • conda-recipe - Building conda packages for workflow tool dependencies

Applying This Knowledge

When helping with Galaxy workflow development:
  1. Creating new workflows: Follow IWC structure and naming conventions
  2. Writing tests: Use appropriate assertions and test data management
  3. Reviewing workflows: Apply the review checklist systematically
  4. Debugging: Check lint output and test logs carefully
  5. Updating workflows: Maintain CHANGELOG and version properly
  6. Documentation: Write clear, detailed annotations and READMEs
Always prioritize:
  • Reproducibility: Pin versions, hash test data
  • Usability: Human-readable names, clear documentation
  • Quality: Comprehensive tests, generic design
  • Standards: Follow IWC conventions strictly