dnanexus-integration

DNAnexus Integration


Overview


DNAnexus is a cloud platform for biomedical data analysis and genomics. Build and deploy apps/applets, manage data objects, run workflows, and use the dxpy Python SDK for genomics pipeline development and execution.

When to Use This Skill


This skill should be used when:
  • Creating, building, or modifying DNAnexus apps/applets
  • Uploading, downloading, searching, or organizing files and records
  • Running analyses, monitoring jobs, creating workflows
  • Writing scripts using dxpy to interact with the platform
  • Setting up dxapp.json, managing dependencies, using Docker
  • Processing FASTQ, BAM, VCF, or other bioinformatics files
  • Managing projects, permissions, or platform resources

Core Capabilities


The skill is organized into five main areas, each with detailed reference documentation:

1. App Development


Purpose: Create executable programs (apps/applets) that run on the DNAnexus platform.
Key Operations:
  • Generate app skeleton with dx-app-wizard
  • Write Python or Bash apps with proper entry points
  • Handle input/output data objects
  • Deploy with dx build (applet) or dx build --app (app)
  • Test apps on the platform
Common Use Cases:
  • Bioinformatics pipelines (alignment, variant calling)
  • Data processing workflows
  • Quality control and filtering
  • Format conversion tools
Reference: See references/app-development.md for:
  • Complete app structure and patterns
  • Python entry point decorators
  • Input/output handling with dxpy
  • Development best practices
  • Common issues and solutions

2. Data Operations


Purpose: Manage files, records, and other data objects on the platform.
Key Operations:
  • Upload/download files with dxpy.upload_local_file() and dxpy.download_dxfile()
  • Create and manage records with metadata
  • Search for data objects by name, properties, or type
  • Clone data between projects
  • Manage project folders and permissions
Common Use Cases:
  • Uploading sequencing data (FASTQ files)
  • Organizing analysis results
  • Searching for specific samples or experiments
  • Backing up data across projects
  • Managing reference genomes and annotations
Reference: See references/data-operations.md for:
  • Complete file and record operations
  • Data object lifecycle (open/closed states)
  • Search and discovery patterns
  • Project management
  • Batch operations

3. Job Execution


Purpose: Run analyses, monitor execution, and orchestrate workflows.
Key Operations:
  • Launch jobs with applet.run() or app.run()
  • Monitor job status and logs
  • Create subjobs for parallel processing
  • Build and run multi-step workflows
  • Chain jobs with output references
Common Use Cases:
  • Running genomics analyses on sequencing data
  • Parallel processing of multiple samples
  • Multi-step analysis pipelines
  • Monitoring long-running computations
  • Debugging failed jobs
Reference: See references/job-execution.md for:
  • Complete job lifecycle and states
  • Workflow creation and orchestration
  • Parallel execution patterns
  • Job monitoring and debugging
  • Resource management
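
As a rough illustration of job monitoring, the sketch below polls a job's state until it reaches a terminal state. The helper name wait_for_job and its parameters are hypothetical and not part of dxpy; it only assumes that a job handler's describe() returns a dict with a "state" key, and that "done", "failed", and "terminated" are the terminal states.

```python
import time

# States after which a job will not change any further (assumed terminal set).
TERMINAL_STATES = {"done", "failed", "terminated"}


def wait_for_job(describe, poll_seconds=30.0, max_polls=120, sleep=time.sleep):
    """Poll describe() until the job reaches a terminal state.

    describe: zero-argument callable returning a dict with a "state" key,
              for example the bound method `job.describe` of a dxpy.DXJob.
    Returns the final state string; raises TimeoutError if max_polls runs out.
    """
    for _ in range(max_polls):
        state = describe()["state"]
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)  # wait before asking the platform again
    raise TimeoutError("job did not reach a terminal state in time")
```

With dxpy installed this could be called as wait_for_job(dxpy.DXJob("job-xxxx").describe). Unlike job.wait_on_done(), it returns the final state instead of raising on failure, which may be convenient when many jobs are tracked in one loop.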

4. Python SDK (dxpy)


Purpose: Provide programmatic access to the DNAnexus platform through Python.
Key Operations:
  • Work with data object handlers (DXFile, DXRecord, DXApplet, etc.)
  • Use high-level functions for common tasks
  • Make direct API calls for advanced operations
  • Create links and references between objects
  • Search and discover platform resources
Common Use Cases:
  • Automation scripts for data management
  • Custom analysis pipelines
  • Batch processing workflows
  • Integration with external tools
  • Data migration and organization
Reference: See references/python-sdk.md for:
  • Complete dxpy class reference
  • High-level utility functions
  • API method documentation
  • Error handling patterns
  • Common code patterns
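
To make "links and references between objects" more concrete, here is a minimal sketch of the JSON shapes involved. The helper names make_dxlink and make_output_ref are hypothetical stand-ins written in plain Python; in real code, dxpy.dxlink() and a job handler's get_output_ref() build these structures for you.

```python
def make_dxlink(object_id, project_id=None):
    """Build a link to a data object, optionally pinned to a project."""
    if project_id is None:
        return {"$dnanexus_link": object_id}
    return {"$dnanexus_link": {"project": project_id, "id": object_id}}


def make_output_ref(job_id, field):
    """Build a job-based object reference to an output field of a job.

    The platform resolves the reference once the upstream job finishes,
    so a downstream job can be launched before the value exists.
    """
    return {"$dnanexus_link": {"job": job_id, "field": field}}


# Placeholder IDs in the same style as the rest of this document.
file_link = make_dxlink("file-xxxx", project_id="project-xxxx")
qc_output = make_output_ref("job-xxxx", "filtered_reads")
```

Seeing the raw structure is mainly useful when reading job.describe() output or debugging inputs passed to run().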

5. Configuration and Dependencies


Purpose: Configure app metadata and manage dependencies.
Key Operations:
  • Write dxapp.json with inputs, outputs, and run specs
  • Install system packages (execDepends)
  • Bundle custom tools and resources
  • Use assets for shared dependencies
  • Integrate Docker containers
  • Configure instance types and timeouts
Common Use Cases:
  • Defining app input/output specifications
  • Installing bioinformatics tools (samtools, bwa, etc.)
  • Managing Python package dependencies
  • Using Docker images for complex environments
  • Selecting computational resources
Reference: See references/configuration.md for:
  • Complete dxapp.json specification
  • Dependency management strategies
  • Docker integration patterns
  • Regional and resource configuration
  • Example configurations
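
The sketch below shows roughly what a small dxapp.json might contain, built as a Python dict and rendered with the standard json module. The app name, title, summary, Ubuntu release, instance type, and timeout are illustrative assumptions chosen to match the simple quality-filter app in this document; check references/configuration.md for the fields your app actually needs.

```python
import json

# Hypothetical metadata for the quality-filter example app.
dxapp = {
    "name": "quality-filter",
    "title": "Quality Filter",
    "summary": "Filter FASTQ reads by quality score",
    "dxapi": "1.0.0",
    "version": "0.1.0",
    "inputSpec": [
        {"name": "input_file", "class": "file", "patterns": ["*.fastq"]},
        {"name": "quality_threshold", "class": "int",
         "optional": True, "default": 30},
    ],
    "outputSpec": [
        {"name": "filtered_reads", "class": "file"},
    ],
    "runSpec": {
        "interpreter": "python3",
        "file": "src/my-app.py",
        "distribution": "Ubuntu",
        "release": "20.04",   # assumed release
        "version": "0",
        # System packages installed before the entry point runs.
        "execDepends": [{"name": "samtools"}],
        # Stop the job if any entry point runs longer than 12 hours.
        "timeoutPolicy": {"*": {"hours": 12}},
        # Instance type used for every entry point (assumed value).
        "systemRequirements": {"*": {"instanceType": "mem1_ssd1_v2_x4"}},
    },
}


def render_dxapp(spec):
    """Serialize the spec as pretty-printed JSON text."""
    return json.dumps(spec, indent=2)


print(render_dxapp(dxapp))
```

The printed text can be saved as dxapp.json in the app directory before running dx build.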

Quick Start Examples


Upload and Analyze Data


python
import dxpy

# Upload input file
input_file = dxpy.upload_local_file("sample.fastq", project="project-xxxx")

# Run analysis
job = dxpy.DXApplet("applet-xxxx").run({
    "reads": dxpy.dxlink(input_file.get_id())
})

# Wait for completion
job.wait_on_done()

# Download results
output_id = job.describe()["output"]["aligned_reads"]["$dnanexus_link"]
dxpy.download_dxfile(output_id, "aligned.bam")

Search and Download Files


python
import dxpy

# Find BAM files from a specific experiment
# (name_mode="glob" is needed for the wildcard; the default mode is an exact match)
files = dxpy.find_data_objects(
    classname="file",
    name="*.bam",
    name_mode="glob",
    properties={"experiment": "exp001"},
    project="project-xxxx"
)

# Download each file
for file_result in files:
    file_obj = dxpy.DXFile(file_result["id"])
    filename = file_obj.describe()["name"]
    dxpy.download_dxfile(file_result["id"], filename)

Create Simple App


python
# src/my-app.py
import dxpy
import subprocess

@dxpy.entry_point('main')
def main(input_file, quality_threshold=30):
    # Download input
    dxpy.download_dxfile(input_file["$dnanexus_link"], "input.fastq")

    # Process
    subprocess.check_call([
        "quality_filter",
        "--input", "input.fastq",
        "--output", "filtered.fastq",
        "--threshold", str(quality_threshold)
    ])

    # Upload output
    output_file = dxpy.upload_local_file("filtered.fastq")

    return {
        "filtered_reads": dxpy.dxlink(output_file)
    }

dxpy.run()

Workflow Decision Tree


When working with DNAnexus, follow this decision tree:
  1. Need to create a new executable?
    • Yes → Use App Development (references/app-development.md)
    • No → Continue to step 2
  2. Need to manage files or data?
    • Yes → Use Data Operations (references/data-operations.md)
    • No → Continue to step 3
  3. Need to run an analysis or workflow?
    • Yes → Use Job Execution (references/job-execution.md)
    • No → Continue to step 4
  4. Writing Python scripts for automation?
    • Yes → Use Python SDK (references/python-sdk.md)
    • No → Continue to step 5
  5. Configuring app settings or dependencies?
    • Yes → Use Configuration (references/configuration.md)
Often you'll need multiple capabilities together (e.g., app development + configuration, or data operations + job execution).
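
The decision tree can also be read as a simple lookup, which makes the "multiple capabilities together" case explicit. The sketch below is only an illustration; the need labels such as "create_executable" are hypothetical names invented for this example.

```python
# Ordered as in the decision tree: (need label, reference document).
REFERENCES = [
    ("create_executable", "references/app-development.md"),
    ("manage_data", "references/data-operations.md"),
    ("run_analysis", "references/job-execution.md"),
    ("automate_with_python", "references/python-sdk.md"),
    ("configure_app", "references/configuration.md"),
]


def pick_references(needs):
    """Return every reference document that applies, in decision-tree order."""
    return [path for need, path in REFERENCES if need in needs]


# A task that builds an app and configures its dependencies needs two references.
print(pick_references({"configure_app", "create_executable"}))
```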

Installation and Authentication


Install dxpy


bash
uv pip install dxpy

Login to DNAnexus


bash
dx login
This authenticates your session and sets up access to projects and data.

Verify Installation


bash
dx --version
dx whoami

Common Patterns


Pattern 1: Batch Processing


Process multiple files with the same analysis:
python
import dxpy

# Find all FASTQ files
# (name_mode="glob" is needed for the wildcard; the default mode is an exact match)
files = dxpy.find_data_objects(
    classname="file",
    name="*.fastq",
    name_mode="glob",
    project="project-xxxx"
)

# Launch parallel jobs
jobs = []
for file_result in files:
    job = dxpy.DXApplet("applet-xxxx").run({
        "input": dxpy.dxlink(file_result["id"])
    })
    jobs.append(job)

# Wait for all completions
for job in jobs:
    job.wait_on_done()

Pattern 2: Multi-Step Pipeline


Chain multiple analyses together:
python
# Assumes qc_applet, align_applet, and variant_applet are dxpy.DXApplet handlers
# and input_file is a link (dxpy.dxlink(...)) to the uploaded reads.

# Step 1: Quality control
qc_job = qc_applet.run({"reads": input_file})

# Step 2: Alignment (uses QC output)
align_job = align_applet.run({
    "reads": qc_job.get_output_ref("filtered_reads")
})

# Step 3: Variant calling (uses alignment output)
variant_job = variant_applet.run({
    "bam": align_job.get_output_ref("aligned_bam")
})

Pattern 3: Data Organization


Organize analysis results systematically:
python
import dxpy

# Create organized folder structure
dxpy.api.project_new_folder(
    "project-xxxx",
    {"folder": "/experiments/exp001/results", "parents": True}
)

# Upload with metadata
result_file = dxpy.upload_local_file(
    "results.txt",
    project="project-xxxx",
    folder="/experiments/exp001/results",
    properties={
        "experiment": "exp001",
        "sample": "sample1",
        "analysis_date": "2025-10-20"
    },
    tags=["validated", "published"]
)

Best Practices


  1. Error Handling: Always wrap API calls in try-except blocks
  2. Resource Management: Choose appropriate instance types for workloads
  3. Data Organization: Use consistent folder structures and metadata
  4. Cost Optimization: Archive old data, use appropriate storage classes
  5. Documentation: Include clear descriptions in dxapp.json
  6. Testing: Test apps with various input types before production use
  7. Version Control: Use semantic versioning for apps
  8. Security: Never hardcode credentials in source code
  9. Logging: Include informative log messages for debugging
  10. Cleanup: Remove temporary files and failed jobs
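
As one possible reading of the error-handling and logging practices, the sketch below wraps a call in try-except, logs each failure, and retries a limited number of times. The helper call_with_retry and all of its parameters are hypothetical and not part of dxpy; the function being called and the exception types to retry on are passed in by the caller.

```python
import logging
import time

log = logging.getLogger("dx_retry")


def call_with_retry(func, *args, retry_on=(Exception,), attempts=3,
                    delay=2.0, sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying on the given exception types.

    Waits delay * attempt seconds between tries and re-raises the last
    error once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except retry_on as exc:
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
            sleep(delay * attempt)
```

With dxpy installed, a call might look like call_with_retry(dxpy.download_dxfile, "file-xxxx", "aligned.bam", retry_on=(dxpy.exceptions.DXError,)). Retrying only on specific exception types keeps programming errors from being silently repeated.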

Resources


This skill includes detailed reference documentation:

references/


  • app-development.md - Complete guide to building and deploying apps/applets
  • data-operations.md - File management, records, search, and project operations
  • job-execution.md - Running jobs, workflows, monitoring, and parallel processing
  • python-sdk.md - Comprehensive dxpy library reference with all classes and functions
  • configuration.md - dxapp.json specification and dependency management
Load these references when you need detailed information about specific operations or when working on complex tasks.

Getting Help
