dnanexus-integration

DNAnexus Integration

Overview

DNAnexus is a cloud platform for biomedical data analysis and genomics. Build and deploy apps/applets, manage data objects, run workflows, and use the dxpy Python SDK for genomics pipeline development and execution.

When to Use This Skill

This skill should be used when:

Creating, building, or modifying DNAnexus apps/applets
Uploading, downloading, searching, or organizing files and records
Running analyses, monitoring jobs, creating workflows
Writing scripts using dxpy to interact with the platform
Setting up dxapp.json, managing dependencies, using Docker
Processing FASTQ, BAM, VCF, or other bioinformatics files
Managing projects, permissions, or platform resources

Core Capabilities

The skill is organized into five main areas, each with detailed reference documentation:

1. App Development

Purpose: Create executable programs (apps/applets) that run on the DNAnexus platform.

Key Operations:

Generate app skeleton with `dx-app-wizard`
Write Python or Bash apps with proper entry points
Handle input/output data objects
Deploy with `dx build` or `dx build --app`
Test apps on the platform

Common Use Cases:

Bioinformatics pipelines (alignment, variant calling)
Data processing workflows
Quality control and filtering
Format conversion tools

Reference: See references/app-development.md for:

Complete app structure and patterns
Python entry point decorators
Input/output handling with dxpy
Development best practices
Common issues and solutions

2. Data Operations

Purpose: Manage files, records, and other data objects on the platform.

Key Operations:

Upload/download files with `dxpy.upload_local_file()` and `dxpy.download_dxfile()`
Create and manage records with metadata
Search for data objects by name, properties, or type
Clone data between projects
Manage project folders and permissions

Common Use Cases:

Uploading sequencing data (FASTQ files)
Organizing analysis results
Searching for specific samples or experiments
Backing up data across projects
Managing reference genomes and annotations

Reference: See references/data-operations.md for:

Complete file and record operations
Data object lifecycle (open/closed states)
Search and discovery patterns
Project management
Batch operations

3. Job Execution

Purpose: Run analyses, monitor execution, and orchestrate workflows.

Key Operations:

Launch jobs with `applet.run()` or `app.run()`
Monitor job status and logs
Create subjobs for parallel processing
Build and run multi-step workflows
Chain jobs with output references

Common Use Cases:

Running genomics analyses on sequencing data
Parallel processing of multiple samples
Multi-step analysis pipelines
Monitoring long-running computations
Debugging failed jobs

Reference: See references/job-execution.md for:

Complete job lifecycle and states
Workflow creation and orchestration
Parallel execution patterns
Job monitoring and debugging
Resource management

4. Python SDK (dxpy)

Purpose: Programmatic access to DNAnexus platform through Python.

Key Operations:

Work with data object handlers (DXFile, DXRecord, DXApplet, etc.)
Use high-level functions for common tasks
Make direct API calls for advanced operations
Create links and references between objects
Search and discover platform resources

Common Use Cases:

Automation scripts for data management
Custom analysis pipelines
Batch processing workflows
Integration with external tools
Data migration and organization

Reference: See references/python-sdk.md for:

Complete dxpy class reference
High-level utility functions
API method documentation
Error handling patterns
Common code patterns

5. Configuration and Dependencies

Purpose: Configure app metadata and manage dependencies.

Key Operations:

Write dxapp.json with inputs, outputs, and run specs
Install system packages (execDepends)
Bundle custom tools and resources
Use assets for shared dependencies
Integrate Docker containers
Configure instance types and timeouts

Common Use Cases:

Defining app input/output specifications
Installing bioinformatics tools (samtools, bwa, etc.)
Managing Python package dependencies
Using Docker images for complex environments
Selecting computational resources

Reference: See references/configuration.md for:

Complete dxapp.json specification
Dependency management strategies
Docker integration patterns
Regional and resource configuration
Example configurations
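
As a rough illustration of how these pieces fit together, the sketch below assembles a minimal dxapp.json as a Python dictionary and serializes it. The app name, the input and output names (borrowed from the quality-filter example in this document), the Ubuntu release, the region, the instance type, and the timeout are all illustrative assumptions; references/configuration.md remains the place to check the full field list.

```python
import json

# Hypothetical app metadata; every value here is a placeholder.
dxapp = {
    "name": "quality_filter_app",
    "title": "Quality Filter",
    "summary": "Filter reads below a quality threshold",
    "dxapi": "1.0.0",
    "version": "0.0.1",
    "inputSpec": [
        {"name": "input_file", "class": "file", "patterns": ["*.fastq"]},
        {"name": "quality_threshold", "class": "int",
         "optional": True, "default": 30},
    ],
    "outputSpec": [
        {"name": "filtered_reads", "class": "file"},
    ],
    "runSpec": {
        "interpreter": "python3",
        "file": "src/my-app.py",
        "distribution": "Ubuntu",
        "release": "24.04",
        "version": "0",
        # System packages installed before the entry point runs
        "execDepends": [{"name": "samtools"}],
        # Stop any entry point that runs longer than four hours
        "timeoutPolicy": {"*": {"hours": 4}},
    },
    "regionalOptions": {
        "aws:us-east-1": {
            "systemRequirements": {"*": {"instanceType": "mem1_ssd1_v2_x4"}}
        }
    },
}

dxapp_json = json.dumps(dxapp, indent=2)
print(dxapp_json)
```

The input and output names in `inputSpec` and `outputSpec` have to match the arguments and return keys of the app's entry point, which is why they mirror the `main()` function from the simple app example.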

Quick Start Examples

Upload and Analyze Data

```python
import dxpy

# Upload input file
input_file = dxpy.upload_local_file("sample.fastq", project="project-xxxx")

# Run analysis
job = dxpy.DXApplet("applet-xxxx").run({
    "reads": dxpy.dxlink(input_file.get_id())
})

# Wait for completion
job.wait_on_done()

# Download results
output_id = job.describe()["output"]["aligned_reads"]["$dnanexus_link"]
dxpy.download_dxfile(output_id, "aligned.bam")
```
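
The `["$dnanexus_link"]` lookup in the download step works because DNAnexus represents references to data objects as small dictionaries. The sketch below is plain Python that needs no platform access. It assumes the two link shapes that `dxpy.dxlink()` produces (a bare object ID, or an ID paired with a project), and the helper name `link_to_id` is hypothetical.

```python
def link_to_id(link):
    """Return the object ID from a DNAnexus link dictionary."""
    target = link["$dnanexus_link"]
    if isinstance(target, dict):
        # Project-qualified form: {"project": ..., "id": ...}
        return target["id"]
    # Simple form: the value is the object ID itself
    return target


simple = {"$dnanexus_link": "file-xxxx"}
qualified = {"$dnanexus_link": {"project": "project-xxxx", "id": "file-xxxx"}}

print(link_to_id(simple))
print(link_to_id(qualified))
```

Handling both shapes keeps a script from breaking when an output happens to carry a project-qualified link.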

Search and Download Files

```python
import dxpy

# Find BAM files from a specific experiment
# name_mode="glob" makes "*.bam" a wildcard; the default matches the name exactly
files = dxpy.find_data_objects(
    classname="file",
    name="*.bam",
    name_mode="glob",
    properties={"experiment": "exp001"},
    project="project-xxxx"
)

# Download each file
for file_result in files:
    file_obj = dxpy.DXFile(file_result["id"])
    filename = file_obj.describe()["name"]
    dxpy.download_dxfile(file_result["id"], filename)
```

Create Simple App

```python
# src/my-app.py
import dxpy
import subprocess

@dxpy.entry_point('main')
def main(input_file, quality_threshold=30):
    # Download input
    dxpy.download_dxfile(input_file["$dnanexus_link"], "input.fastq")

    # Process
    subprocess.check_call([
        "quality_filter",
        "--input", "input.fastq",
        "--output", "filtered.fastq",
        "--threshold", str(quality_threshold)
    ])

    # Upload output
    output_file = dxpy.upload_local_file("filtered.fastq")

    return {
        "filtered_reads": dxpy.dxlink(output_file)
    }

dxpy.run()
```

Workflow Decision Tree

When working with DNAnexus, follow this decision tree:

1. Need to create a new executable?
- Yes → Use App Development (references/app-development.md)
- No → Continue to step 2
2. Need to manage files or data?
- Yes → Use Data Operations (references/data-operations.md)
- No → Continue to step 3
3. Need to run an analysis or workflow?
- Yes → Use Job Execution (references/job-execution.md)
- No → Continue to step 4
4. Writing Python scripts for automation?
- Yes → Use Python SDK (references/python-sdk.md)
- No → Continue to step 5
5. Configuring app settings or dependencies?
- Yes → Use Configuration (references/configuration.md)

Often you'll need multiple capabilities together (e.g., app development + configuration, or data operations + job execution).
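
Because several capabilities often apply at once, the decision tree can also be read as a lookup rather than a strict sequence. The sketch below is one possible encoding. The need labels (`create_executable` and so on) and the function name `references_for` are hypothetical; only the reference paths come from this document.

```python
# Ordered to match the decision tree steps
REFERENCES = [
    ("create_executable", "references/app-development.md"),
    ("manage_data", "references/data-operations.md"),
    ("run_analysis", "references/job-execution.md"),
    ("automate_with_python", "references/python-sdk.md"),
    ("configure_app", "references/configuration.md"),
]


def references_for(needs):
    """Return every reference file relevant to the given set of needs."""
    return [path for need, path in REFERENCES if need in needs]


# App development and configuration are typically needed together
print(references_for({"create_executable", "configure_app"}))
```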

Installation and Authentication

Install dxpy

```bash
uv pip install dxpy
```

Login to DNAnexus

```bash
dx login
```

This authenticates your session and sets up access to projects and data.

Verify Installation

```bash
dx --version
dx whoami
```

Common Patterns

Pattern 1: Batch Processing

Process multiple files with the same analysis:

```python
import dxpy

# Find all FASTQ files
# name_mode="glob" makes "*.fastq" a wildcard; the default matches the name exactly
files = dxpy.find_data_objects(
    classname="file",
    name="*.fastq",
    name_mode="glob",
    project="project-xxxx"
)

# Launch parallel jobs
jobs = []
for file_result in files:
    job = dxpy.DXApplet("applet-xxxx").run({
        "input": dxpy.dxlink(file_result["id"])
    })
    jobs.append(job)

# Wait for all completions
for job in jobs:
    job.wait_on_done()
```
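
One thing the waiting loop leaves open is what happens when a job fails: `wait_on_done()` raises an exception for a failed job, so a single failure would stop the loop before the remaining jobs are checked. A minimal sketch of collecting failures instead is shown below. The helper name `wait_for_all` is hypothetical, it relies only on the `wait_on_done()` and `get_id()` methods of the job handlers, and it catches `Exception` broadly for brevity. The `FakeJob` class exists only so the example runs without platform access.

```python
def wait_for_all(jobs):
    """Wait on every job; return (succeeded_ids, failures_by_id)."""
    succeeded, failures = [], {}
    for job in jobs:
        try:
            job.wait_on_done()
            succeeded.append(job.get_id())
        except Exception as exc:  # broad on purpose; narrow this in real code
            failures[job.get_id()] = str(exc)
    return succeeded, failures


class FakeJob:
    """Stand-in for a job handler so the sketch runs offline."""

    def __init__(self, job_id, fails=False):
        self._id = job_id
        self._fails = fails

    def get_id(self):
        return self._id

    def wait_on_done(self):
        if self._fails:
            raise RuntimeError("job failed")


done, failed = wait_for_all([FakeJob("job-aaaa"), FakeJob("job-bbbb", fails=True)])
print(done, failed)
```

With real handlers, the list returned by the launch loop can be passed to `wait_for_all` unchanged, and the failure map gives a starting point for debugging.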

Pattern 2: Multi-Step Pipeline

Chain multiple analyses together:

```python
import dxpy

# Placeholder handlers and input; substitute real IDs
qc_applet = dxpy.DXApplet("applet-xxxx")
align_applet = dxpy.DXApplet("applet-yyyy")
variant_applet = dxpy.DXApplet("applet-zzzz")
input_file = dxpy.dxlink("file-xxxx")

# Step 1: Quality control
qc_job = qc_applet.run({"reads": input_file})

# Step 2: Alignment (uses QC output)
align_job = align_applet.run({
    "reads": qc_job.get_output_ref("filtered_reads")
})

# Step 3: Variant calling (uses alignment output)
variant_job = variant_applet.run({
    "bam": align_job.get_output_ref("aligned_bam")
})
```
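
Chaining works because `get_output_ref()` does not return the output itself. It returns a placeholder that the platform resolves once the upstream job finishes, which is why a downstream step can be launched before the upstream one completes. The sketch below builds that placeholder by hand in plain Python, assuming the job-based object reference layout of a `$dnanexus_link` wrapping a `job` ID and a `field` name. The helper name `output_ref` is hypothetical.

```python
def output_ref(job_id, field):
    """Build a job-based object reference to one output field of a job."""
    return {"$dnanexus_link": {"job": job_id, "field": field}}


# Input for an alignment step that consumes the QC job's filtered reads
align_input = {"reads": output_ref("job-xxxx", "filtered_reads")}
print(align_input)
```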

Pattern 3: Data Organization

Organize analysis results systematically:

```python
import dxpy

# Create organized folder structure
dxpy.api.project_new_folder(
    "project-xxxx",
    {"folder": "/experiments/exp001/results", "parents": True}
)

# Upload with metadata
result_file = dxpy.upload_local_file(
    "results.txt",
    project="project-xxxx",
    folder="/experiments/exp001/results",
    properties={
        "experiment": "exp001",
        "sample": "sample1",
        "analysis_date": "2025-10-20"
    },
    tags=["validated", "published"]
)
```
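
Consistency is easier to keep when the folder path and the properties are derived from the same inputs rather than typed out per upload. The sketch below shows one way to do that in plain Python. The helper name `result_location` and the `/experiments/<id>/results` layout are assumptions modeled on the example paths in this pattern. Property values are kept as strings.

```python
from datetime import date


def result_location(experiment, sample, analysis_date=None):
    """Derive a results folder and matching properties for one sample."""
    analysis_date = analysis_date or date.today()
    return {
        "folder": f"/experiments/{experiment}/results",
        "properties": {
            "experiment": experiment,
            "sample": sample,
            "analysis_date": analysis_date.isoformat(),
        },
    }


loc = result_location("exp001", "sample1", date(2025, 10, 20))
print(loc["folder"])
print(loc["properties"])
```

The returned dictionary uses the same keyword names as the upload call in this pattern, so it could be unpacked into it with `**loc`.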

Best Practices

Error Handling: Always wrap API calls in try-except blocks
Resource Management: Choose appropriate instance types for workloads
Data Organization: Use consistent folder structures and metadata
Cost Optimization: Archive old data, use appropriate storage classes
Documentation: Include clear descriptions in dxapp.json
Testing: Test apps with various input types before production use
Version Control: Use semantic versioning for apps
Security: Never hardcode credentials in source code
Logging: Include informative log messages for debugging
Cleanup: Remove temporary files and failed jobs
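
The error handling and logging items can be combined in a small wrapper, sketched below in plain Python. The helper name `call_with_logging` and the logger name are hypothetical, and `flaky_describe` is a stand-in for a platform call so the example runs offline. In dxpy code the broad `except Exception` would normally be narrowed to `dxpy.exceptions.DXError` or one of its subclasses.

```python
import logging

logger = logging.getLogger("dx_scripts")


def call_with_logging(action, func, *args, **kwargs):
    """Run func; on failure, log with context and return (False, error)."""
    try:
        return True, func(*args, **kwargs)
    except Exception as exc:  # narrow to the SDK's exception types in real code
        logger.error("%s failed: %s", action, exc)
        return False, exc


def flaky_describe(object_id):
    # Stand-in for a platform call; fails on purpose
    raise ValueError(f"cannot describe {object_id}")


ok, outcome = call_with_logging("describe file", flaky_describe, "file-xxxx")
print(ok, outcome)

ok_sum, total = call_with_logging("add", lambda a, b: a + b, 1, 2)
print(ok_sum, total)
```

Returning the error instead of re-raising lets a batch script continue with the remaining items and report all failures at the end.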

Resources

This skill includes detailed reference documentation:

references/

app-development.md - Complete guide to building and deploying apps/applets
data-operations.md - File management, records, search, and project operations
job-execution.md - Running jobs, workflows, monitoring, and parallel processing
python-sdk.md - Comprehensive dxpy library reference with all classes and functions
configuration.md - dxapp.json specification and dependency management

Load these references when you need detailed information about specific operations or when working on complex tasks.

Getting Help

获取帮助

Official documentation: https://documentation.dnanexus.com/
API reference: http://autodoc.dnanexus.com/
GitHub repository: https://github.com/dnanexus/dx-toolkit
Support: support@dnanexus.com
