The bucket `s3://genomeark/` is public, so you can access it without credentials by passing `--no-sign-request` to the AWS CLI.

```
s3://genomeark/
└── species/
    └── {Genus_species}/              # e.g., Rhinolophus_ferrumequinum
        └── {ToLID}/                  # e.g., mRhiFer1 (VGP specimen ID)
            ├── assembly_vgp_{type}_{version}/
            │   ├── evaluation/       # QC metrics (MAIN ACCESS POINT)
            │   │   ├── genomescope/
            │   │   ├── busco/
            │   │   ├── merqury/
            │   │   └── ...
            │   └── intermediates/    # K-mer databases, temp files
            │       └── meryl/
            └── genomic_data/         # Raw sequencing data folders
```
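A minimal sketch for building prefixes in this layout, assuming the `species/{Genus_species}/{ToLID}/{assembly folder}/` hierarchy shown above. The helper names `assembly_prefix` and `list_command` are hypothetical, not part of any GenomeArk tooling.

```python
from typing import List

BUCKET = "s3://genomeark"


def assembly_prefix(genus_species: str, tolid: str, assembly_folder: str,
                    subdir: str = "evaluation") -> str:
    """Build an S3 prefix under species/{Genus_species}/{ToLID}/{assembly}/.

    S3 keys are case-sensitive, so every component is used verbatim.
    Pass subdir="" to stop at the assembly folder itself.
    """
    parts = [BUCKET, "species", genus_species, tolid, assembly_folder.strip("/")]
    if subdir:
        parts.append(subdir.strip("/"))
    return "/".join(parts) + "/"


def list_command(prefix: str) -> List[str]:
    """AWS CLI command that lists a prefix anonymously (public bucket)."""
    return ["aws", "s3", "ls", prefix, "--no-sign-request"]
```

For example, `assembly_prefix("Rhinolophus_ferrumequinum", "mRhiFer1", "assembly_vgp_HiC_2.0")` yields `s3://genomeark/species/Rhinolophus_ferrumequinum/mRhiFer1/assembly_vgp_HiC_2.0/evaluation/`. Whether that folder exists for a given specimen still has to be checked by listing the parent prefix.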
Not every assembly folder follows the `assembly_vgp_{type}_{version}` pattern. Folder names in use include:

- VGP pipeline: `assembly_vgp_HiC_2.0`, `assembly_vgp_standard_2.0`, `assembly_vgp_hic_2.0`, `assembly_vgp_trio_2.0`, `assembly_vgp_standard_1.6`, `assembly_vgp_standard_1.0`, `assembly_vgp_HiC_1.6`, `assembly_vgp_HiC_1.0`, `assembly_vgp_HiC_1.4`
- Verkko: `assembly_verkko_1.4/`, `assembly_verkko_1.1-0.1/`, `assembly_verkko_1.1-0.1-freeze/`, `assembly_verkko_1.1-0.2/`, `assembly_verkko_1.4.1r/`
- Other: `assembly_primate_v1.4.2/`, `assembly_fish_*`, `assembly_bird_*`, `assembly_rockefeller/`, `assembly_cambridge/`, `assembly_MT_rockefeller/`, `assembly_mt_rockefeller/`, `assembly_mt_milan/`, `vgp_standard_1.6/`, `vgp_standard_1.0/`, `vgp_HiC_1.6/`, `assembly_curated/`

Paths are case-sensitive: `assembly_vgp_hic_2.0` and `assembly_vgp_HiC_2.0` are different prefixes.

| Data Type | Location | Key Notes |
|---|---|---|
| GenomeScope | `evaluation/genomescope/` | 3 filename patterns (double/single/no underscore); validate heterozygosity ranges |
| BUSCO | `evaluation/busco/` | Dynamic subdirectory search (`c/`, `p/`, `c1/`, `p1/`); parse |
| Merqury | `evaluation/merqury/` | Two path layouts (direct vs. nested); QV in column 4 |
| Meryl hist | `intermediates/meryl/` | Use |
| Assembly dates | FASTA filenames | `YYYYMMDD` stamps; see assembly-date-extraction.md |
| Technology | | |
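The GenomeScope and BUSCO search strategies in the table can be sketched as candidate-path generators. This assumes GenomeScope summaries sit directly in `evaluation/genomescope/` and that BUSCO results sit one level below `evaluation/busco/` in one of the listed subdirectories. Both helper names are hypothetical, and every candidate still has to be checked against the bucket (for example by listing the directory).

```python
from typing import List

# The three GenomeScope summary filename patterns, tried in this order.
GENOMESCOPE_PATTERNS = (
    "{tolid}_genomescope__Summary.txt",  # double underscore
    "{tolid}_genomescope_Summary.txt",   # single underscore
    "{tolid}_Summary.txt",               # no "genomescope" part
)

# BUSCO subdirectories to probe under evaluation/busco/.
BUSCO_SUBDIRS = ("c", "p", "c1", "p1")


def genomescope_candidates(evaluation_prefix: str, tolid: str) -> List[str]:
    """Return every GenomeScope summary path worth trying, in pattern order."""
    base = evaluation_prefix.rstrip("/") + "/genomescope/"
    return [base + pattern.format(tolid=tolid) for pattern in GENOMESCOPE_PATTERNS]


def busco_dir_candidates(evaluation_prefix: str) -> List[str]:
    """Return the BUSCO subdirectories to search for result files."""
    base = evaluation_prefix.rstrip("/") + "/busco/"
    return [base + sub + "/" for sub in BUSCO_SUBDIRS]
```

Generating candidates separately from fetching keeps the search order explicit and easy to extend if another naming variant turns up.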
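Two of the table's notes can be illustrated directly: reading the QV from column 4 of a Merqury `.qv` file, and pulling a `YYYYMMDD` stamp out of a FASTA filename. This is a minimal sketch that assumes whitespace-separated Merqury rows with the assembly name in column 1. It validates dates with `datetime.strptime`, which may differ from the procedure in assembly-date-extraction.md. The function names are hypothetical.

```python
import re
from datetime import date, datetime
from typing import Dict, Optional

# Exactly eight digits, not embedded in a longer run of digits.
_DATE_RE = re.compile(r"(?<!\d)(\d{8})(?!\d)")


def parse_merqury_qv(text: str) -> Dict[str, float]:
    """Map each row's first column (assembly name) to the QV in column 4."""
    qvs: Dict[str, float] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        try:
            qvs[fields[0]] = float(fields[3])  # column 4 (1-based) -> index 3
        except ValueError:
            continue  # skip header-like rows
    return qvs


def fasta_date(filename: str) -> Optional[date]:
    """Return the first valid YYYYMMDD stamp in a FASTA filename, if any."""
    for match in _DATE_RE.finditer(filename):
        try:
            return datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits that are not a real calendar date
    return None
```

Rejecting digit runs that are not real calendar dates avoids mistaking other eight-digit numbers in a filename for an assembly date.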
```python
def normalize_s3_path(s3_path):
    """Normalize a path for GenomeArk (S3 keys are case-sensitive!)."""
    if not s3_path:
        return None
    # Rewrite the lowercase 'hic' folder name to the 'HiC' spelling
    s3_path = s3_path.replace('/assembly_vgp_hic_2.0/', '/assembly_vgp_HiC_2.0/')
    # Ensure a trailing slash so the path is treated as a prefix
    if not s3_path.endswith('/'):
        s3_path += '/'
    return s3_path
```
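`normalize_s3_path` hardcodes one known case fix. A more general alternative, sketched here under the assumption that you already have the real folder names for a specimen (for example, extracted from an `aws s3 ls` listing), is to match the requested name against that listing. An exact match wins. Otherwise a unique case-insensitive match is accepted. `resolve_folder_case` is a hypothetical helper.

```python
from typing import Iterable, Optional


def resolve_folder_case(requested: str, listed: Iterable[str]) -> Optional[str]:
    """Return the listed folder name that matches `requested`.

    An exact match wins; otherwise fall back to a unique case-insensitive
    match. Returns None when nothing matches or the match is ambiguous.
    """
    names = [name.rstrip("/") for name in listed]
    target = requested.rstrip("/")
    if target in names:
        return target
    folded = [name for name in names if name.casefold() == target.casefold()]
    return folded[0] if len(folded) == 1 else None
```

Preferring the exact match matters because both spellings can appear in the bucket (for example `assembly_MT_rockefeller/` and `assembly_mt_rockefeller/`).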
"""标准化GenomeArk的路径(注意区分大小写!)"""
if not s3_path:
return None
s3_path = s3_path.replace('/assembly_vgp_hic_2.0/', '/assembly_vgp_HiC_2.0/')
if not s3_path.endswith('/'):
s3_path += '/'
GenomeScope summary filenames follow one of three patterns:

- `{ToLID}_genomescope__Summary.txt` (double underscore)
- `{ToLID}_genomescope_Summary.txt` (single underscore)
- `{ToLID}_Summary.txt` (no `genomescope` part)

Meryl histogram URL template:

`https://genomeark.s3.amazonaws.com/species/{species}/{tolid}/assembly_vgp_standard_1.0/intermediates/meryl/{tolid}.cut.meryl.hist`

Reading a file anonymously with the AWS CLI:

```python
import subprocess
# s3_path is an s3://genomeark/... object path; '-' streams the object to stdout
cmd = ['aws', 's3', 'cp', s3_path, '-', '--no-sign-request']
result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
```

Mind the `hic`/`HiC` case difference, and note that Meryl histograms use the `.hist` extension.
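Fetching a GenomeScope summary means trying each filename pattern until one succeeds. The sketch below wraps the `aws s3 cp ... -` call above in a fallback loop and adds a histogram parser. It rests on two assumptions: a non-zero exit status is treated as "not found" (it can also mean a network or permission error), and the `.hist` file is taken to hold two whitespace-separated integer columns (k-mer multiplicity, number of k-mers). The names `fetch_s3_text`, `first_available` and `parse_meryl_hist` are hypothetical.

```python
import subprocess
from typing import Callable, Dict, Iterable, Optional, Tuple


def fetch_s3_text(s3_path: str, timeout: int = 30) -> Optional[str]:
    """Stream a public GenomeArk object to stdout; None if the copy fails."""
    cmd = ["aws", "s3", "cp", s3_path, "-", "--no-sign-request"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return None  # timed out, or the aws CLI is not installed
    return result.stdout if result.returncode == 0 else None


def first_available(paths: Iterable[str],
                    fetch: Callable[[str], Optional[str]] = fetch_s3_text,
                    ) -> Optional[Tuple[str, str]]:
    """Try candidate paths in order; return (path, text) for the first hit."""
    for path in paths:
        text = fetch(path)
        if text is not None:
            return path, text
    return None


def parse_meryl_hist(text: str) -> Dict[int, int]:
    """Parse 'multiplicity count' rows into a {multiplicity: count} dict."""
    hist: Dict[int, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            hist[int(fields[0])] = int(fields[1])
    return hist
```

Passing `fetch` as a parameter lets you swap in a local-file or HTTPS reader (or a stub in tests) without touching the fallback logic. Note that `aws s3 cp` expects the `s3://genomeark/...` form of a path rather than the `https://` URL.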