Genomic File Types

This page lists the types of files used in genomic analysis. To work with real (sub-sampled) example files of each type listed below, go to this link.

Summary Table

| Type | Name | Phase | Notes | Example File Image |
|---|---|---|---|---|
| FASTA | sequencer file | 1a-from sequencer | includes sequence dictionary (.dict) and index (.fai) files | FASTA |
| FASTQ | sequencer file with quality scores | 1b-from sequencer | includes a per-base quality (Phred) score | FASTQ |
| UBAM | unmapped binary alignment map file | 1c-from sequencer (processed) | binary format | no image |
| SAM | sequence alignment map file | 2a-align to reference | text format | SAM-format, SAM |
| BAM | binary alignment map file | 2b-align to reference | binary format; can be viewed in IGV; can include index (.bai) files | BAM |
| CRAM | compressed binary alignment file | 2c-align to reference | binary format, compressed relative to a reference sequence | no image |
| VCF | variant call format | 3a-find variants | plain text | VCF, VCF-format |
| GVCF | genomic variant call format | 3b-find variants | VCF with extra info: also records non-variant regions (reference blocks) | GVCF |
| Other text files (TSV, CSV, BED, BZ2-compressed text) | text files for genomics | 4-any phase | contains extra info | no image |
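The FASTA row mentions the `.fai` index. Tools such as `samtools faidx` write one tab-separated row per sequence with five columns: NAME, LENGTH, OFFSET (byte offset of the first base), LINEBASES, and LINEWIDTH (line length including the newline). The sketch below computes those columns for a small in-memory FASTA so the meaning of each column is concrete. The function name `build_fai` is hypothetical, and the sketch assumes each sequence uses a fixed line length without checking it; for real files, use `samtools faidx`.

```python
from typing import List, Optional, Tuple

# One .fai row: (NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH)
FaiEntry = Tuple[str, int, int, int, int]


def build_fai(fasta: bytes) -> List[FaiEntry]:
    """Compute .fai-style index rows for an in-memory FASTA (hypothetical helper).

    Assumes every sequence uses a fixed line length (except its last line);
    this sketch does not validate that assumption.
    """
    entries: List[FaiEntry] = []
    current: Optional[list] = None  # [name, length, offset, linebases, linewidth]
    pos = 0  # byte offset of the start of the current line
    for line in fasta.splitlines(keepends=True):
        if line.startswith(b">"):
            if current is not None:
                entries.append(tuple(current))
            name = line[1:].split()[0].decode()  # name = text up to first whitespace
            current = [name, 0, pos + len(line), 0, 0]  # bases start after the header
        elif current is not None:
            bases = len(line.rstrip(b"\r\n"))
            if current[3] == 0:  # first sequence line sets the line geometry
                current[3] = bases
                current[4] = len(line)
            current[1] += bases
        pos += len(line)
    if current is not None:
        entries.append(tuple(current))
    return entries


if __name__ == "__main__":
    demo = b">chr1 example\nACGT\nAC\n>chr2\nGG\n"
    for row in build_fai(demo):
        print("\t".join(str(field) for field in row))
```

With OFFSET, LINEBASES, and LINEWIDTH, a reader can seek directly to any base of a sequence without scanning the whole file, which is why the index matters for large reference genomes.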
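The FASTQ row notes that each base carries a Phred quality score. In a FASTQ record, the fourth line stores one ASCII character per base, and a Phred score Q corresponds to an estimated error probability of 10^(-Q/10). The minimal reader below assumes the common Phred+33 encoding (Sanger / Illumina 1.8+) and exactly four lines per record with no sequence wrapping; the function names are illustrative, not from any library.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # Phred scores, one per base


def phred_scores(quality_line: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string; offset 33 assumes Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q: int) -> float:
    """A Phred score Q means an estimated base-call error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly four lines per record."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, phred_scores(quality))


if __name__ == "__main__":
    demo = io.StringIO("@read1\nACGT\n+\nII5#\n")
    for rec in read_fastq(demo):
        print(rec.name, rec.sequence, rec.qualities)  # read1 ACGT [40, 40, 20, 2]
```

For example, `I` decodes to Q40 (about a 1-in-10,000 error chance) and `#` to Q2, a very unreliable call.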
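SAM is the text form of an alignment file: header lines start with `@`, and each alignment line has 11 mandatory tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL) followed by optional `TAG:TYPE:VALUE` fields. The sketch below parses one alignment line and decodes the FLAG bitmask using the bit meanings from the SAM specification. It is illustrative only; the helper names are hypothetical, and for BAM or CRAM files you would normally let `samtools view` (which prints records as SAM text) or the pysam library do the decoding.

```python
from typing import List, NamedTuple, Optional

# FLAG bit meanings from the SAM specification
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int  # 1-based leftmost mapping position (0 if unmapped)
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: List[str]  # optional TAG:TYPE:VALUE fields, left unparsed here


def parse_sam_line(line: str) -> Optional[SamRecord]:
    """Parse one SAM alignment line; return None for '@' header lines."""
    if line.startswith("@"):
        return None
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = cols[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq), cigar,
                     rnext, int(pnext), int(tlen), seq, qual, cols[11:])


def decode_flag(flag: int) -> List[str]:
    """List the names of the FLAG bits that are set."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    line = "read1\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tII5#\tNM:i:0"
    rec = parse_sam_line(line)
    print(rec.rname, rec.pos, rec.cigar, decode_flag(rec.flag))
```

A FLAG of 99 (0x1 + 0x2 + 0x20 + 0x40), for instance, describes the first read of a properly paired read pair whose mate maps to the reverse strand.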
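VCF is plain text: meta-information lines start with `##`, the column header starts with `#CHROM`, and each data line has eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), optionally followed by a FORMAT column and one column per sample. The minimal parser below is a sketch under simplifying assumptions (no header validation, INFO values and sample values kept as strings); the names `parse_vcf_line` and `parse_info` are hypothetical. Production code would typically use a dedicated VCF library.

```python
from typing import Dict, List, NamedTuple, Optional, Union


class VcfRecord(NamedTuple):
    chrom: str
    pos: int                        # 1-based position
    id: Optional[str]               # None when the column is "."
    ref: str
    alt: List[str]                  # ALT alleles, split on commas
    qual: Optional[float]           # None when the column is "."
    filter: str
    info: Dict[str, Union[str, bool]]
    samples: List[Dict[str, str]]   # one FORMAT-keyed dict per sample column


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Split INFO into key=value pairs; keys without '=' are flags (True)."""
    if info == ".":
        return {}
    parsed: Dict[str, Union[str, bool]] = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        parsed[key] = value if sep else True
    return parsed


def parse_vcf_line(line: str) -> Optional[VcfRecord]:
    """Parse one VCF data line; return None for '##' meta and '#CHROM' header lines."""
    if line.startswith("#"):
        return None
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) < 8:
        raise ValueError("a VCF data line needs at least 8 fixed columns")
    chrom, pos, vid, ref, alt, qual, flt, info = cols[:8]
    samples: List[Dict[str, str]] = []
    if len(cols) > 9:
        keys = cols[8].split(":")  # FORMAT column names each sample's fields
        samples = [dict(zip(keys, col.split(":"))) for col in cols[9:]]
    return VcfRecord(chrom, int(pos), None if vid == "." else vid, ref,
                     alt.split(","), None if qual == "." else float(qual),
                     flt, parse_info(info), samples)


if __name__ == "__main__":
    line = "chr1\t12345\t.\tA\tG,T\t50.5\tPASS\tDP=14;DB\tGT:DP\t0/1:14"
    rec = parse_vcf_line(line)
    print(rec.pos, rec.alt, rec.info, rec.samples)
    # 12345 ['G', 'T'] {'DP': '14', 'DB': True} [{'GT': '0/1', 'DP': '14'}]
```

Because a GVCF uses the same VCF layout, this parser also reads GVCF lines. In GATK-style GVCFs, a non-variant reference block typically carries the symbolic `<NON_REF>` allele in ALT and an `END=` key in INFO marking where the block stops.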

Learn More -> Links

  • 📘Big List of genomic file types and descriptions - link from The Broad
  • 📘IGV (Integrative Genomics Viewer) tool - link from The Broad
  • :octocat: Learning how to work with VCF (Variant Call Format) files link
  • 📘General reference 'How sequencing works' - link
  • 📘GATK tools (from The Broad) to convert genomic files between common formats (e.g. paired FASTQ to unmapped BAM) - link
  • 📘How to generate a BAM - link & image below from The Broad

Generate-BAM

Image References

  • FASTA/FASTQ images from link
  • BAM w/IGV from link
  • VCF image from link
  • VCF alternate image from link
  • GVCF vs. VCF comparison from link