Using demultiplexed BAM files output from zUMIs with Picard Tools #380

Open
seifudd opened this issue Nov 9, 2023 · 1 comment

@seifudd
seifudd commented Nov 9, 2023

Hi,

I am trying to use the demultiplexed BAM output from zUMIs with Picard Tools, but it does not seem to be working.

Below are a few lines from a demultiplexed BAM file (one sample) output from zUMIs:

A00267:423:HFMHMDRX3:1:2101:1000:34663  99      19      48452905        255     88M     =       48453094        277     GCTGTTCGTGCACCAGGGCGAGACCGAGCTGAAGGAGCTGCACTGGCACCCGCAGTGCCCAGGGCTCCTGGTCAGCACGGCGCTGTCA    FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF        NH:i:1  HI:i:1  AS:i:174    nM:i:0  BX:Z:AGTGACCTCTCCTAGA   BC:Z:AGTGACCTCTCCTAGA   UB:Z:CCCGCANCAGCTCTGGATCAGAGC   XS:Z:Assigned2  XN:i:1  XT:Z:ENSG00000105447
A00267:423:HFMHMDRX3:1:2101:1000:34663  147     19      48453094        255     88M     =       48452905        -277    GGTTCATTCAGGTCTGTTGACTGAGACTGGCCGGCCTGTGGGCTGCCGTGATGGATTCTGTTTGACGTATTGTTCTCTAGAAGGCCTG    FFFFFFFF:FFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF        NH:i:1  HI:i:1  AS:i:174    nM:i:0  BX:Z:AGTGACCTCTCCTAGA   BC:Z:AGTGACCTCTCCTAGA   UB:Z:CCCGCANCAGCTCTGGATCAGAGC   XS:Z:Assigned2  XN:i:1  XT:Z:ENSG00000105447
A00267:423:HFMHMDRX3:1:2101:1000:35978  83      2       105342985       255     88M     =       105339692       -3381   CCAGTAATGCCTTTAGAAAATTATCAAATTCCTCTTCGAGTGTTTCACCCCTAATTTTGTCTTCCAATTTGCCTGTGAACAATAAAAC    FFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF        NH:i:1  HI:i:1  AS:i:175    nM:i:0  BX:Z:AGTGACCTCTCCTAGA   BC:Z:AGTGACCTCTCCTAGA   UB:Z:TTATTGTGTTCCCGAAGAATAGAT   XS:Z:Assigned3  XN:i:1  XT:Z:ENSG00000135974
A00267:423:HFMHMDRX3:1:2101:1000:35978  163     2       105339692       255     58M3098N30M     =       105342985       3381    GGGGGAAAATGATGGAAAAGAAAAGAGAACAACATGAGATTAAAAATGAGACTAAAAGGAGTAGCACTGTAGATGGGTTAAGGAAAAG    FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,F:FFF        NH:i:1  HI:i:1      AS:i:175        nM:i:0  BX:Z:AGTGACCTCTCCTAGA   BC:Z:AGTGACCTCTCCTAGA   UB:Z:TTATTGTGTTCCCGAAGAATAGAT   XS:Z:Assigned3  XN:i:1  XT:Z:ENSG00000135974
A00267:423:HFMHMDRX3:1:2101:1000:36166  83      16      69718137        255     88M     =       69713114        -5111   GGTCTGCGGCTTCCAGCTTCTTTTGTTCAGCCACAATATCTGGGCTCAGATGGCCTTCTTTATAAGCCAGAACAGACTCGGCAGGATA    :FFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF        NH:i:1  HI:i:1  AS:i:175    nM:i:0  BX:Z:AGTGACCTCTCCTAGA   BC:Z:AGTGACCTCTCCTAGA   UB:Z:GCGAACTTTCAGTGGTGATGGAAA   XS:Z:Assigned3  XN:i:1  XT:Z:ENSG00000181019
A00267:423:HFMHMDRX3:1:2101:1000:36166  163     16      69713114        255     16M1834N72M     =       69718137        5111    GCACTGCCTTCTTACTCCGGAAGGGTCCTTTGTCATACATGGCAGCGTAAGTGTAAGCAAACTCTCCTATGAACACTCGCTCAAACCA    FFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF:F:FFFFFFFFFFFFFFFFFFFFFFFFFFFF,,        NH:i:1  HI:i:1      AS:i:175        nM:i:0  BX:Z:AGTGACCTCTCCTAGA   BC:Z:AGTGACCTCTCCTAGA   UB:Z:GCGAACTTTCAGTGGTGATGGAAA   XS:Z:Assigned3  XN:i:1  XT:Z:ENSG00000181019
A00267:423:HFMHMDRX3:1:2101:1009:15107  99      10      26501106        255     6M2085N78M8472N4M       =       26511819        10801   TCTCAGGAAGAGGAAGAAGCCCAAGCCAAGGCTGATAAAATTAAGCTGGCGCTGGAAAAACTGAAGGAGGCCAAGGTTAAGAAGCTCG    FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFF        NH:i:1      HI:i:1  AS:i:177        nM:i:0  BX:Z:AGTGACCTCTCCTAGA   BC:Z:AGTGACCTCTCCTAGA   UB:Z:CCTGAACCTCTCCAAAAAACCTCT   XS:Z:Assigned3  XN:i:1  XT:Z:ENSG00000077420
A00267:423:HFMHMDRX3:1:2101:1009:15107  147     10      26511819        255     88M     =       26501106        -10801  GATGTTCTGGACAACCTTTTCGAGAAAACTCATTGTGACTGCAATGTAGACTGGTGTCTTTATGAAATCTACCCGGAACTACAAATTG    :FFFFF:FFFF:FFFF,FFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF        NH:i:1  HI:i:1  AS:i:177    nM:i:0  BX:Z:AGTGACCTCTCCTAGA   BC:Z:AGTGACCTCTCCTAGA   UB:Z:CCTGAACCTCTCCAAAAAACCTCT   XS:Z:Assigned3  XN:i:1  XT:Z:ENSG00000077420
A00267:423:HFMHMDRX3:1:2101:1009:15515  83      11      66003740        255     88M     =       66003676        -152    TGCCTTCGAGAGTGGTGCGACGCCTTCTTGTGATGCTCTCTGGGAAGCTCTCAATCCCCAGCCCTCATCCAGAGTTTGCAGCCGAGTA    FFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF        NH:i:1  HI:i:1  AS:i:173    nM:i:0  BX:Z:AGTGACCTCTCCTAGA   BC:Z:AGTGACCTCTCCTAGA   UB:Z:GGGAGGAGTCCCAGATGAAGACCT   XS:Z:Assigned3  XN:i:1  XT:Z:ENSG00000175334
A00267:423:HFMHMDRX3:1:2101:1009:15515  163     11      66003676        255     87M1S   =       66003740        152     CTTCCGGGAATGGCTGAAAGACACTTGTGGCGCCAACGCCAAGCAGTCCCGGGACTGCTTCGGATGCCTTCGAGAGTGGTGCGACGCG    FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFF:FFF:FFFFFFFFFF,        NH:i:1  HI:i:1  AS:i:173    nM:i:0  BX:Z:AGTGACCTCTCCTAGA   BC:Z:AGTGACCTCTCCTAGA   UB:Z:GGGAGGAGTCCCAGATGAAGACCT   XS:Z:Assigned3  XN:i:1  XT:Z:ENSG00000175334
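
As a quick sanity check on the records above, the FLAG column (second field) can be decoded bit by bit. This is only a minimal sketch: the bit meanings follow the SAM format specification, while the table name `SAM_FLAG_BITS` and the helper `decode_flag` are made up for illustration and are not part of zUMIs or Picard.

```python
# Bit values and meanings as defined in the SAM format specification.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value (hypothetical helper)."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


# The four FLAG values that appear in the excerpt above.
for flag in (99, 147, 83, 163):
    print(flag, decode_flag(flag))
```

None of these four values has the unmapped (0x4) or QC-fail (0x200) bit set, so the records in the excerpt are flagged as mapped, properly paired reads.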

Below is the output of a Picard Tools CollectAlignmentSummaryMetrics run with ASSUME_SORTED=true, that is, assuming the BAM file is coordinate sorted:

## htsjdk.samtools.metrics.StringHeader
# CollectAlignmentSummaryMetrics EXPECTED_PAIR_ORIENTATIONS=[] INPUT=Tunic.AGTGACCTCTCCTAGA.demx.bam OUTPUT=Tunic.AGTGACCTCTCCTAGA.demx.summary.metrics.txt    MAX_INSERT_SIZE=100000 ADAPTER_SEQUENCE=[AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG, AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG, AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG] METRIC_ACCUMULATION_LEVEL=[ALL_READS] IS_BISULFITE_SEQUENCED=false ASSUME_SORTED=true STOP_AFTER=0 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
## htsjdk.samtools.metrics.StringHeader
# Started on: Thu Nov 09 00:35:11 EST 2023

## METRICS CLASS        picard.analysis.AlignmentSummaryMetrics
CATEGORY        TOTAL_READS     PF_READS        PCT_PF_READS    PF_NOISE_READS  PF_READS_ALIGNED        PCT_PF_READS_ALIGNED    PF_ALIGNED_BASES        PF_HQ_ALIGNED_READS PF_HQ_ALIGNED_BASES     PF_HQ_ALIGNED_Q20_BASES PF_HQ_MEDIAN_MISMATCHES PF_MISMATCH_RATE        PF_HQ_ERROR_RATE        PF_INDEL_RATE   MEAN_READ_LENGTH   READS_ALIGNED_IN_PAIRS   PCT_READS_ALIGNED_IN_PAIRS      PF_READS_IMPROPER_PAIRS PCT_PF_READS_IMPROPER_PAIRS     BAD_CYCLES      STRAND_BALANCE  PCT_CHIMERAS    PCT_ADAPTER SAMPLE  LIBRARY READ_GROUP
FIRST_OF_PAIR   71490794        71490794        1       62694189        0       0       0       0       0       0       0       0       0       0       88      0       0       0       0       0       0       0       0.003427
SECOND_OF_PAIR  71475327        71475327        1       62037196        0       0       0       0       0       0       0       0       0       0       88      0       0       0       0       0       0       0       0.000061
PAIR    142966121       142966121       1       124731385       0       0       0       0       0       0       0       0       0       0       88      0       0       0       0       0       0       0       0.001744

PF_READS_ALIGNED is 0 in every category, and most reads seem to be counted as PF_NOISE_READS.

The same behavior occurs when I use the <>.filtered.tagged.Aligned.out.bam file.

Am I missing something? I thought that the BAM files output from zUMIs were compatible with Picard Tools and similar tools.

Attached is the yaml file:

Tunic.zUMIs_config_formated.yaml.txt

Attached is the command line log file output from zUMIs:

Tunic.command_line_output_zummis.txt

Thank you for your help. Appreciate it.

Thanks, Fayaz

@cziegenhain
Collaborator

Hi,

I'm not sure what is going on here, because the BAM output you show looks properly formatted.
Maybe Picard Tools expects a coordinate-sorted file? Try running samtools sort before your Picard command.

Best,
Christoph
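
One way to test the sorting hypothesis before re-running Picard is to scan the records in SAM text form (for example, the output of samtools view -h) and check whether they are in coordinate order. The sketch below is an illustration only: the function name `is_coordinate_sorted` is hypothetical, reference order is taken from @SQ header lines when present and from order of first appearance otherwise, and the example records are trimmed to the first four columns of the excerpt in the original post.

```python
def is_coordinate_sorted(sam_lines):
    """Check SAM text lines for coordinate order (hypothetical helper).

    Reference order is read from @SQ header lines if any are present.
    Without a header, order of first appearance is used, which can only
    detect decreasing positions within one reference or a reference that
    reappears after another one. Unmapped reads (RNAME "*") are skipped.
    """
    ref_index = {}
    last_key = None
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header line: record the reference order from @SQ SN: tags.
            if line.startswith("@SQ"):
                for field in line.split("\t")[1:]:
                    if field.startswith("SN:"):
                        ref_index.setdefault(field[3:], len(ref_index))
            continue
        fields = line.split()
        rname, pos = fields[2], int(fields[3])
        if rname == "*":
            continue
        key = (ref_index.setdefault(rname, len(ref_index)), pos)
        if last_key is not None and key < last_key:
            return False
        last_key = key
    return True


# First four columns of the first four records from the original post.
excerpt = [
    "A00267:423:HFMHMDRX3:1:2101:1000:34663 99 19 48452905",
    "A00267:423:HFMHMDRX3:1:2101:1000:34663 147 19 48453094",
    "A00267:423:HFMHMDRX3:1:2101:1000:35978 83 2 105342985",
    "A00267:423:HFMHMDRX3:1:2101:1000:35978 163 2 105339692",
]
print(is_coordinate_sorted(excerpt))  # False
```

The excerpt keeps mates next to each other, so it is not in coordinate order, which is at odds with ASSUME_SORTED=true. Whether that explains the empty PF_READS_ALIGNED column is not established in this thread.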
