Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[A SOLUTION TO] => samtools issue : incomplete aux field/truncated file #89

Open
RemiMaglione opened this issue Nov 22, 2020 · 2 comments

Comments

@RemiMaglione
Copy link

Hello nglmr users/dev,

Looks like a few people ran into the incomplete aux field/truncated file samtools issue. Since I also ran into this problem, I put my investigator hat, dig on each "wrong/corrupt" lines and found that they all shared the same issue: the mapping quality was < 0 (e.g: "-2147483648").
Since it concern only a few sequences (my worse case scenario was 1 over 3.000), here's how I handle this :

awk -F "\t" '{if ($5 >= 0 || substr ($1, 0, 1)=="@") print $0} myBad.sam' > myGood.sam

Basically, it keeps only lines where quality feilds ($5) are equal to 0 or more, and keep header lines (i.e: lines starting with @)

If you have multiple .sam files :
for i in *.sam ; do echo $i ; awk -F "\t" '{if ($5 > -1 || substr ($1, 0, 1)=="@") print $0}' $i > $(basename $i ".sam")"g.sam" ; done

Hope it could help,
Cheers

@RemiMaglione
Copy link
Author

Note: I also found that using samtools in multi-threading mode didn't work with files generated with ngmlr. It still send me the incomplete aux field/truncated error even if it worked perfectly in a single thread. I don't know if it's a personal config issue, didn't find a solution to fix this so far, just though it could help.

@wangshun1121
Copy link

Reads with bad quality should be marked rather than removing them directly. Here's my simple perl script MarkBedQUAL.pl :

#!/usr/bin/env perl
while(<>){
        if(m/^\@/){print $_;next;}
        @a = split/\t/;
        if($a[4] <0){
                $a[4] = 0;
                unless($a[1] & 512){
                        $a[1] += 512;
                }
        }
        $_=join("\t",@a);
        print $_;
} 

Then combinengmlr and samtools with pipe:

ngmlr -t 4 -x ont -r ref.fa -q ont.fastq.gz | 
        perl  MarkBedQUAL.pl |
        samtools view -bhS |
        samtools sort > align.bam

Reads with bad quality can be filtered directly by flag:

samtools view -F 512 align.bam

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants