You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
...
processing pos 115000000
processing pos 116000000
Traceback (most recent call last):
File "/home/tomas/bin/msmc-tools/generate_multihetsep.py", line 226, in
if mergedMask.getVal(snp_pos):
File "/home/tomas/bin/msmc-tools/generate_multihetsep.py", line 51, in getVal
return all((m.getVal(pos) for m in self.maskIterators))
File "/home/tomas/bin/msmc-tools/generate_multihetsep.py", line 51, in
return all((m.getVal(pos) for m in self.maskIterators))
File "/home/tomas/bin/msmc-tools/generate_multihetsep.py", line 35, in getVal
assert pos >= self.lastPos
AssertionError
My vcf and masks are of the complete genome, i.e. multiple chromosomes.
If I run the generate_multihetsep.py on the whole genome or just select a chromosome, I always get the same result. Last lines of the different multihetsep.txt files (upto abort):
beginning of the vcf/chromosome Z (length 117067010)
SUPER_Z 116781329 76024 AG
SUPER_Z 116791378 10049 TC
SUPER_Z 116900943 94985 AG
chromosome W (length 37400844) - selected as --chr SUPER_W
SUPER_W 116781329 76024 AG
SUPER_W 116791378 10049 TC
SUPER_W 116900943 94985 AG
chromosome 28_unloc_1(length 32972) - selected as --chr SUPER_28_unloc_1
SUPER_28_unloc_1 116781329 76024 AG
SUPER_28_unloc_1 116791378 10049 TC
SUPER_28_unloc_1 116900943 94985 AG
The outputs are identical. generate_multihetsep.py always starts at the beginning of the vcf (SUPER_Z) and continues until it reads through the first chromosome (SUPER_Z) and then crashes.
My vcf was generated using gatk, and filtered using bcftools (standard hard filters, and to include biallelic SNPs only). There are no issues analysing or manipulating this vcf in other programs. The last three SNPs of the SUPER_Z chromosome, and first two SNPs of the SUPER_W chromosome
SUPER_Z 116781329 . A G 70.64 PASS AC=1;AF=0.5;AN=2;BaseQRankSum=0;DP=4;ExcessHet=0;FS=0;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;QD=17.66;ReadPosRankSum=0.674;SOR=2.303 GT:AD:DP:GQ:PL 0|1:2,2:4:78:78,0,78
SUPER_Z 116791378 . T C 70.64 PASS AC=1;AF=0.5;AN=2;BaseQRankSum=-0.967;DP=3;ExcessHet=0;FS=0;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;QD=23.55;ReadPosRankSum=0.967;SOR=0.223 GT:AD:DP:GQ:PGT:PID:PL:PS 0|1:1,2:3:36:0|1:116791365_CA_C:78,0,36:116791365
SUPER_Z 116900943 . A G 73.64 PASS AC=1;AF=0.5;AN=2;BaseQRankSum=-0.431;DP=3;ExcessHet=0;FS=0;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;QD=24.55;ReadPosRankSum=-0.967;SOR=0.223 GT:AD:DP:GQ:PL 0|1:1,2:3:36:81,0,36
SUPER_W 270520 . A T 179.96 PASS AC=2;AF=1;AN=2;DP=5;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=50.92;QD=26.52;SOR=1.022 GT:AD:DP:GQ:PL 1|1:0,5:5:15:194,15,0
SUPER_W 270593 . T C 210.96 PASS AC=2;AF=1;AN=2;DP=5;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=50.92;QD=30.56;SOR=1.022 GT:AD:DP:GQ:PGT:PID:PL:PS 1|1:0,5:5:15:1|1:270593_T_C:225,15,0:270593
SUPER_W 270597 . G C 210.96 PASS AC=2;AF=1;AN=2;DP=5;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=50.92;QD=30.23;SOR=1.022 GT:AD:DP:GQ:PGT:PID:PL:PS 1|1:0,5:5:15:1|1:270593_T_C:225,15,0:270593
My mask was generated using bedtools bam2bed.
SUPER_Z 1661 15824 m54306U_210130_202436/90440587/ccs 60 -
SUPER_Z 2021 21862 m54306U_210130_202436/983948/ccs 60 -
SUPER_Z 7945 15108 m54306U_210130_202436/33752681/ccs 60 -
My mappability mask was generated using the steps in https://lh3lh3.users.sourceforge.net/snpable.shtml, and corresponding bed file extracted using a python script.
SUPER_Z 84 270
SUPER_Z 287 1210
SUPER_Z 1234 1323
Based on this, am I to assume that the generate_multihetsep.py was written to process only one chromosome at a time? Or is this a bug in generate_multihetsep.py?
The text was updated successfully, but these errors were encountered:
I am getting a consistent assertion error:
...
processing pos 115000000
processing pos 116000000
Traceback (most recent call last):
File "/home/tomas/bin/msmc-tools/generate_multihetsep.py", line 226, in
if mergedMask.getVal(snp_pos):
File "/home/tomas/bin/msmc-tools/generate_multihetsep.py", line 51, in getVal
return all((m.getVal(pos) for m in self.maskIterators))
File "/home/tomas/bin/msmc-tools/generate_multihetsep.py", line 51, in
return all((m.getVal(pos) for m in self.maskIterators))
File "/home/tomas/bin/msmc-tools/generate_multihetsep.py", line 35, in getVal
assert pos >= self.lastPos
AssertionError
call:
generate_multihetsep.py --mask=harpia_mask.bed.gz --negative_mask=harpia_mappability_mask.bed.gz harpia_pacbio_genotyped_SNP_final.vcf.gz > harpia.multihetsep.txt
My vcf and masks are of the complete genome, i.e. multiple chromosomes.
If I run the generate_multihetsep.py on the whole genome or just select a chromosome, I always get the same result. Last lines of the different multihetsep.txt files (upto abort):
beginning of the vcf/chromosome Z (length 117067010)
SUPER_Z 116781329 76024 AG
SUPER_Z 116791378 10049 TC
SUPER_Z 116900943 94985 AG
chromosome W (length 37400844) - selected as --chr SUPER_W
SUPER_W 116781329 76024 AG
SUPER_W 116791378 10049 TC
SUPER_W 116900943 94985 AG
chromosome 28_unloc_1(length 32972) - selected as --chr SUPER_28_unloc_1
SUPER_28_unloc_1 116781329 76024 AG
SUPER_28_unloc_1 116791378 10049 TC
SUPER_28_unloc_1 116900943 94985 AG
The outputs are identical. generate_multihetsep.py always starts at the beginning of the vcf (SUPER_Z) and continues until it reads through the first chromosome (SUPER_Z) and then crashes.
My vcf was generated using gatk, and filtered using bcftools (standard hard filters, and to include biallelic SNPs only). There are no issues analysing or manipulating this vcf in other programs. The last three SNPs of the SUPER_Z chromosome, and first two SNPs of the SUPER_W chromosome
SUPER_Z 116781329 . A G 70.64 PASS AC=1;AF=0.5;AN=2;BaseQRankSum=0;DP=4;ExcessHet=0;FS=0;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;QD=17.66;ReadPosRankSum=0.674;SOR=2.303 GT:AD:DP:GQ:PL 0|1:2,2:4:78:78,0,78
SUPER_Z 116791378 . T C 70.64 PASS AC=1;AF=0.5;AN=2;BaseQRankSum=-0.967;DP=3;ExcessHet=0;FS=0;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;QD=23.55;ReadPosRankSum=0.967;SOR=0.223 GT:AD:DP:GQ:PGT:PID:PL:PS 0|1:1,2:3:36:0|1:116791365_CA_C:78,0,36:116791365
SUPER_Z 116900943 . A G 73.64 PASS AC=1;AF=0.5;AN=2;BaseQRankSum=-0.431;DP=3;ExcessHet=0;FS=0;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;QD=24.55;ReadPosRankSum=-0.967;SOR=0.223 GT:AD:DP:GQ:PL 0|1:1,2:3:36:81,0,36
SUPER_W 270520 . A T 179.96 PASS AC=2;AF=1;AN=2;DP=5;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=50.92;QD=26.52;SOR=1.022 GT:AD:DP:GQ:PL 1|1:0,5:5:15:194,15,0
SUPER_W 270593 . T C 210.96 PASS AC=2;AF=1;AN=2;DP=5;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=50.92;QD=30.56;SOR=1.022 GT:AD:DP:GQ:PGT:PID:PL:PS 1|1:0,5:5:15:1|1:270593_T_C:225,15,0:270593
SUPER_W 270597 . G C 210.96 PASS AC=2;AF=1;AN=2;DP=5;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=50.92;QD=30.23;SOR=1.022 GT:AD:DP:GQ:PGT:PID:PL:PS 1|1:0,5:5:15:1|1:270593_T_C:225,15,0:270593
My mask was generated using bedtools bam2bed.
SUPER_Z 1661 15824 m54306U_210130_202436/90440587/ccs 60 -
SUPER_Z 2021 21862 m54306U_210130_202436/983948/ccs 60 -
SUPER_Z 7945 15108 m54306U_210130_202436/33752681/ccs 60 -
My mappability mask was generated using the steps in https://lh3lh3.users.sourceforge.net/snpable.shtml, and corresponding bed file extracted using a python script.
SUPER_Z 84 270
SUPER_Z 287 1210
SUPER_Z 1234 1323
Based on this, am I to assume that the generate_multihetsep.py was written to process only one chromosome at a time? Or is this a bug in generate_multihetsep.py?
The text was updated successfully, but these errors were encountered: