Got "IndexError: list index out of range" in generate_multihetsep.py #31

Open
ymatmt opened this issue Jun 2, 2020 · 12 comments

@ymatmt

ymatmt commented Jun 2, 2020

I am trying to run MSMC on data from the cat, but I get the following error.

msmc-tools-master/generate_multihetsep.py --chr=${CHR} --mask=${BAM}out_mask_chr${CHR}.vcf.gz ${VCF}${CHR}phased.vcf.gz > ${VCF}${CHR}_multihetsep.txt

generating msmc input file with 2 haplotypes
adding mask: cat_msmc_test/bam/ERR2497923_sorted.bam_out_mask_chrA1.vcf.gz
Traceback (most recent call last):
  File "msmc-tools-master/generate_multihetsep.py", line 200, in <module>
    maskIterators.append(MaskIterator(f))
  File "msmc-tools-master/generate_multihetsep.py", line 19, in __init__
    self.readLine()
  File "msmc-tools-master/generate_multihetsep.py", line 29, in readLine
    self.start = int(fields[1]) + 1
IndexError: list index out of range

I wonder whether the error could be caused by the chromosome names in cat (e.g. A1, A2, ...). Do you have any ideas on how to solve it? Thank you!

@stschiff
Owner

Are you sure you have a correct mask file here? It seems you're giving it a VCF instead of a BED file for the mask.
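One way to test this before running the script is to scan the mask for the first line that does not look like a BED record. The sketch below assumes a BED-style mask line has at least three whitespace-separated fields (chromosome, start, end) with integer start and end; the helper names `first_bad_mask_line` and `check_mask_file` are hypothetical and not part of msmc-tools.

```python
import gzip


def first_bad_mask_line(lines):
    """Return (line_number, text) of the first line that is not BED-like, else None."""
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        # Assumption: a BED-style mask line has chromosome, start and end.
        if len(fields) < 3:
            return number, line.rstrip("\n")
        try:
            int(fields[1])
            int(fields[2])
        except ValueError:
            # Start or end is not an integer, e.g. a VCF record with "." as ID.
            return number, line.rstrip("\n")
    return None


def check_mask_file(path):
    """Open a plain or gzipped mask file and report its first non-BED line."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return first_bad_mask_line(handle)
```

A VCF passed as a mask would typically be reported at line 1, because a header line such as `##fileformat=VCFv4.2` has only one field. That would be consistent with the `fields[1]` lookup in the traceback failing.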

@Hjorvik

Hjorvik commented Mar 8, 2022

Hi! I'm facing a similar error message, but this time while reading the VCF file.

Traceback (most recent call last):
  File "/proj/snic2018-8-331/private/src/msmc_workflow/msmc-tools-master/generate_multihetsep.py", line 195, in <module>
    joinedVcfIterator = JoinedVcfIterator(args.files, trios)
  File "/proj/snic2018-8-331/private/src/msmc_workflow/msmc-tools-master/generate_multihetsep.py", line 131, in __init__
    self.current_lines = [next(v) for v in self.vcfIterators]
  File "/proj/snic2018-8-331/private/src/msmc_workflow/msmc-tools-master/generate_multihetsep.py", line 131, in <listcomp>
    self.current_lines = [next(v) for v in self.vcfIterators]
  File "/proj/snic2018-8-331/private/src/msmc_workflow/msmc-tools-master/generate_multihetsep.py", line 68, in __next__
    chrom = fields[0]

The command I used is this:
python3 /proj/snic2018-8-331/private/src/msmc_workflow/msmc-tools-master/generate_multihetsep.py --mask=mask_files/$ind.$chr.bed.gz --mask=mapping_mask/Oar3.1.maskchr${chr}.mask.bed.gz vcf_out/$ind.$chr.vcf.gz > input_singleind/$ind.$chr.ff.msmc.inp

Python version: Python 3.7.3

@stschiff
Owner

Could you please post the error message, not just the traceback?

@Hjorvik

Hjorvik commented Mar 21, 2022

The error message was the same as in the first message: "IndexError: list index out of range"

@stschiff
Owner

Well, if the error occurred at the statement chrom = fields[0], it means there is an empty line in the input that the script is trying to parse. Your input files must be off.
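To locate such lines, a small scan can list every blank or whitespace-only line. This is a minimal sketch, assuming each line is split on whitespace so that a blank line yields no fields at all; `blank_line_numbers` and `blank_lines_in_file` are hypothetical helper names, not part of msmc-tools.

```python
import gzip


def blank_line_numbers(lines):
    """Return the 1-based numbers of all empty or whitespace-only lines."""
    return [
        number
        for number, line in enumerate(lines, start=1)
        if not line.strip()
    ]


def blank_lines_in_file(path):
    """Scan a plain or gzipped text file for blank lines."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return blank_line_numbers(handle)
```

Calling `blank_lines_in_file` on each VCF given on the command line returns an empty list for a file without blank lines.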

@yangwukaidi

@stschiff
I am using python 3.9. The command line I use and output are as follows.
python generate_multihetsep.py --chr 1 --mask 1Z-CWX10A-1.mask.bed.gz --mask 2QBZ-LFT3-1.mask.bed.gz --mask 3HNHZ-BLM3-1.mask.bed.gz --mask 4YZ-BLM2-1.mask.bed.gz --mask reLG01.mask.bed.gz 1Z-CWX10A-1.vcf.gz 2QBZ-LFT3-1.vcf.gz 3HNHZ-BLM3-1.vcf.gz 4YZ-BLM2-1.vcf.gz > LG01.multihetsep.txt
generating msmc input file with 8 haplotypes
Traceback (most recent call last):
  File "generate_multihetsep.py", line 195, in <module>
    joinedVcfIterator = JoinedVcfIterator(args.files, trios)
  File "generate_multihetsep.py", line 131, in __init__
    self.current_lines = [next(v) for v in self.vcfIterators]
  File "generate_multihetsep.py", line 131, in <listcomp>
    self.current_lines = [next(v) for v in self.vcfIterators]
  File "generate_multihetsep.py", line 73, in __next__
    geno = fields[9][:3]
IndexError: list index out of range

I saw that @Hjorvik had a similar problem. The difference is that the last frame of my traceback is:
  File "generate_multihetsep.py", line 73, in __next__
    geno = fields[9][:3]

What is the cause of this error, and how can I fix it?
Looking forward to your reply!

@stschiff
Owner

This means that one of your VCFs is not in the right shape. My program expects the genotypes (e.g. 0|1) in the 10th column (so index 9), and that doesn't seem to be the case for at least one line in your input.
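A quick pre-check along these lines can report the offending records. The sketch below assumes tab-separated VCF data lines, skips lines starting with `#`, and treats the first three characters of column 10 as the genotype, mirroring the `fields[9][:3]` expression in the traceback. The accepted genotype pattern and the name `malformed_vcf_lines` are assumptions for illustration.

```python
import re

# Assumed genotype shape: two alleles (digit or ".") joined by "|" or "/".
GENOTYPE = re.compile(r"[0-9.][|/][0-9.]")


def malformed_vcf_lines(lines):
    """Return (line_number, reason) for data lines without a usable 10th column."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # header lines carry no genotypes
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            problems.append((number, f"{len(fields)} columns, expected at least 10"))
            continue
        geno = fields[9][:3]
        if not GENOTYPE.fullmatch(geno):
            problems.append((number, f"unexpected genotype {geno!r}"))
    return problems
```

The function works on any iterable of lines, so it can be fed an open handle from `gzip.open(path, "rt")` for each input VCF.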

@zcharlene

Hi! @stschiff

I encountered a similar IndexError but this time about "alleles = [fields[3]]":

Traceback (most recent call last):
  File "/projappl/project_2005832/msmc-tools/generate_multihetsep.py", line 202, in <module>
    joinedVcfIterator = JoinedVcfIterator(args.files, trios, as_phased)
  File "/projappl/project_2005832/msmc-tools/generate_multihetsep.py", line 132, in __init__
    self.current_lines = [next(v) for v in self.vcfIterators]
  File "/projappl/project_2005832/msmc-tools/generate_multihetsep.py", line 132, in <listcomp>
    self.current_lines = [next(v) for v in self.vcfIterators]
  File "/projappl/project_2005832/msmc-tools/generate_multihetsep.py", line 71, in __next__
    alleles = [fields[3]]
IndexError: list index out of range

The code I am applying is:
${TOOLdir}generate_multihetsep.py --chr ${LG}
--mask=${INDVMASKdir}${CHILD1}${LG}.bed.gz
--mask=${INDVMASKdir}${DAD1}
${LG}.bed.gz
--mask=${INDVMASKdir}${MOM1}${LG}.bed.gz
--mask=${INDVMASKdir}${CHILD2}
${LG}.bed.gz
--mask=${INDVMASKdir}${DAD2}${LG}.bed.gz
--mask=${INDVMASKdir}${MOM2}
${LG}.bed.gz
--trio 1,2,3
--trio 4,5,6
--mask= ${MAPMASKdir}V7_${LG}.mask.bed.gz
${VCFdir}${CHILD1}${LG}.vcf.gz ${VCFdir}${DAD1}${LG}.vcf.gz ${VCFdir}${MOM1}${LG}.vcf.gz
${VCFdir}${CHILD2}
${LG}.vcf.gz ${VCFdir}${DAD2}${LG}.vcf.gz ${VCFdir}${MOM2}${LG}.vcf.gz
> $OUTDIR/${CHILD1}.${LG}.multihetsep.txt

Could you suggest why this issue might occur? Thanks, and I look forward to your reply!

@stschiff
Owner

As above, please check that your VCF file has the shape that my script expects. See my previous comment.

@zcharlene

Hi! @stschiff

Thank you so much for your quick response.

I would like to provide additional information regarding the issue I'm facing. Initially, I thought that including the trio information in the script would eliminate the need to phase the VCF files. However, as a troubleshooting step, I decided to phase the VCF files anyway. The heterozygotes are consistently represented as '0|1' or '1|0' after phasing. Unfortunately, despite this effort, I am still encountering the same error.

I would greatly appreciate any further suggestions or guidance you can offer to help resolve this issue. Thank you for your attention.

@zcharlene

Hi @stschiff

After some investigation, I found that the problem actually lies with the output file from SNPable. It works when I remove the mappability masks. What do you think the potential impact of removing this mask would be on my final results? Thanks!

@stschiff
Owner

Could it just be that you use --trio 1,2,3 --trio 4,5,6 when it should actually be --trio 0,1,2 --trio 3,4,5? My indices are all 0-based, not 1-based.
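The index convention can be checked before launching the script. This is a minimal sketch, assuming each `--trio` value is a comma-separated list of positions into the VCF files given on the command line; `parse_trio`, `out_of_range` and `shift_to_zero_based` are hypothetical helpers, and whether an out-of-range index is what triggers the traceback above is not established in this thread.

```python
def parse_trio(text):
    """Turn a value such as "1,2,3" into a tuple of integers."""
    return tuple(int(part) for part in text.split(","))


def out_of_range(trios, n_files):
    """Return the sorted indices that do not point at any of the n_files inputs."""
    return sorted({i for trio in trios for i in trio if not 0 <= i < n_files})


def shift_to_zero_based(trio):
    """Convert 1-based positions to the 0-based ones the script expects."""
    return tuple(i - 1 for i in trio)


# Six VCF files were passed, so the valid 0-based indices are 0..5.
trios = [parse_trio("1,2,3"), parse_trio("4,5,6")]
bad = out_of_range(trios, n_files=6)
fixed = [shift_to_zero_based(trio) for trio in trios]
```

With six input files, the 1-based values leave index 6 without a matching file, while the shifted trios all fall inside the valid range.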

5 participants