fixed error for teflon_collapse.py but getting an error in teflon_genotype.py #16

heidihyang · 2024-04-23T22:34:50Z

Hi,

Thank you for making TEFLoN! I ran into an issue where teflon_collapse.py did not read the number of raw total sequences and instead took the last element of the line containing "raw total sequences:". I resolved it by removing the comment at the end of that line with sed 's/# excluding supplementary and secondary reads//1' sample.stats.txt. Just a heads-up for anyone else using the script who runs into this error:

Traceback (most recent call last):
  File "./teflon_collapse.py", line 165, in <module>
    main()
  File "./teflon_collapse.py", line 89, in main
    total_n=int(l.split()[-1])
ValueError: invalid literal for int() with base 10: 'reads'
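For anyone who would rather not edit the stats file, here is a minimal sketch of a more tolerant way to read that value. It assumes sample.stats.txt is `samtools stats` output, where the count follows "raw total sequences:" and may be followed by a trailing "# ..." comment. The function name `raw_total_sequences` and the regex are my own illustration, not code from TEFLoN.

```python
import re

# Matches the summary line written by `samtools stats`, for example:
# "SN\traw total sequences:\t1000\t# excluding supplementary and secondary reads"
# Only the digits right after the label are captured, so a trailing comment is ignored.
RAW_TOTAL = re.compile(r"raw total sequences:\s*(\d+)")


def raw_total_sequences(lines):
    """Return the read count from the 'raw total sequences:' line, or None if absent.

    Hypothetical helper for illustration; not part of teflon_collapse.py.
    """
    for line in lines:
        match = RAW_TOTAL.search(line)
        if match:
            return int(match.group(1))
    return None
```

It could be called as `raw_total_sequences(open("sample.stats.txt"))`. Because the number is taken from right after the label rather than from the end of the line, it works with or without the trailing comment.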

I have now gotten to the teflon_genotype.py script and am getting the following error:

Lower-bound coverage threshold filters corresponding to samples ['ANG.5'] is [1]
NOTE: all sites with adjusted read counts > upper-bound coverage threshold will be marked -9
Upper-bound coverage threshold filters corresponding to samples ['ANG.5'] is [102]
NOTE: all sites with adjusted read counts > upper-bound coverage threshold will be marked -9
cdm: gunzip -c /u/home/h/hyangg/project-vlsork/TEFLON/qlob.prep_TF/qlob.pseudo2ref.pickle.gz > /u/home/h/hyangg/project-vlsork/TEFLON/qlob.prep_TF/qlob.pseudo2ref.pickle.gz.tmp
loading pickle: /u/home/h/hyangg/project-vlsork/TEFLON/qlob.prep_TF/qlob.pseudo2ref.pickle.gz.tmp
NOTE: this step can be time and memory intensive for large reference genomes
pickle loaded!
Converting coordinates from pseudospace to reference-based coordinates...
Traceback (most recent call last):
  File "./teflon_genotype.py", line 122, in <module>
    main()
  File "./teflon_genotype.py", line 116, in main
    pt.pt_portal(countDir,genoDir,samples, posMap, stats, p2rC, l_thresh, h_thresh)
  File "/u/project/vlsork/hyangg/TEFLON/teflon_scripts/genotyper_poolType.py", line 51, in pt_portal
    p2rC.pseudo2refConvert_portal(outFILE1,posMap,outFILE2)
  File "/u/project/vlsork/hyangg/TEFLON/teflon_scripts/pseudo2refConvert.py", line 26, in pseudo2refConvert_portal
    with open(bedFILE, 'r') as fIN, open(outFILE, 'w') as fOUT:
IOError: [Errno 2] No such file or directory: '/u/home/h/hyangg/project-vlsork/TEFLON/genotypes/ANG.5.genotypes.txt'

Any suggestions on how to resolve this? It seems that it's not finding the genotypes.txt file, which means there's an error in producing it. Thanks for your help in advance!

Best,
Heidi


heidihyang · 2024-05-04T23:04:14Z

Hi, I am no longer getting this error but the genotype folder is empty despite the program saying it's finished.

Below is my joblog:

(teflon_env) [hyangg@n1826 TEFLON]$ python ./teflon_genotype.py \
-wd $HOME/project-vlsork/TEFLON/ \
-d $HOME/project-vlsork/TEFLON/qlob.prep_TF/ \
-s $HOME/project-vlsork/TEFLON/samples.txt \
-dt "pooled"
Lower-bound coverage threshold filters corresponding to samples ['ANG.5', 'WOO.4.w6'] is [1, 1]
NOTE: all sites with adjusted read counts > upper-bound coverage threshold will be marked -9
Upper-bound coverage threshold filters corresponding to samples ['ANG.5', 'WOO.4.w6'] is [82, 366]
NOTE: all sites with adjusted read counts > upper-bound coverage threshold will be marked -9
cdm: gunzip -c /u/home/h/hyangg/project-vlsork/TEFLON/qlob.prep_TF/qlob.pseudo2ref.pickle.gz > /u/home/h/hyangg/project-vlsork/TEFLON/qlob.prep_TF/qlob.pseudo2ref.pickle.gz.tmp
loading pickle: /u/home/h/hyangg/project-vlsork/TEFLON/qlob.prep_TF/qlob.pseudo2ref.pickle.gz.tmp
NOTE: this step can be time and memory intensive for large reference genomes
pickle loaded!
Converting coordinates from pseudospace to reference-based coordinates...
Traceback (most recent call last):
File "./teflon_genotype.py", line 122, in <module>
main()
File "./teflon_genotype.py", line 116, in main
pt.pt_portal(countDir,genoDir,samples, posMap, stats, p2rC, l_thresh, h_thresh)
File "/u/project/vlsork/hyangg/TEFLON/teflon_scripts/genotyper_poolType.py", line 51, in pt_portal
p2rC.pseudo2refConvert_portal(outFILE1,posMap,outFILE2)
File "/u/project/vlsork/hyangg/TEFLON/teflon_scripts/pseudo2refConvert.py", line 50, in pseudo2refConvert_portal
ls[2]=pseudoMap[chrom][int(ls[2])]
IndexError: list index out of range

heidihyang · 2024-05-15T18:27:36Z

I solved the issue. RepeatMasker annotates satellite repeats and low-complexity repeats in addition to TEs, but these were not in my TE hierarchy text file. There is probably a way to include them in the workflow, but once I removed them from my TE reference BED file and re-ran the reference prep step, everything worked smoothly.
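In case it helps anyone hitting the same IndexError, the filtering step can be sketched roughly as below. This is only an illustration under assumptions I have not checked against every setup: that the BED file is tab-delimited with the TE ID in column 4, and that the hierarchy file is tab-delimited with a header line and the TE ID in its first column. The function names `load_hierarchy_ids` and `filter_bed` are hypothetical and not part of TEFLoN.

```python
def load_hierarchy_ids(hierarchy_lines, has_header=True):
    """Collect the TE IDs from the first column of the hierarchy file.

    Assumes a tab-delimited file whose first line is a header (hypothetical layout).
    """
    ids = set()
    for i, line in enumerate(hierarchy_lines):
        if has_header and i == 0:
            continue  # skip the header row
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0]:
            ids.add(fields[0])
    return ids


def filter_bed(bed_lines, known_ids, id_column=3):
    """Split BED lines into those whose ID is in the hierarchy and those that are not.

    id_column is zero-based, so 3 means the fourth BED column (an assumption).
    """
    kept, dropped = [], []
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > id_column and fields[id_column] in known_ids:
            kept.append(line)
        else:
            dropped.append(line)  # e.g. satellite or low-complexity entries
    return kept, dropped
```

Writing `kept` to a new BED file and looking over `dropped` before re-running the reference prep step makes it easy to confirm that only the satellite and low-complexity entries were removed.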

heidihyang changed the title ~~fixed error for teflon_collapse.py but still getting an error in teflon_genotype.py~~ fixed error for teflon_collapse.py but getting an error in teflon_genotype.py Apr 23, 2024

heidihyang closed this as completed May 15, 2024
