
fixed error for teflon_collapse.py but getting an error in teflon_genotype.py #16

Closed
heidihyang opened this issue Apr 23, 2024 · 2 comments

@heidihyang

Hi,

Thank you for making TEFLoN! I ran into an issue where teflon_collapse.py did not read the number of raw total sequences. Instead, it took the last whitespace-separated token of the line containing "raw total sequences:", which is the final word ("reads") of the comment at the end of that line. I resolved the issue by removing that comment with sed 's/# excluding supplementary and secondary reads//1' sample.stats.txt (without -i, sed prints the edited text to standard output, so either redirect it to a file or edit in place). Just a heads-up for anyone using the script who runs into this error:

Traceback (most recent call last):
  File "./teflon_collapse.py", line 165, in <module>
    main()
  File "./teflon_collapse.py", line 89, in main
    total_n=int(l.split()[-1])
ValueError: invalid literal for int() with base 10: 'reads'
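For anyone who would rather patch the parsing than edit the stats file, the fix can be sketched in Python as below. This is a minimal sketch, not the actual code in teflon_collapse.py: parse_raw_total is a hypothetical helper, and it assumes the line follows the samtools stats "SN" layout, with the count optionally followed by a "# ..." comment.

```python
def parse_raw_total(line):
    """Return the integer count from a 'raw total sequences:' stats line.

    Hypothetical helper (not part of TEFLoN). Unlike int(l.split()[-1]),
    it first drops any trailing '# ...' comment, so the last remaining
    token is the number itself.
    """
    # Keep only the text before the first '#', i.e. discard the comment.
    data = line.split("#", 1)[0]
    # The count is now the last whitespace-separated token.
    return int(data.split()[-1])


# Assumed line formats, with and without the trailing comment.
with_comment = ("SN\traw total sequences:\t1234567\t"
                "# excluding supplementary and secondary reads\n")
without_comment = "SN\traw total sequences:\t1234567\n"

print(parse_raw_total(with_comment))     # 1234567
print(parse_raw_total(without_comment))  # 1234567
```

Splitting on "#" is safe here because the count itself never contains that character, and the same code handles stats files with or without the comment.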

I have now gotten to the teflon_genotype.py script and am getting the following error:

Lower-bound coverage threshold filters corresponding to samples ['ANG.5'] is [1]
NOTE: all sites with adjusted read counts > upper-bound coverage threshold will be marked -9
Upper-bound coverage threshold filters corresponding to samples ['ANG.5'] is [102]
NOTE: all sites with adjusted read counts > upper-bound coverage threshold will be marked -9
cdm: gunzip -c /u/home/h/hyangg/project-vlsork/TEFLON/qlob.prep_TF/qlob.pseudo2ref.pickle.gz > /u/home/h/hyangg/project-vlsork/TEFLON/qlob.prep_TF/qlob.pseudo2ref.pickle.gz.tmp
loading pickle: /u/home/h/hyangg/project-vlsork/TEFLON/qlob.prep_TF/qlob.pseudo2ref.pickle.gz.tmp
NOTE: this step can be time and memory intensive for large reference genomes
pickle loaded!
Converting coordinates from pseudospace to reference-based coordinates...
Traceback (most recent call last):
  File "./teflon_genotype.py", line 122, in <module>
    main()
  File "./teflon_genotype.py", line 116, in main
    pt.pt_portal(countDir,genoDir,samples, posMap, stats, p2rC, l_thresh, h_thresh)
  File "/u/project/vlsork/hyangg/TEFLON/teflon_scripts/genotyper_poolType.py", line 51, in pt_portal
    p2rC.pseudo2refConvert_portal(outFILE1,posMap,outFILE2)
  File "/u/project/vlsork/hyangg/TEFLON/teflon_scripts/pseudo2refConvert.py", line 26, in pseudo2refConvert_portal
    with open(bedFILE, 'r') as fIN, open(outFILE, 'w') as fOUT:
IOError: [Errno 2] No such file or directory: '/u/home/h/hyangg/project-vlsork/TEFLON/genotypes/ANG.5.genotypes.txt'

Any suggestions on how to resolve this? It seems the script cannot find ANG.5.genotypes.txt, which suggests that an earlier step failed to produce it. Thanks in advance for your help!

Best,
Heidi
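Before digging into pseudo2refConvert.py, it can help to confirm which per-sample files are actually missing. The sketch below is a hypothetical diagnostic, not part of TEFLoN: missing_genotype_files is an invented name, it assumes genotype files are named <sample>.genotypes.txt inside the genotypes directory (as the path in the IOError suggests), and you pass the sample names yourself.

```python
import os
import tempfile


def missing_genotype_files(geno_dir, samples):
    """Return the expected '<sample>.genotypes.txt' paths that do not exist.

    Hypothetical helper; the naming pattern is inferred from the
    IOError path, not taken from TEFLoN's code.
    """
    expected = [os.path.join(geno_dir, s + ".genotypes.txt") for s in samples]
    return [p for p in expected if not os.path.isfile(p)]


# Usage with a throwaway directory in which only WOO.4.w6 has a file.
with tempfile.TemporaryDirectory() as geno_dir:
    open(os.path.join(geno_dir, "WOO.4.w6.genotypes.txt"), "w").close()
    missing = missing_genotype_files(geno_dir, ["ANG.5", "WOO.4.w6"])
    print([os.path.basename(p) for p in missing])  # ['ANG.5.genotypes.txt']
```

If the file for a sample is missing right after the run, the problem lies upstream of the coordinate conversion rather than in the conversion itself.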

heidihyang changed the title from "fixed error for teflon_collapse.py but still getting an error in teflon_genotype.py" to "fixed error for teflon_collapse.py but getting an error in teflon_genotype.py" on Apr 23, 2024

heidihyang commented May 4, 2024

Hi, I am no longer getting this error, but the genotypes folder is empty even though the program says it has finished.

Below is my job log:

(teflon_env) [hyangg@n1826 TEFLON]$ python ./teflon_genotype.py \
-wd $HOME/project-vlsork/TEFLON/ \
-d $HOME/project-vlsork/TEFLON/qlob.prep_TF/ \
-s $HOME/project-vlsork/TEFLON/samples.txt \
-dt "pooled"
Lower-bound coverage threshold filters corresponding to samples ['ANG.5', 'WOO.4.w6'] is [1, 1]
NOTE: all sites with adjusted read counts > upper-bound coverage threshold will be marked -9
Upper-bound coverage threshold filters corresponding to samples ['ANG.5', 'WOO.4.w6'] is [82, 366]
NOTE: all sites with adjusted read counts > upper-bound coverage threshold will be marked -9
cdm: gunzip -c /u/home/h/hyangg/project-vlsork/TEFLON/qlob.prep_TF/qlob.pseudo2ref.pickle.gz > /u/home/h/hyangg/project-vlsork/TEFLON/qlob.prep_TF/qlob.pseudo2ref.pickle.gz.tmp
loading pickle: /u/home/h/hyangg/project-vlsork/TEFLON/qlob.prep_TF/qlob.pseudo2ref.pickle.gz.tmp
NOTE: this step can be time and memory intensive for large reference genomes
pickle loaded!
Converting coordinates from pseudospace to reference-based coordinates...
Traceback (most recent call last):
  File "./teflon_genotype.py", line 122, in <module>
    main()
  File "./teflon_genotype.py", line 116, in main
    pt.pt_portal(countDir,genoDir,samples, posMap, stats, p2rC, l_thresh, h_thresh)
  File "/u/project/vlsork/hyangg/TEFLON/teflon_scripts/genotyper_poolType.py", line 51, in pt_portal
    p2rC.pseudo2refConvert_portal(outFILE1,posMap,outFILE2)
  File "/u/project/vlsork/hyangg/TEFLON/teflon_scripts/pseudo2refConvert.py", line 50, in pseudo2refConvert_portal
    ls[2]=pseudoMap[chrom][int(ls[2])]
IndexError: list index out of range
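The failing line indexes pseudoMap[chrom] with a position read from the intermediate file, so an IndexError means that position lies past the end of the list stored for that chromosome. The sketch below is a simplified, hypothetical model of that lookup, not TEFLoN's code: it assumes pseudoMap maps each chromosome name to a list indexed by pseudospace position (which is how the traceback line uses it), and it collects out-of-range records instead of crashing, so you can see which entries disagree with the prep output.

```python
def convert_positions(records, pseudo_map):
    """Convert (chrom, pseudo_pos) pairs to reference coordinates.

    Hypothetical model of the lookup pseudoMap[chrom][int(pos)]:
    pseudo_map is assumed to map chrom -> list of reference positions.
    Returns (converted, bad); bad holds records whose index is negative
    or past the end of the list (IndexError in the original lookup) or
    whose chromosome is absent (KeyError in the original lookup).
    """
    converted, bad = [], []
    for chrom, pos in records:
        positions = pseudo_map.get(chrom)
        idx = int(pos)
        if positions is None or not 0 <= idx < len(positions):
            bad.append((chrom, pos))
            continue
        converted.append((chrom, positions[idx]))
    return converted, bad


# Toy map: chr1 has four pseudospace positions, chr2 is not in the map.
pseudo_map = {"chr1": [100, 101, 102, 103]}
records = [("chr1", "2"), ("chr1", "7"), ("chr2", "0")]
converted, bad = convert_positions(records, pseudo_map)
print(converted)  # [('chr1', 102)]
print(bad)        # [('chr1', '7'), ('chr2', '0')]
```

A record like ("chr1", "7") is exactly the situation that raises "list index out of range": the intermediate file refers to a position the loaded map does not cover.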

@heidihyang

I solved the issue. RepeatMasker annotates satellite and low-complexity repeats in addition to TEs, but these were not in my TE hierarchy file. There is probably a way to include them in the workflow, but once I removed them from my TE reference BED file and re-ran the reference prep step, everything worked smoothly.
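This workaround, keeping only annotations that the TE hierarchy knows about, can be sketched as a simple BED filter. It is a hypothetical helper, not part of TEFLoN: it assumes the TE identifier is in the fourth (name) column of the BED file and that you have already loaded the identifiers from your hierarchy file into a set; adjust the column if your BED stores the family elsewhere.

```python
def filter_bed_to_hierarchy(bed_lines, te_ids):
    """Split BED lines into those whose name is in te_ids and the rest.

    Hypothetical helper: assumes the TE identifier sits in column 4 (the
    BED name field). Header lines ('#', 'track', 'browser') are kept.
    """
    kept, dropped = [], []
    for line in bed_lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith(("#", "track", "browser")):
            kept.append(line)  # pass header lines through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4 and fields[3] in te_ids:
            kept.append(line)
        else:
            dropped.append(line)  # e.g. satellites, low-complexity repeats
    return kept, dropped


bed = [
    "chr1\t100\t400\tTE_0001\n",
    "chr1\t500\t560\t(CA)n\n",
    "chr2\t50\t900\tTE_0002\n",
]
te_ids = {"TE_0001", "TE_0002"}  # hypothetical IDs from the hierarchy file
kept, dropped = filter_bed_to_hierarchy(bed, te_ids)
print(len(kept), len(dropped))  # 2 1
```

Writing the kept lines to a new BED file and re-running the reference prep step on it mirrors the fix described in the comment; the dropped lines are worth a quick look to confirm they really are non-TE repeats.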
