Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

problem in getting trees by running raxml_sliding_windows.py #31

Open
Niloofar-Alaei opened this issue Apr 28, 2022 · 5 comments
Open

Comments

@Niloofar-Alaei
Copy link

Hi Simon
I want to run TWISST for my dataset. I followed your tutorial to get the trees as input for TWISST.

I have a diploid genotype and my data is not suitable for phasing, then I used your script to produce the .geno file. And then, I used raxml_sliding_windows.py to get the window trees. It ran without any error and also gave me the .data.tsv file but the .trees.gz file was included only NA. I also ran this script for the example file (msms_4of10_l50k_r500_sweep.seqgen.SNP.geno.gz), but the same happens.

I used python2 and this command:

python raxml_sliding_windows.py -g msms_4of10_l50k_r500_sweep.seqgen.SNP.geno.gz --prefix out.50 --windType sites -w 50 --model HKY85

Do you have any idea why it happened?

I also have another question: Is it possible to use the ML trees I got by using sequences (including both variant and invariant sites) in IQTREE2 as input for TWISST?

With the best and thanks
Niloo

@simonhmartin
Copy link
Owner

Hi Niloo,
You can use any trees s input for Twisst. I personally recommend using the phyml windows script, as we got better results with that approach when testing. To diagnose your problem, I recommend trying the --test option, which will cause the script to make only 10 windows, and retain the temporary files so you can check them for issues. The --log option can also help in case raxml or phyml are giving errors.
I hope this helps you find the problem.
Simon

@Niloofar-Alaei
Copy link
Author

Thanks a lot for your response.

Then I will use the ML trees that I got from the IQtree. Because in that approach, I used the unphased data.

With the best

@simonhmartin
Copy link
Owner

Just to add an opinion on this. I assume you mean that you have taken each diploid individual as a single tip in the tree? Using this approach is not ideal, because actually each haploid allele inside a diploid cell has an independent evolutionary history, and could belong in an entirely different part of the genealogy. When forcing ML software to consider a diploid individual as a tip, you are violating an assumption of the tree inference, which is that each tip has a single evolutionary history. Instead, the ML inference will be trying to force two different histories into one for each individual.
In practice, running Twisst on a tree in which the two alleles from each individual are forced into a single tip will probably not impact your ability to find extreme outliers: in these regions of the genome that approach a single monophyletic tree, the alleles in a single diploid individual will tend to be closely related, so it is not too unreasonable to combine them in the tree. However, I would be very cautious about interpreting intermediate topology weightings, because these could reflect trees where the ML inference did the best it could with highly inconsistent signals, potentially resulting in a tree that is quite far from the reality.

@Niloofar-Alaei
Copy link
Author

Thanks, Simon, for letting me know your opinion.
Yes, I have an unphased vcfFile (including both variant and invariant sites) and then use my script to define some non-overlapping windows that include IUPAC Ambiguity Codes. Then as you said, in the trees, each diploid individual is a single tip.

Then you think it's better just to use the vcf files including SNPs and then phase it. am I right?

@simonhmartin
Copy link
Owner

Hi Niloo, sorry for never responding to this. Yes, my recommendation is to first phase before making trees, if possible.
Simon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants