
Protocol to configure AUGUSTUS to predict genes in genomes with different isochore regions? #378

Open
SchwarzEM opened this issue Jan 15, 2023 · 1 comment


@SchwarzEM

One of the issues I may be facing in running AUGUSTUS on a new genome is the possibility that it has substantial variations in GC content across different parts of the genome. As has been noted elsewhere, this can cause problems in trying to get accurate gene predictions with AUGUSTUS (which typically generates just one parameter set for an entire genome).
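Before changing any parameters, one way to check whether a genome really shows this kind of variation is to look at GC content in fixed-size windows. The following is a minimal Python sketch, not part of AUGUSTUS; the helper names, the 10 kb default window and the file name `genome.fa` are illustrative assumptions.

```python
from typing import Dict, List


def read_fasta(path: str) -> Dict[str, str]:
    """Read a FASTA file into a {record_id: sequence} dict (sequences uppercased)."""
    records: Dict[str, List[str]] = {}
    current = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                current = line[1:].split()[0]
                records[current] = []
            elif current is not None:
                records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def windowed_gc(seq: str, window: int = 10_000) -> List[float]:
    """GC fraction of each non-overlapping window, counting only A/C/G/T.

    Windows with no unambiguous bases (e.g. all N) are skipped.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    fractions = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        gc = chunk.count("G") + chunk.count("C")
        at = chunk.count("A") + chunk.count("T")
        if gc + at:
            fractions.append(gc / (gc + at))
    return fractions
```

For example, `fracs = [f for s in read_fasta("genome.fa").values() for f in windowed_gc(s)]` followed by `min(fracs), max(fracs)` gives a rough picture of the spread; a narrow spread may suggest that a single GC class is enough.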

One of your pages of documentation for how to configure and run AUGUSTUS (http://bioinf.uni-greifswald.de/bioinf/wiki/pmwiki.php?n=Augustus.Augustus) has the interesting link title, "Configure AUGUSTUS to predict genes in genomes with different isochore regions". However, when I click on the relevant URL:

http://bioinf.uni-greifswald.de/bioinf/wiki/pmwiki.php?n=Augustus.Isochores

I find myself looking at a blank page that requires a password login!

Is there some way to get access to actual, readable documentation for protocols that may help configure AUGUSTUS so that it deals with varying isochore regions in a genome?

Thank you for any information that you have.

@KatharinaHoff
Member

This wiki should be taken offline; sorry that it's still there. The page that you cannot see is empty anyway.

Solving the problem is not hard, though. Every species has a *_parameters.cfg file. In that file, there are the following lines (probably with different numbers in your case; this example is taken from the honeybee parameter set):

/Constant/gc_range_min                0.15   # This range has an effect only when decomp_num_steps>1. 
/Constant/gc_range_max                0.55   # States the minimal and maximal percentage of c or g
/Constant/decomp_num_steps            7      # I recommend keeping this to 1 for most species.
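To see which values a given species currently uses, these `/Constant/...` lines can be read back programmatically. This is a minimal sketch assuming every parameter line has the shape `key value [# comment]`, as in the three lines above; `read_params` is a hypothetical helper, not part of AUGUSTUS, and it returns values as strings.

```python
import re
from typing import Dict

# Matches "<key> <value> [# comment]" lines such as
# "/Constant/gc_range_min    0.15   # comment". Pure comment lines do not match.
_PARAM_LINE = re.compile(r"^[ \t]*(/\S+)[ \t]+(\S+)")


def read_params(cfg_text: str) -> Dict[str, str]:
    """Return {key: value} for every parameter line in *_parameters.cfg text."""
    params: Dict[str, str] = {}
    for line in cfg_text.splitlines():
        match = _PARAM_LINE.match(line)
        if match:
            params[match.group(1)] = match.group(2)
    return params
```

For the honeybee lines above, `read_params(text)["/Constant/decomp_num_steps"]` would give `"7"`.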

There, you manually set the lower boundary of GC content in the genome (gc_range_min) and the upper boundary of GC content in the genome (gc_range_max), and then increase decomp_num_steps to a value larger than 1. In human, we use the value 2. I am not 100% sure how we ended up using 7 in honeybee; it was probably a try-and-evaluate process on my end ;-)
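The manual edit described above can also be scripted. The sketch below uses hypothetical helpers, not part of AUGUSTUS. `suggest_gc_range` derives a GC range from windowed GC fractions by dropping the most extreme 1% on each side; that trimming is an ad hoc heuristic assumed here, not an AUGUSTUS recommendation. `configure_isochores` then rewrites the three lines while keeping their trailing comments. The species file typically sits under `$AUGUSTUS_CONFIG_PATH/species/<species>/`, but check the location for your installation.

```python
import re
from typing import List, Tuple


def suggest_gc_range(fractions: List[float], trim: float = 0.01) -> Tuple[float, float]:
    """Heuristic GC range: the `trim` and `1 - trim` quantiles of window GC fractions.

    Trimming keeps a few odd windows (e.g. short contig ends) from stretching the range.
    """
    if not fractions:
        raise ValueError("no GC fractions given")
    ordered = sorted(fractions)
    last = len(ordered) - 1
    lo = ordered[round(trim * last)]
    hi = ordered[round((1 - trim) * last)]
    return round(lo, 2), round(hi, 2)


def set_param(cfg_text: str, key: str, value: object) -> str:
    """Replace the value of `key`, keeping its spacing and trailing comment."""
    pattern = re.compile(rf"^([ \t]*{re.escape(key)}[ \t]+)\S+", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + str(value), cfg_text, count=1)
    if count == 0:
        raise KeyError(f"{key} not found in parameter file")
    return new_text


def configure_isochores(cfg_text: str, gc_min: float, gc_max: float, steps: int) -> str:
    """Set gc_range_min, gc_range_max and decomp_num_steps in *_parameters.cfg text."""
    if not 0.0 <= gc_min < gc_max <= 1.0:
        raise ValueError("expected 0 <= gc_min < gc_max <= 1")
    if steps < 2:
        raise ValueError("decomp_num_steps must be > 1 for the GC range to have an effect")
    text = set_param(cfg_text, "/Constant/gc_range_min", gc_min)
    text = set_param(text, "/Constant/gc_range_max", gc_max)
    return set_param(text, "/Constant/decomp_num_steps", steps)
```

For example, `configure_isochores(text, *suggest_gc_range(fracs), steps=2)` would mirror the two-step human setting mentioned above; the result can then be written back to the species' parameter file.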

And then, you run the training.
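In AUGUSTUS the training step is normally done with `etraining`, which re-estimates the species' parameter files from a GenBank-format training set. The sketch below only wraps that call from Python; the species name and the training file `train.gb` are placeholders, and `etraining_command`/`run_training` are hypothetical helpers.

```python
import shutil
import subprocess
from typing import List


def etraining_command(species: str, train_gb: str) -> List[str]:
    """Build the etraining call for an existing species configuration."""
    return ["etraining", f"--species={species}", train_gb]


def run_training(species: str, train_gb: str) -> None:
    """Run etraining, failing early if it is not on PATH or exits non-zero."""
    cmd = etraining_command(species, train_gb)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("etraining not found on PATH")
    subprocess.run(cmd, check=True)
```

For example, `run_training("honeybee1", "train.gb")` would retrain with whatever GC settings are currently in that species' *_parameters.cfg file.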
