AccuSyn Not accepting Chromosome IDs #3

Open
sanyalab opened this issue Mar 12, 2021 · 1 comment

sanyalab commented Mar 12, 2021

Hello,

I would like to use the AccuSyn tool, but I get the following error:
The first column in the GFF file is not following the correct format. The last character of the chromosomes needs to be a number or an uppercase letter (i.e. chr1, chrA1, chr1A).

Here is an example of the GFF file I am trying to load:
Chr10ED85E zm.1.ed85e.fgjs 4573532 4577563
Chr10ED85E zm.1.ed85e.fgjt 4578936 4581199
Chr10ED85E zm.1.ed85e.fgju 4582450 4588256
Chr10ED85E zm.1.ed85e.fgjv 4591373 4593178
Chr10ED85E zm.1.ed85e.fgjw 4594233 4595951
Chr10ED85E zm.1.ed85e.fgjx 4598827 4601066

Here is an example of the collinearity file:
## Alignment 8010: score=410.0 e_value=5.7e-21 N=9 Chr04ED85E&Chr06ED85E minus
8010- 0: zm.1.ed85e.cgqt zm.1.ed85e.djio 1e-89
8010- 1: zm.1.ed85e.cgrc zm.1.ed85e.djik 2e-174
8010- 2: zm.1.ed85e.cgrj zm.1.ed85e.djij 0
8010- 3: zm.1.ed85e.cgrw zm.1.ed85e.djhw 0
8010- 4: zm.1.ed85e.cgrx zm.1.ed85e.djhv 0

I think I am following the guidelines. Could you please advise?

Second, could you remove this restriction on how chromosomes are named? For example, I may want to use data that is in scaffolded format. Instead, you could simply require that chromosome names be alphanumeric.

Thanks
Abhijit

jorgenunezsiri (Owner)

Hello Abhijit,

Thank you for reaching out to me and for your feedback. I am sorry that it took me a while to get back to you. Please see my comments below.

Explanation:

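You are getting that error because chromosome identifiers can be at most 5 characters long. Your identifiers (e.g., "Chr10ED85E") end in an uppercase letter, so they satisfy the rule quoted in the error message, but they exceed this length limit.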

These restrictions stem from limitations common to any circular visualization. We want to distribute all chromosomes appropriately within the circular layout, and longer names leave less space to do so. Similarly, the more chromosomes we show at the same time, the less space each one gets, which is why scaffolded-format chromosome identifiers are ignored. Moreover, the "Connections" tab groups how the chromosome identifiers are listed (feel free to load "Wheat (IWGSC)" from the sample files to see this), and shorter names make this grouping trivial.
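To check identifiers before loading them, the two constraints described in this thread (at most 5 characters, ending in a digit or an uppercase letter) can be approximated with a small Python function. This is a sketch based only on the error message and the explanation above, not AccuSyn's actual validation code; `is_valid_chromosome_id`, `MAX_ID_LENGTH`, and `ALLOWED_LAST_CHARS` are hypothetical names.

```python
import string

# Hypothetical approximation of the constraints described in this thread;
# this is NOT AccuSyn's own validation logic.
MAX_ID_LENGTH = 5  # maximum identifier length, per the maintainer's reply
ALLOWED_LAST_CHARS = set(string.digits + string.ascii_uppercase)


def is_valid_chromosome_id(chrom_id: str) -> bool:
    """Return True if chrom_id is non-empty, at most MAX_ID_LENGTH characters
    long, and ends in a digit or an uppercase ASCII letter."""
    return (
        0 < len(chrom_id) <= MAX_ID_LENGTH
        and chrom_id[-1] in ALLOWED_LAST_CHARS
    )


if __name__ == "__main__":
    for chrom in ["chr1", "chrA1", "chr1A", "Chr10", "Chr10ED85E"]:
        print(f"{chrom}: {is_valid_chromosome_id(chrom)}")
```

Under these assumptions, every example prints `True` except `Chr10ED85E`, which fails only the length check, since its last character "E" is an uppercase letter.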

Solution:

Since your chromosome identifiers are 10 characters long, I suggest running a script that removes the repeated pattern "ED85E" from all of them (e.g., "Chr10ED85E" becomes "Chr10"); AccuSyn should then load your files successfully. You could apply the same idea to shorten the chromosome identifiers of your scaffolded data if you would like to load it with AccuSyn (e.g., transforming the identifier "scaffold001" into "sf001").
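The suggested rewrite script could look roughly like the Python sketch below. It assumes the file layouts shown in this issue (which look like MCScanX output): the chromosome is the first whitespace-separated column of the GFF file, and chromosome IDs also appear as an `A&B` pair in each `## Alignment` header of the collinearity file, so both files are rewritten to stay consistent. The script's usage line, the file paths, and all helper names (`shorten_chrom_id`, `shorten_scaffold_id`, `rewrite_gff_line`, `rewrite_collinearity_line`, `rewrite_file`) are hypothetical.

```python
import re
import sys
from pathlib import Path
from typing import Callable

# Hypothetical helper script; the suffix and file paths are assumptions.
SUFFIX = "ED85E"  # case-sensitive, so gene IDs like "zm.1.ed85e.fgjs" are untouched
PAIR = re.compile(r"(\S+)&(\S+)")  # "ChrA&ChrB" pair in "## Alignment" headers


def shorten_chrom_id(chrom_id: str, suffix: str = SUFFIX) -> str:
    """Drop a trailing suffix, e.g. "Chr10ED85E" -> "Chr10"."""
    if suffix and chrom_id.endswith(suffix):
        return chrom_id[: -len(suffix)]
    return chrom_id


def shorten_scaffold_id(chrom_id: str) -> str:
    """Abbreviate a scaffold prefix, e.g. "scaffold001" -> "sf001"."""
    return re.sub(r"^scaffold", "sf", chrom_id)


def rewrite_gff_line(line: str, shorten: Callable[[str], str] = shorten_chrom_id) -> str:
    """Shorten only the first (chromosome) column of a GFF line."""
    return re.sub(r"^\S+", lambda m: shorten(m.group(0)), line, count=1)


def rewrite_collinearity_line(line: str, shorten: Callable[[str], str] = shorten_chrom_id) -> str:
    """Shorten the chromosome pair in '## Alignment' header lines only."""
    if not line.startswith("## Alignment"):
        return line
    return PAIR.sub(lambda m: f"{shorten(m.group(1))}&{shorten(m.group(2))}", line)


def rewrite_file(src: Path, dst: Path, rewrite_line: Callable[[str], str]) -> None:
    """Apply rewrite_line to every line of src and write the result to dst."""
    with src.open() as fin, dst.open("w") as fout:
        for line in fin:
            fout.write(rewrite_line(line))


if __name__ == "__main__":
    # Usage: python shorten_ids.py in.gff out.gff in.collinearity out.collinearity
    if len(sys.argv) == 5:
        gff_in, gff_out, col_in, col_out = map(Path, sys.argv[1:])
        rewrite_file(gff_in, gff_out, rewrite_gff_line)
        rewrite_file(col_in, col_out, rewrite_collinearity_line)
    else:
        # Demo on a line from the example above.
        print(rewrite_gff_line("Chr10ED85E zm.1.ed85e.fgjs 4573532 4577563"))
```

Because the match on `ED85E` is case-sensitive, gene names such as `zm.1.ed85e.fgjs` keep their lowercase `ed85e`, and non-header collinearity lines pass through unchanged. For scaffolded data, `shorten_scaffold_id` could be passed as the `shorten` argument instead, e.g. `lambda line: rewrite_gff_line(line, shorten_scaffold_id)`.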

I recognize that a more appropriate long-term solution would be for the tool to load all chromosome identifiers and then allow pre-filtering of which ones to interact with inside the "Connections" tab, but unfortunately this is not something I can prioritize at the moment.

I hope this answers your concern and that you can get the insights you were looking for by using AccuSyn.

Best wishes,
Jorge
