Trouble using 'first' op on dbNSFP txt file #129

Open
IvantheDugtrio opened this issue Sep 23, 2020 · 5 comments

@IvantheDugtrio

I am having trouble removing multi-allelic annotations from the dbNSFP reference text file, as it seems to always parse these fields as 'self'. I think the problem has to do with limitations in parsing a text file versus a VCF. Is there a workaround?

@brentp
Owner

brentp commented Sep 23, 2020

by multi-allelic, you mean it has one line per alternate allele? i think self is the only option that will work correctly for that. what is your config?

@IvantheDugtrio
Author

IvantheDugtrio commented Sep 23, 2020

By multi-allelic, I mean one column, say HGVSc_snpEff, has multiple annotations separated by commas, each corresponding to a different transcript ID, while the rest of the file is tab-delimited.

For example, one annotation line can have an HGVSc_snpEff field that looks like:
c.44C>G,c.44C>G,c.57C>G,c.57C>G

My config is as follows:

file=dbNSFP.txt.gz
columns=[13,21,22]
ops=["first","first","first"]
names=["dbNSFP_genename","dbNSFP_HGVSc_snpEff","dbNSFP_HGVSp_snpEff"]

@brentp
Owner

brentp commented Sep 23, 2020

hmm. I'm not sure this can work. what does ALT look like for that dbNSFP line?

@IvantheDugtrio
Author

IvantheDugtrio commented Sep 23, 2020

Oh, I misstated: the multiple annotations in the same column are separated by semicolons. An actual line from the file is in the attached file.

The REF and the ALT are still just single entries, in this case, a C for the REF, and a T for the ALT.
T315I.txt
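
One possible workaround, sketched under assumptions and not confirmed by the maintainers in this thread: pre-process the text file so that each selected column keeps only its first semicolon-separated value, then point the config at the trimmed file. The function name `keep_first_value` and the three-column sample line are hypothetical; column numbers are 1-based to match the config above.

```python
def keep_first_value(line, columns, sep=";"):
    """Return a tab-delimited line in which each listed 1-based column
    keeps only its first `sep`-separated value."""
    fields = line.rstrip("\n").split("\t")
    for col in columns:
        idx = col - 1  # config columns are 1-based, Python lists are 0-based
        if idx < len(fields):
            fields[idx] = fields[idx].split(sep)[0]
    return "\t".join(fields)


# Hypothetical three-column line for illustration, not real dbNSFP content.
sample = "GENE1;GENE1\tc.44C>G;c.57C>G\tp.A15G;p.A19G\n"
print(keep_first_value(sample, columns=[1, 2, 3]))
```

In practice the lines would be streamed from the gzipped file (for example with Python's `gzip.open` in text mode), and the output would presumably need to be bgzip-compressed and tabix-indexed again before use as an annotation source.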

@liserjrqlxue
Contributor

dbNSFP has multiple records for the same variant when they have different amino acid changes.
This may also cause unexpected results.
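
If only one record per variant is wanted, a rough pre-processing sketch could keep the first line for each variant. This assumes that the first four tab-delimited columns are chromosome, position, REF and ALT, and that duplicate records sit on adjacent lines; both are assumptions to check against the actual file. The name `first_record_per_variant` is hypothetical.

```python
from itertools import groupby


def first_record_per_variant(lines, key_columns=(1, 2, 3, 4)):
    """Yield the first line of each run of adjacent lines that share the
    same values in the 1-based key columns (assumed chr, pos, ref, alt)."""

    def variant_key(line):
        fields = line.rstrip("\n").split("\t")
        return tuple(fields[c - 1] for c in key_columns)

    for _, group in groupby(lines, key=variant_key):
        yield next(group)  # keep the first record, drop the rest of the run


# Hypothetical rows for illustration: two records for C>T, one for C>G.
rows = [
    "9\t100\tC\tT\tA\n",
    "9\t100\tC\tT\tB\n",
    "9\t100\tC\tG\tA\n",
]
for row in first_record_per_variant(rows):
    print(row, end="")
```

Keeping the first record is an arbitrary choice; which amino acid change is the right one to retain depends on the transcript of interest.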
