ClinSig changes for some haplotypes in join_data.R - become wrong! #46

Open
raymond301 opened this issue Sep 26, 2017 · 2 comments
@raymond301
Contributor

This is the output of join_data.R for the multi dataset:

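| chrom | pos | ref | alt | measureset_type | measureset_id | rcv | allele_id | symbol | hgvs_c | clinical_significance | clinical_significance_ordered | pathogenic | benign | conflicted |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2 | 166277030 | T | G | Haplotype | 30359 | RCV000023304 | 39315 | SCN9A | NM_002977.3:c.2794A>C | Benign/Likely benign | Pathogenic | 0 | 1 | 0 |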

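This is wrong. The ClinVar record is the haplotype NM_002977.3(SCN9A):c.[2794A>C;2971G>T], but the row reports clinical_significance "Benign/Likely benign" alongside clinical_significance_ordered "Pathogenic".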

Looking at each step:

clinvar_table_raw.multi.tsv - Correct
clinvar_table_normalized.multi.tsv.gz - Correct
clinvar_allele_trait_pairs.multi.tsv.gz - Correct
clinvar_alleles_grouped.multi.tsv.gz - Correct
clinvar_alleles_combined.multi.tsv.gz - WRONG

There are 20 haplotype variants with this same issue:

| measureset_type | measureset_id | rcv | allele_id |
| --- | --- | --- | --- |
| Haplotype | 1631 | RCV000001698 | 16670 |
| Haplotype | 5706 | RCV000006060 | 20745 |
| Haplotype | 5813 | RCV000006169 | 20852 |
| Haplotype | 7239 | RCV000007661 | 22244 |
| Haplotype | 13065 | RCV000013940 | 28104 |
| Haplotype | 13399 | RCV000014336;RCV000014337 | 28436 |
| Haplotype | 16318 | RCV000017711;RCV000201276 | 31357 |
| Haplotype | 16876;16877 | RCV000018372;RCV000018373 | 31916 |
| Haplotype | 4297 | RCV000004533;RCV000004534;RCV000004535;RCV000004536 | 38384 |
| Haplotype | 4297 | RCV000004533;RCV000004534;RCV000004535;RCV000004536 | 38385 |
| Haplotype | 4816 | RCV000005085 | 38434 |
| Haplotype | 9398 | RCV000010000 | 38447 |
| Haplotype | 9407 | RCV000010010 | 38448 |
| Haplotype | 16318;217371 | RCV000017711;RCV000201276;RCV000201278 | 38476 |
| Haplotype | 30359 | RCV000023304 | 39315 |
| Haplotype | 38571 | RCV000021985 | 46849 |
| Haplotype | 402236 | RCV000454199 | 98655 |
| Haplotype | 218894 | RCV000203245 | 137950 |
| Haplotype | 225143 | RCV000210779 | 227037 |
| Haplotype | 188053 | RCV000167863 | 255673 |
@raymond301
Contributor Author

raymond301 commented Oct 16, 2017

Looks like this is still an issue with the new Python join script, except that it now outputs two lines for this variant. Both are technically wrong.

variation_type = "Haplotype"
variation_id = "30359"
allele_id = "39315"

clinical_significance = "Benign/Likely benign"
clinical_significance_ordered = ["pathogenic"]

The new flags look good:
"pathogenic":1,"likely_pathogenic":0,"uncertain_significance":"0","likely_benign":0,"benign":0

But the second line has:
allele_id = "39316"
clinical_significance = "Conflicting interpretations of pathogenicity"
clinical_significance_ordered = ["pathogenic"]

@bw2
Contributor

bw2 commented Dec 3, 2017

This may be a ClinVar data consistency issue:

join_variant_summary_with_clinvar_alleles.py generates clinvar_alleles_combined.multi.tsv.gz by joining clinvar_alleles_grouped.multi.tsv.gz to variant_summary.txt.gz, using allele_id as the join key.
While doing this, it switches the clinical_significance column to the value from variant_summary.txt.gz but doesn't update the clinical_significance_ordered column, so the two columns disagree whenever the values differ between the .xml release and variant_summary.txt.gz.
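
To make the mechanism concrete, here is a minimal sketch of that kind of join in plain Python. It is not the actual code in join_variant_summary_with_clinvar_alleles.py: the function name `join_on_allele_id`, the in-memory rows, the inner-join behavior, and the grouped-side clinical_significance value are all assumptions for illustration. Only the column names and the values for allele_id 39315 in the combined row come from this thread.

```python
# Hypothetical miniature of the two inputs; the real files have many more columns.
# Stands in for clinvar_alleles_grouped.multi.tsv.gz (derived from the .xml release).
grouped = [
    {
        "allele_id": "39315",
        "clinical_significance": "Pathogenic",  # assumed value, for illustration only
        "clinical_significance_ordered": ["pathogenic"],
    },
]

# Stands in for variant_summary.txt.gz, indexed by allele_id.
variant_summary = {
    "39315": {"clinical_significance": "Benign/Likely benign"},
}


def join_on_allele_id(grouped_rows, summary_by_allele_id):
    """Join the two sources on allele_id, taking clinical_significance
    from the summary side and leaving every other column untouched."""
    combined = []
    for row in grouped_rows:
        summary = summary_by_allele_id.get(row["allele_id"])
        if summary is None:
            continue  # assumed inner join: drop alleles missing from the summary
        merged = dict(row)
        # Only this one column is overwritten; clinical_significance_ordered
        # still carries the value computed from the .xml release.
        merged["clinical_significance"] = summary["clinical_significance"]
        combined.append(merged)
    return combined


combined = join_on_allele_id(grouped, variant_summary)
print(combined[0]["clinical_significance"])          # value from the summary side
print(combined[0]["clinical_significance_ordered"])  # value from the .xml side
```

Because only one of the two columns is replaced, any allele whose significance differs between the two sources ends up with a mismatched pair, which matches the row reported above.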

I wonder if we should set these as 'conflicting interpretation' in both clinical_significance and clinical_significance_ordered columns?
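
One possible shape for that, as a rough sketch only: compare the terms in the two columns and, when they disagree, overwrite both. The helper names `terms` and `mark_conflicts`, the `ordered_label` value "conflicting", and the rule that the two columns agree when their lowercased term sets match are all assumptions, not existing behavior of the pipeline. The string "Conflicting interpretations of pathogenicity" is the value already seen in the output above.

```python
import re

CONFLICTING = "Conflicting interpretations of pathogenicity"


def terms(clinical_significance):
    """Split e.g. 'Benign/Likely benign' into {'benign', 'likely benign'}."""
    parts = re.split(r"[/,;]", clinical_significance)
    return {part.strip().lower() for part in parts if part.strip()}


def mark_conflicts(row, ordered_label="conflicting"):
    """Return the row unchanged if both columns name the same terms,
    otherwise a copy with both columns set to a conflicting value.

    ordered_label is a hypothetical value for clinical_significance_ordered.
    """
    ordered = {
        term.replace("_", " ").lower()
        for term in row["clinical_significance_ordered"]
    }
    if terms(row["clinical_significance"]) == ordered:
        return row
    fixed = dict(row)
    fixed["clinical_significance"] = CONFLICTING
    fixed["clinical_significance_ordered"] = [ordered_label]
    return fixed


mismatched = {
    "allele_id": "39315",
    "clinical_significance": "Benign/Likely benign",
    "clinical_significance_ordered": ["pathogenic"],
}
print(mark_conflicts(mismatched))
```

Rows where the two sources already agree pass through untouched, so this would only affect variants like the 20 haplotypes listed earlier. Whether a term-set comparison is the right test for agreement would need checking against how clinical_significance_ordered is actually built.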
