Internally curated consensus sequences for lineages #30

Open
gcha31 opened this issue Oct 19, 2022 · 1 comment

@gcha31

gcha31 commented Oct 19, 2022

Hi TOAST team,

I am interested in curating representative genomes for VOCs/VBMs. According to your recent publication (Xiaoli, Lingzi, et al. "Benchmark datasets for SARS-CoV-2 surveillance bioinformatics." PeerJ 10 (2022): e13821.), your datasets 4 and 5 were prepared from alignments to 'internally curated consensus sequences'. Could you share details on how those sequences were curated internally? Thank you.

Best,
Gyuhyon

@lskatz
Collaborator

lskatz commented Mar 24, 2023

Hi there,
We contacted the SSEV team at CDC for the representative genomes. We were told that they were gathered in a two-step process:

  1. Pull all representatives listed in the pangolin repository https://github.com/cov-lineages/pango-designation/blob/master/curation_notes/curation_notes.tsv
  2. If a representative sequence is not present, pull the longest/cleanest available sequence from the lineage (fewest mixed bases and least missing data) with the earliest collection date
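The two-step selection above could be sketched roughly as follows. This is a minimal, hypothetical Python sketch, not the SSEV team's actual pipeline: the `Record` type, `quality_key`, `pick_representative`, and the toy lineage names and sequences are illustrative assumptions. It also assumes the designated representatives from curation_notes.tsv have already been loaded into a lineage-to-accession dict (the file's column layout isn't described here), and it assumes one possible ordering of the fallback criteria: fewest mixed/missing bases first, then longest, then earliest collection date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


# Hypothetical record type; these fields are assumptions, not a real database schema.
@dataclass
class Record:
    accession: str
    lineage: str
    sequence: str
    collection_date: date


# IUPAC ambiguity ("mixed base") codes; N and '-' are treated as missing data.
AMBIGUOUS = set("RYSWKMBDHV")


def quality_key(rec: Record):
    """Sort key for the fallback step (assumed ordering of criteria)."""
    seq = rec.sequence.upper()
    mixed = sum(1 for base in seq if base in AMBIGUOUS)
    missing = seq.count("N") + seq.count("-")
    # Fewer mixed/missing bases first, then longer sequences, then earlier dates.
    return (mixed + missing, -len(seq), rec.collection_date)


def pick_representative(
    lineage: str, designated: Dict[str, str], records: List[Record]
) -> Optional[Record]:
    candidates = [r for r in records if r.lineage == lineage]
    # Step 1: use the designated representative if its sequence is available.
    wanted = designated.get(lineage)
    if wanted is not None:
        for r in candidates:
            if r.accession == wanted:
                return r
    # Step 2: otherwise fall back to the cleanest/longest/earliest candidate.
    if not candidates:
        return None
    return min(candidates, key=quality_key)


# Toy data only; lineage names and sequences are made up for illustration.
records = [
    Record("seqA", "LIN.A", "ACGTNNACGT", date(2022, 1, 5)),
    Record("seqB", "LIN.A", "ACGTACGTAC", date(2022, 1, 9)),
    Record("seqC", "LIN.A", "ACGTACGTAC", date(2022, 1, 3)),
    Record("seqD", "LIN.B", "ACGTRYACGT", date(2022, 8, 1)),
]
designated = {"LIN.B": "seqD"}

print(pick_representative("LIN.A", designated, records).accession)  # seqC
print(pick_representative("LIN.B", designated, records).accession)  # seqD
```

In the toy data, LIN.A has no designated representative, so the fallback picks seqC (clean, full length, earliest date); LIN.B keeps its designated seqD even though it contains mixed bases. Whether "longest" should outrank "cleanest" is not specified in the reply, so the tuple order in `quality_key` is a judgment call.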
