Datasets:
  Genomic:
    - Local dataset
    - Global dataset (2021)
    - Global dataset (2022)
  Expected versus non-permitted lineage assignments between each pangolin version:
    - expected.13.14.tsv
    - expected.14.15.tsv
    - expected.15.16.tsv
    - expected.2021-11-09_v1.2.133.tsv
Lineage Assignments:
  - Pangolin v.3/pango designation v1.2.76-93
    - pangolin v.3.1.13
    - pangolin v.3.1.14
    - pangolin v.3.1.15
    - pangolin v.3.1.16
    *pangoLEARN uses a decision tree model
    Command line:
      pangolin <samples.fasta> --skip-designation-hash --max-ambig .99 --min-length 1 --outfile <pango_nohash.csv>
      pangolin <samples.fasta> --skip-designation-hash --max-ambig .99 --min-length 1 --usher --outfile <usher_nohash.csv>
  - Pangolin v.4.0.2
    - pangolin-data v1.2.133
    *pangoLEARN uses a random forest model
    Command line:
      pangolin <samples.fasta> --skip-designation-cache --max-ambig .99 --min-length 1 --analysis-mode pangolearn --outfile <pangov4_nohash.csv>
      pangolin <samples.fasta> --skip-designation-cache --max-ambig .99 --min-length 1 --analysis-mode usher --outfile <usherv4_nohash.csv>
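The pangoLEARN and UShER runs above each write a CSV of per-sample assignments. A minimal sketch of how two such outputs could be compared, assuming only the standard pangolin output columns `taxon` and `lineage`; the function names and the toy rows are hypothetical and are not taken from the pipeline's own scripts:

```python
import csv
import io


def read_lineages(handle):
    """Map each taxon to its assigned lineage from a pangolin output CSV."""
    return {row["taxon"]: row["lineage"] for row in csv.DictReader(handle)}


def compare_assignments(first, second):
    """Return (number of agreeing taxa, list of disagreements) for shared taxa."""
    shared = sorted(first.keys() & second.keys())
    differ = [(t, first[t], second[t]) for t in shared if first[t] != second[t]]
    return len(shared) - len(differ), differ


# Toy stand-ins for <pango_nohash.csv> and <usher_nohash.csv> (made-up rows).
pangolearn_csv = io.StringIO("taxon,lineage\ns1,B.1.1.7\ns2,AY.4\ns3,B.1\n")
usher_csv = io.StringIO("taxon,lineage\ns1,B.1.1.7\ns2,AY.4.2\ns3,B.1\n")

agree, differ = compare_assignments(
    read_lineages(pangolearn_csv), read_lineages(usher_csv)
)
print(agree, differ)
```

With real files, the `io.StringIO` objects would be replaced by `open(path, newline="")` handles.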
Alternate lineage assignment:
  - Align to reference
    - MAFFT:
      Command line: mafft --anysymbol --keeplength --6merpair --addfragments <samples.fasta> <MN908947.3_coronavirus_wuhan_nCov19-refseq.fasta>
  - Lineage assignment:
    - MAPLE:
      Command line: pypy3 MAPLEv0.1.9.py --inputTree outputMaple_binary_tree.tree --assignmentFile ref.csv --output MAPLE019 --overwrite
    - NextClade:
      Command line: nextclade --in-order --input-fasta <samples.fasta> --input-dataset /opt/nextclade_database/sars-cov-2_MN908947_2023-06-16T12\:00\:00Z/
Lineage Assignment Validation:
  - Adjusted Mutual Information:
    Script: comparison_script_w_ami.py
  - Ancestral accuracy:
    Script: compareLineages.py
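The Adjusted Mutual Information step above scores how well two sets of lineage labels for the same samples agree, corrected for the agreement expected by chance. The following is a minimal pure-Python sketch, not the contents of comparison_script_w_ami.py. The function names are hypothetical, and it assumes the arithmetic-mean normalisation, AMI = (MI - E[MI]) / (mean(H(U), H(V)) - E[MI]), with E[MI] computed under the usual hypergeometric model:

```python
from collections import Counter
from math import exp, lgamma, log


def entropy(labels):
    """Shannon entropy (natural log) of a labelling."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def mutual_information(u, v):
    """Mutual information between two labellings of the same samples."""
    n = len(u)
    a, b = Counter(u), Counter(v)
    return sum(
        nij / n * log(n * nij / (a[x] * b[y]))
        for (x, y), nij in Counter(zip(u, v)).items()
    )


def expected_mutual_information(u, v):
    """Expected MI when cluster sizes are fixed and labels are shuffled."""
    n = len(u)

    def log_fact(k):
        return lgamma(k + 1)

    emi = 0.0
    for ai in Counter(u).values():
        for bj in Counter(v).values():
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                # Log of the hypergeometric probability of an overlap of nij.
                log_p = (
                    log_fact(ai) + log_fact(bj)
                    + log_fact(n - ai) + log_fact(n - bj)
                    - log_fact(n) - log_fact(nij)
                    - log_fact(ai - nij) - log_fact(bj - nij)
                    - log_fact(n - ai - bj + nij)
                )
                emi += nij / n * log(n * nij / (ai * bj)) * exp(log_p)
    return emi


def adjusted_mutual_information(u, v):
    """AMI with arithmetic-mean normalisation."""
    if len(u) != len(v):
        raise ValueError("labellings must cover the same samples")
    h_u, h_v = entropy(u), entropy(v)
    if h_u == 0.0 and h_v == 0.0:
        return 1.0  # both labellings put every sample in one lineage
    mi = mutual_information(u, v)
    emi = expected_mutual_information(u, v)
    denom = (h_u + h_v) / 2 - emi
    if abs(denom) < 1e-12:
        return 0.0  # AMI is undefined here; returning 0.0 is this sketch's convention
    return (mi - emi) / denom


# Toy lineage calls (made up for illustration).
same = adjusted_mutual_information(
    ["B.1", "B.1", "AY.4", "AY.4", "BA.2"],
    ["B.1", "B.1", "AY.4", "AY.4", "BA.2"],
)
crossed = adjusted_mutual_information(["A", "A", "B", "B"], ["x", "y", "x", "y"])
print(same, crossed)  # close to 1.0 and -0.5
```

Identical labellings score 1, and values at or below 0 indicate agreement no better than chance.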
Pangolin versions lineage assignment comparison (Assignment comparisons):
  - Lineage assignment consistency:
    - SNP Distance:
      snp-dists v.0.8.2
      Script: snp_scorpio-comparisons.sh, snp_scorpio-comparisons.Rmd
  - Stability comparison:
    - Expected vs. non-expected:
      Script: tables_and_violin_plots.R
    - Sankey Plots:
      Script: sankey_plots.R
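The SNP distance step listed above counts the positions at which two aligned genomes differ. A rough sketch of that calculation is given here for illustration only. It is not the snp-dists implementation or the pipeline's scripts; it assumes sequences already aligned to equal length and, as snp-dists does by default, counts only sites where both sequences carry A, C, G or T. The function names and toy sequences are hypothetical:

```python
from itertools import combinations

BASES = set("ACGT")


def snp_distance(seq_a, seq_b):
    """Count differing sites, skipping gaps and ambiguous bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x != y and x in BASES and y in BASES
    )


def pairwise_distances(alignment):
    """SNP distance for every unordered pair of sequence names."""
    return {
        (a, b): snp_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }


# Toy alignment (made up); N and '-' are ignored in the counts.
alignment = {"ref": "ACGTACGT", "s1": "ACGTACGA", "s2": "ANGTTCC-"}
distances = pairwise_distances(alignment)
print(distances)
```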