To-do #1

apcamargo · 2021-01-23T19:29:25Z

Use Vamb's transformation to reduce the number of TNF dimensions (103 instead of 136)
Reduce memory footprint:
- Use screed
- Use hashes
- Use Rust in taxopy
Implement a modular interface so that users can choose between different combinations of features. For example:
- Sequence composition
- Sequence composition + coverage
- Sequence composition + coverage + codon usage
- Sequence composition + coverage + codon usage + taxonomy
- Coverage + codon usage
- …
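One way the modular interface could be structured, sketched under the assumption that each feature type is computed separately into a per-contig matrix whose rows follow the same contig order. The block names, `KNOWN_BLOCKS`, and `combine_features` are hypothetical. Under that assumption, each combination in the list above reduces to choosing which blocks to concatenate.

```python
import numpy as np

# Hypothetical names for the feature types listed above.
KNOWN_BLOCKS = ("composition", "coverage", "codon_usage", "taxonomy")


def combine_features(blocks: dict[str, np.ndarray], selected: list[str]) -> np.ndarray:
    """Concatenate the selected per-contig feature blocks column-wise.

    Each block has one row per contig, in the same contig order.
    A 1-D block (e.g. a single coverage value per contig) is treated as one column.
    """
    if not selected:
        raise ValueError("select at least one feature block")
    unknown = [name for name in selected if name not in KNOWN_BLOCKS]
    if unknown:
        raise ValueError(f"unknown feature blocks: {unknown}")
    missing = [name for name in selected if name not in blocks]
    if missing:
        raise ValueError(f"feature blocks not computed: {missing}")

    arrays = []
    for name in selected:
        block = np.asarray(blocks[name], dtype=np.float32)
        if block.ndim == 1:
            block = block[:, None]  # one value per contig -> single column
        arrays.append(block)

    # Every block must describe the same set of contigs.
    if len({block.shape[0] for block in arrays}) != 1:
        raise ValueError("all feature blocks must have one row per contig")
    return np.hstack(arrays)
```

With this layout, supporting a new feature type only requires adding its name to `KNOWN_BLOCKS` and computing its matrix.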

jakobnissen · 2022-07-13T11:00:16Z

If you need help implementing the transformation, let me know.
The idea is described in this paper, which is where I got it from: Kislyuk, A., Bhatnagar, S., Dushoff, J. et al. Unsupervised statistical clustering of environmental shotgun sequences. BMC Bioinformatics 10, 316 (2009). https://doi.org/10.1186/1471-2105-10-316

Practically speaking, I've found that the best way is to compute the kernel beforehand using SciPy, save it to a file, and then load it at runtime. You can copy the file src/create_kernel.py directly from Vamb.
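A minimal sketch of the precomputed-kernel approach, written from the constraints behind the transformation rather than copied from Vamb's src/create_kernel.py, so the function names and the `.npy` file are assumptions. The 256 tetramers collapse to 136 once each is merged with its reverse complement; additionally requiring the frequencies to sum to one and requiring each trimer to be followed and preceded equally often leaves a 103-dimensional subspace. `scipy.linalg.null_space` returns an orthonormal basis of that subspace, and multiplying 256-column tetramer frequencies by this 256 x 103 basis gives the reduced TNF.

```python
import itertools

import numpy as np
from scipy.linalg import null_space


def reverse_complement(kmer: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return kmer[::-1].translate(str.maketrans("ACGT", "TGCA"))


def all_kmers(k: int) -> list[str]:
    """All 4**k k-mers in lexicographic order."""
    return ["".join(p) for p in itertools.product("ACGT", repeat=k)]


def n_canonical_kmers(k: int) -> int:
    """Number of k-mers left after merging each one with its reverse complement."""
    return len({min(kmer, reverse_complement(kmer)) for kmer in all_kmers(k)})


def create_projection_kernel() -> np.ndarray:
    """Orthonormal basis (256 x 103) of the subspace allowed by the constraints."""
    kmers = all_kmers(4)
    index = {kmer: i for i, kmer in enumerate(kmers)}
    rows = []

    # Constraint 1: frequencies sum to one, so the all-ones direction carries
    # no information; every kernel column ends up orthogonal to it.
    rows.append(np.ones(256))

    # Constraint 2: a tetramer and its reverse complement have equal frequency.
    for kmer in kmers:
        rc = reverse_complement(kmer)
        if kmer < rc:
            row = np.zeros(256)
            row[index[kmer]] = 1.0
            row[index[rc]] = -1.0
            rows.append(row)

    # Constraint 3: for every trimer t, sum_x f(t + x) == sum_x f(x + t),
    # i.e. (ignoring sequence ends) each trimer is followed and preceded
    # by a base equally often.
    for trimer in all_kmers(3):
        row = np.zeros(256)
        for base in "ACGT":
            row[index[trimer + base]] += 1.0
            row[index[base + trimer]] -= 1.0
        rows.append(row)

    # Null space of the constraint matrix: vectors satisfying all constraints.
    return null_space(np.array(rows)).astype(np.float32)


def project_tnf(freqs: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Project (n, 256) tetramer frequencies onto the (256, 103) kernel."""
    return freqs @ kernel
```

To follow the compute-once approach, run `np.save("kernel.npy", create_projection_kernel())` a single time and, at runtime, load it with `kernel = np.load("kernel.npy")` before calling `project_tnf`. Because every kernel column is orthogonal to the all-ones vector, subtracting the uniform frequency 1/256 before projecting would not change the result.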

apcamargo · 2022-07-13T17:40:37Z

Thanks, @jakobnissen!

Now that I've finished the first version of geNomad, I might pick this up again. Your approach looks clean; I'll use it :)
