support simplified samples.csv / prefix usage #11

Open
ctb opened this issue Dec 27, 2023 · 0 comments

ctb commented Dec 27, 2023

In practice, a lot of samples.csv usage seems to boil down to "use this prefix, find the paired-end files, sketch".

We could just write a script, or a set of scripts, to generate samples.csv for different situations, actually...

So an alternative could be: samples.csv gets more complicated, but becomes easier to autogenerate?

bluegenes added a commit to sourmash-bio/sourmash_plugin_branchwater that referenced this issue Mar 1, 2024
## Prefix-based sketching

#184 introduces a new input type that better supports metagenome reads, but it doesn't make things much simpler for the power user. We can probably support prefix-style naming, as suggested in dib-lab/sourmash-slainte#11.

Here we introduce a 'prefix' CSV type with the following columns:
`name,input_moltype,prefix,exclude`.

To find the input files, we:
1. glob to find all files that match `prefix`
2. glob to find all files that match `exclude`
3. remove the `exclude` matches from the `prefix` matches

This uses only `glob`, not regular expressions, so `*` wildcards work in `prefix` and `exclude`, but full regex patterns do not.
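
The three steps above can be sketched in Python roughly as follows. This is an illustration only, not the plugin's actual code: the function names `resolve_prefix_row` and `read_prefix_csv` are hypothetical, and appending `*` to `prefix` before globbing is an assumption about what "match prefix" means.

```python
import csv
import glob
import io


def resolve_prefix_row(prefix, exclude=""):
    """Return the sorted paths that match `prefix`, minus those matching `exclude`."""
    # Step 1: glob for every file that starts with the prefix.
    # Appending "*" is an assumption made for this sketch.
    included = set(glob.glob(prefix + "*"))
    # Step 2: glob for the files to exclude (an empty column excludes nothing).
    excluded = set(glob.glob(exclude)) if exclude else set()
    # Step 3: drop the excluded files from the prefix matches.
    return sorted(included - excluded)


def read_prefix_csv(text):
    """Map each `name` in a prefix-style CSV to its resolved input files."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["name"]: resolve_prefix_row(row["prefix"], row["exclude"])
        for row in reader
    }
```

With made-up files `reads/sampleA_R1.fq.gz`, `reads/sampleA_R2.fq.gz`, and `reads/sampleA_unpaired.fq.gz`, a row with prefix `reads/sampleA` and exclude `reads/*unpaired*` would resolve to the two paired-end files. Because only `glob` is involved, a pattern such as `sampleA_R[12]` is treated as a glob character class, not as a regular expression.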