
Can't get fastmultigather to write to a specific output file #299

Open
AnneliektH opened this issue Apr 12, 2024 · 6 comments

@AnneliektH

I'm running fastmultigather with a set of metagenome reads against a set of zip files, and I want the output to be named as metagenome.x.zip.csv.

Running as:
```
sourmash scripts fastmultigather ../../sourmash_sketches/sketch_reads/ERR1135178.k21.zip ../../sourmash_sketches/split_sig/ERR11351_1.zip --output ERR1135178xERR11351_1.csv -k 21 -t 1000 -s 100 -c 1
```

This gives me two output files, `ERR1135178.gather.csv` and `ERR1135178.gather.csv`, which is not what I want.

@ctb
Collaborator

ctb commented Apr 12, 2024

Right, see #239: `--output` only works with a RocksDB database. We have two quite different modes of operation in fastmultigather that need to converge at some point :).

@AnneliektH
Author

Ahh, too bad. Is there a way to work around this using zip files? I currently can't get RocksDBs to work.

@bluegenes
Contributor

@ctb are you ok with switching to single-file output for all queries? Or should we aim to support both?

@ctb
Collaborator

ctb commented Apr 13, 2024

> @ctb are you ok with switching to single-file output for all queries? Or should we aim to support both?

Two hot takes:

I'm kind of against switching things up willy-nilly just because? The branchwater plugin is becoming a bit of a UX nightmare already, and changing what we do in the middle of a point release seems suboptimal. We've done it before, we'll do it again, but I question the necessity here ;). Can't we just give @amterhorst a csvtk command that aggregates all of the CSV files into one, or something??

I would be more supportive of making it so that when `-o` is given, it aggregates all the output into a single file, but when it is not given, it sticks with the current behavior. I fear that will lead to a lot of code nastiness, which I dislike more than confusing users of this particular package.

This is all in the background of a complicated set of design decisions discussed over in sourmash-bio/sourmash#2328, for which I have no answer.

So, umm, after all of this: yeah, sure, switch to `-o` aggregating all output into a single file. It'll be a required parameter now, right?
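A minimal sketch of the "aggregate all of the CSV files into one" idea, in shell. `merge_csvs` is a hypothetical helper, not part of sourmash or the branchwater plugin: it concatenates per-query CSVs such as `ERR1135178.gather.csv`, keeping only the first file's header. It assumes every input has the same header line and that no field contains an embedded newline; `csvtk concat` does the same job with real CSV parsing.

```shell
#!/usr/bin/env bash
# Sketch: merge per-query CSVs into one file, keeping a single header.
# Assumes identical headers and no embedded newlines inside fields.
# merge_csvs is a hypothetical helper, not part of sourmash.
set -eu

merge_csvs() {
  local out="$1"; shift
  # FNR == 1 is each file's header line; keep it only for the very first file (NR == 1).
  # awk also terminates every printed line with a newline, so an input that
  # lacks a trailing newline is not glued onto the next file's first row.
  awk 'FNR == 1 && NR != 1 { next } { print }' "$@" > "$out"
}
```

For example, `merge_csvs merged.csv *.gather.csv` inside the output directory. The output name deliberately does not match `*.gather.csv`, so re-running the command does not feed the merged file back in as an input.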

@AnneliektH
Author

OK: RocksDB not working was my fault, as I tried to create it directly from a `sig.gz` file, not from a list of file paths.
So I can now create output files with a specific name.

As for a csvtk command, that would be great, but my issue is that the output file gets overwritten.
I'm trying to run the same metagenome against different sets of databases, and the output files are named after the metagenome.

I'm doing this because with a single database, the run doesn't finish.

Anyway: RocksDB fixed my problem.

@ctb
Collaborator

ctb commented Apr 21, 2024

> As for a csvtk command, that would be great, but my issue is that the output file gets overwritten.
> I'm trying to run the same metagenome against different sets of databases, and the output files are named after the metagenome.

True!

Another workaround is to run it in a separate directory.
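A sketch of the separate-directory workaround, in shell. `run_per_db` is a hypothetical helper, not part of sourmash: it runs the same query against each database from inside its own directory (named after the database file, which is an assumption), so the query-named CSVs from one run cannot overwrite those of another. The fastmultigather arguments are copied from the command at the top of this issue.

```shell
#!/usr/bin/env bash
# Sketch: one fastmultigather run per database, each in its own directory.
# run_per_db is a hypothetical helper; the directory naming is an assumption.
set -eu

run_per_db() {
  local query="$1"; shift
  local query_abs db db_abs outdir
  # Absolute paths, because each run happens after a cd into a new directory
  query_abs="$(realpath "$query")"
  for db in "$@"; do
    db_abs="$(realpath "$db")"
    # e.g. ERR11351_1.zip -> ERR11351_1/
    outdir="$(basename "$db" .zip)"
    mkdir -p "$outdir"
    # Subshell, so the cd does not leak into the caller's working directory
    (
      cd "$outdir"
      sourmash scripts fastmultigather "$query_abs" "$db_abs" \
        -k 21 -t 1000 -s 100 -c 1
    )
  done
}
```

For example, `run_per_db ../../sourmash_sketches/sketch_reads/ERR1135178.k21.zip ../../sourmash_sketches/split_sig/ERR11351_1.zip` would leave the CSVs for that database under `ERR11351_1/`. Passing several database zips runs them one after another, each into its own directory.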

bluegenes added a commit that referenced this issue May 8, 2024
…ather (#320)

Fixes #313
Fixes #285

Addresses #239 / #299 by bailing out when an output path is provided but will not be used:
```
RuntimeError: output path specified, but not running fastmultigather against a rocksdb. See issue #239
```
Note that we are leaning towards respecting `-o` in the future (see the #299 discussion), so this will soon be revisited.
---------

Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>