Add a --dry-run flag, generally #343

Open
baileythegreen opened this issue Oct 1, 2021 · 8 comments
Labels
enhancement something we'd like pyani to do that it doesn't already interface issues related to how the user tells pyani to do something

Comments

@baileythegreen
Contributor

Summary:

This exists for one subcommand (--download), but would be more generally useful (I think).

Description:

Currently, I want to see the specific ANIm commands that would be run, even if they have already been run, the resulting comparisons are present in the database, and pyani therefore isn't actually going to do anything.

Even with --debug and -v set, the commands don't appear in the log file.

@baileythegreen baileythegreen added enhancement something we'd like pyani to do that it doesn't already interface issues related to how the user tells pyani to do something labels Oct 1, 2021
@widdowquinn
Owner

I agree, a --dry-run option would be useful for a number of subcommands.

However, the ANIm commands already show up in my logs, e.g.

[...]
[INFO] [pyani.scripts.pyani_script]: command-line: /mnt/shared/scratch/lpritcha/apps/conda/envs/pyani_py38/bin/pyani anim --debug -l 02-anim.log -i bolteae_clostridioforme_symbiosum_genomes -o anim_out --scheduler SLURM --labels bolteae_clostridioforme_symbiosum_genomes/labels.txt --classes bolteae_clostridioforme_symbiosum_genomes/classes.txt
[...]
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Creating NUCmer jobs for ANIm
[DEBUG] [pyani.scripts.subcommands.subcmd_anim]: Commands to run:
	nucmer --mum -p anim_out/nucmer_output/GCF_003433765.1_ASM343376v1_genomic_vs_GCF_005845085.1_ASM584508v1_genomic /mnt/shared/scratch/lpritcha/private/clostridia/bolteae_clostridioforme_symbiosum_genomes/GCF_003433765.1_ASM343376v1_genomic.fna /mnt/shared/scratch/lpritcha/private/clostridia/bolteae_clostridioforme_symbiosum_genomes/GCF_005845085.1_ASM584508v1_genomic.fna
	delta_filter_wrapper.py delta-filter -1 anim_out/nucmer_output/GCF_003433765.1_ASM343376v1_genomic_vs_GCF_005845085.1_ASM584508v1_genomic.delta anim_out/nucmer_output/GCF_003433765.1_ASM343376v1_genomic_vs_GCF_005845085.1_ASM584508v1_genomic.filter
[DEBUG] [pyani.scripts.subcommands.subcmd_anim]: Expected output file for db: anim_out/nucmer_output/GCF_003433765.1_ASM343376v1_genomic_vs_GCF_005845085.1_ASM584508v1_genomic.filter
[DEBUG] [pyani.scripts.subcommands.subcmd_anim]: Building job
[DEBUG] [pyani.scripts.subcommands.subcmd_anim]: Commands to run:
	nucmer --mum -p anim_out/nucmer_output/GCF_003433765.1_ASM343376v1_genomic_vs_GCF_900049235.1_C.symbiosum_MappedAssembly_genomic /mnt/shared/scratch/lpritcha/private/clostridia/bolteae_clostridioforme_symbiosum_genomes/GCF_003433765.1_ASM343376v1_genomic.fna /mnt/shared/scratch/lpritcha/private/clostridia/bolteae_clostridioforme_symbiosum_genomes/GCF_900049235.1_C.symbiosum_MappedAssembly_genomic.fna
	delta_filter_wrapper.py delta-filter -1 anim_out/nucmer_output/GCF_003433765.1_ASM343376v1_genomic_vs_GCF_900049235.1_C.symbiosum_MappedAssembly_genomic.delta anim_out/nucmer_output/GCF_003433765.1_ASM343376v1_genomic_vs_GCF_900049235.1_C.symbiosum_MappedAssembly_genomic.filter
[DEBUG] [pyani.scripts.subcommands.subcmd_anim]: Expected output file for db: anim_out/nucmer_output/GCF_003433765.1_ASM343376v1_genomic_vs_GCF_900049235.1_C.symbiosum_MappedAssembly_genomic.filter
[DEBUG] [pyani.scripts.subcommands.subcmd_anim]: Building job
[DEBUG] [pyani.scripts.subcommands.subcmd_anim]: Commands to run:
[...]

Has something changed in the branch you're on?

@baileythegreen
Contributor Author

I don't think so. They show up even the second time you run an analysis?

This is my entire log file for the second time I submit an analysis:
pyani anim -i scratch/small_test/ -o scratch/issue_342 --dbpath scratch/issue_342-2.db -v --debug -l issue_342.log --maxmatch --noextend

[INFO] [pyani.scripts.pyani_script]: Processed arguments: Namespace(citation=False, classes=None, dbpath=PosixPath('scratch/issue_342-2.db'), debug=True, disable_tqdm=False, filter_exe=PosixPath('delta-filter'), func=<function subcmd_anim at 0x119e05d30>, indir=PosixPath('scratch/small_test'), jobprefix='PYANI', labels=None, logfile=PosixPath('issue_342.log'), maxmatch=True, name=None, noextend=True, nofilter=False, nucmer_exe=PosixPath('nucmer'), outdir=PosixPath('scratch/issue_342'), recovery=False, scheduler='multiprocessing', sgeargs=None, sgegroupsize=10000, verbose=True, version=False, workers=None)
[INFO] [pyani.scripts.pyani_script]: command-line: /Users/baileythegreen/Software/miniconda3/bin/pyani anim -i scratch/small_test/ -o scratch/issue_342 --dbpath scratch/issue_342-2.db -v --debug -l issue_342.log --maxmatch --noextend
[INFO] [pyani.scripts.pyani_script]: pyani version: 0.3.0-alpha
[INFO] [pyani.scripts.pyani_script]: CITATION INFO
[INFO] [pyani.scripts.pyani_script]: If you use pyani in your work, please cite the following publication:
[INFO] [pyani.scripts.pyani_script]: 	Pritchard, L., Glover, R. H., Humphris, S., Elphinstone, J. G.,
[INFO] [pyani.scripts.pyani_script]: 	& Toth, I.K. (2016) 'Genomics and taxonomy in diagnostics for
[INFO] [pyani.scripts.pyani_script]: 	food security: soft-rotting enterobacterial plant pathogens.'
[INFO] [pyani.scripts.pyani_script]: 	Analytical Methods, 8(1), 1224. http://doi.org/10.1039/C5AY02550H
[INFO] [pyani.scripts.pyani_script]: DEPENDENCIES
[INFO] [pyani.scripts.pyani_script]: The authors of pyani gratefully acknowledge its dependence on
[INFO] [pyani.scripts.pyani_script]: the following bioinformatics software:
[INFO] [pyani.scripts.pyani_script]: 	MUMmer3: S. Kurtz, A. Phillippy, A.L. Delcher, M. Smoot, M. Shumway,
[INFO] [pyani.scripts.pyani_script]: 	C. Antonescu, and S.L. Salzberg (2004), 'Versatile and open software
[INFO] [pyani.scripts.pyani_script]: 	for comparing large genomes' Genome Biology 5:R12
[INFO] [pyani.scripts.pyani_script]: 	BLAST+: Camacho C., Coulouris G., Avagyan V., Ma N., Papadopoulos J.,
[INFO] [pyani.scripts.pyani_script]: 	Bealer K., & Madden T.L. (2008) 'BLAST+: architecture and applications.'
[INFO] [pyani.scripts.pyani_script]: 	BMC Bioinformatics 10:421.
[INFO] [pyani.scripts.pyani_script]: 	BLAST: Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J.,
[INFO] [pyani.scripts.pyani_script]: 	Zhang, Z., Miller, W. & Lipman, D.J. (1997) 'Gapped BLAST and PSI-BLAST:
[INFO] [pyani.scripts.pyani_script]: 	a new generation of protein database search programs.' Nucleic Acids Res.
[INFO] [pyani.scripts.pyani_script]: 	25:3389-3402
[INFO] [pyani.scripts.pyani_script]: 	Biopython: Cock PA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A,
[INFO] [pyani.scripts.pyani_script]: 	Friedberg I, Hamelryck T, Kauff F, Wilczynski B and de Hoon MJL
[INFO] [pyani.scripts.pyani_script]: 	(2009) Biopython: freely available Python tools for computational
[INFO] [pyani.scripts.pyani_script]: 	molecular biology and bioinformatics. Bioinformatics, 25, 1422-1423
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Running ANIm analysis
[INFO] [pyani.scripts.subcommands.subcmd_anim]: MUMMer nucmer version: Darwin_3.1 (/Users/baileythegreen/Software/miniconda3/bin/nucmer)
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Analysis name: ANIm_2021-10-01T12:07:18.710726
[DEBUG] [pyani.scripts.subcommands.subcmd_anim]: Connecting to database scratch/issue_342-2.db
[DEBUG] [pyani.scripts.subcommands.subcmd_anim]: Adding run info to database scratch/issue_342-2.db...
[DEBUG] [pyani.scripts.subcommands.subcmd_anim]: ...added run ID: Run 3: ANIm_2021-10-01T12:07:18.710726 (2021-10-01 12:07:18.710726) to the database
[DEBUG] [pyani.scripts.subcommands.subcmd_anim]: Adding genomes for run Run 3: ANIm_2021-10-01T12:07:18.710726 (2021-10-01 12:07:18.710726) to database...
[INFO] [pyani.pyani_files]: Checking for hashfile: scratch/small_test/GCF_000023545.1_ASM2354v1_genomic.fna.md5.
[INFO] [pyani.pyani_files]: Checking for hashfile: scratch/small_test/GCF_000011605.1_ASM1160v1_genomic.fna.md5.
[DEBUG] [pyani.scripts.subcommands.subcmd_anim]: 	...added genome IDs: [1, 2]
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Generating ANIm command-lines
[DEBUG] [pyani.scripts.subcommands.subcmd_anim]: NUCmer output will be written temporarily to scratch/issue_342/nucmer_output
[DEBUG] [pyani.scripts.subcommands.subcmd_anim]: Creating output directory scratch/issue_342/nucmer_output
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Compiling genomes for comparison
[DEBUG] [pyani.scripts.subcommands.subcmd_anim]: Collected 2 genomes for this run
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Compiling pairwise comparisons (this can take time for large datasets)...
[INFO] [pyani.scripts.subcommands.subcmd_anim]: 	...total parwise comparisons to be performed: 1
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Checking database for existing comparison data...
[INFO] [pyani.scripts.subcommands.subcmd_anim]: 	...after check, still need to run 0 comparisons
[INFO] [pyani.scripts.subcommands.subcmd_anim]: All comparison results present in database (skipping comparisons)
[INFO] [pyani.scripts.subcommands.subcmd_anim]: Updating summary matrices with existing results
[INFO] [pyani.scripts.pyani_script]: Completed. Time taken: 0.335

@widdowquinn
Owner

> I don't think so. They show up even the second time you run an analysis?

No. pyani does not repeat alignment runs that are already present in the database. If you're testing, you should probably force creation of an empty database in your scripted test, e.g. pyani createdb --force && pyani anim …

@baileythegreen
Contributor Author

Ah, okay. I do appreciate that showing comparison commands when they wouldn't be run is maybe not the usual behaviour for a --dry-run option. I can use --force for that when testing, I guess.

More generally, my first thought would be to implement this as a top-level option. Though what is actually sensible might depend on what the desired output looks like for each subcommand.
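One way to define such a flag once and make it available to every subcommand can be sketched with `argparse` parent parsers. This is a minimal illustration only: the parser layout, the `pyani-sketch` program name, and the `build_parser` helper are assumptions, not pyani's actual command-line code, and what each subcommand does with the flag would still need deciding per subcommand.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser where every subcommand accepts --dry-run."""
    # Options shared by all subcommands live on a parent parser.
    # add_help=False avoids a clash with each subparser's own -h/--help.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument(
        "--dry-run",
        dest="dry_run",
        action="store_true",
        help="report what would be done, without doing it",
    )

    parser = argparse.ArgumentParser(prog="pyani-sketch")
    subparsers = parser.add_subparsers(dest="subcommand", required=True)
    # Subcommand names are taken from this thread; the real set is larger.
    for name in ("download", "anim"):
        subparsers.add_parser(name, parents=[common])
    return parser


# Usage: the flag is parsed the same way for each subcommand.
args = build_parser().parse_args(["anim", "--dry-run"])
```

With this layout the flag follows the subcommand (`anim --dry-run`); adding it to the main parser instead would place it before the subcommand.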

@widdowquinn
Owner

Maybe a way to go here is to recognise the --dry-run flag when compiling commands and, instead of filtering them against the database and then reporting only those which are not present in the db, keep a record of all the commands and of which ones are kept. Then we can report:

[INFO] nucmer ... (result already present, would not be run)
[INFO] nucmer ...

style output?
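The idea above could be sketched roughly as follows. This is a hedged illustration, not pyani's implementation: `annotate_commands`, `report`, and the `already_in_db` set are hypothetical names, and commands are treated as plain strings rather than as whatever job objects pyani builds internally.

```python
import logging
from typing import Iterable, List, Set, Tuple

logger = logging.getLogger("dry_run_sketch")


def annotate_commands(
    commands: Iterable[str], already_in_db: Set[str]
) -> List[Tuple[str, bool]]:
    """Keep every command, paired with whether it would actually be run."""
    return [(cmd, cmd not in already_in_db) for cmd in commands]


def report(annotated: Iterable[Tuple[str, bool]]) -> List[str]:
    """Log one line per command, marking those whose results already exist."""
    lines = []
    for cmd, will_run in annotated:
        if will_run:
            line = cmd
        else:
            line = f"{cmd} (result already present, would not be run)"
        logger.info(line)
        lines.append(line)
    return lines
```

Keeping the full annotated list, rather than a pre-filtered one, means the same record can drive both the real run (only entries marked as runnable) and the dry-run report.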

@baileythegreen
Contributor Author

I think this is what I would want to see from such an option. It would also be useful to me when I am testing: sometimes I want to verify that the comparisons I think should run will run, and that others which may already be present will not, without actually running any of them. Forcing a new database with --force wouldn't help with this.

@widdowquinn
Owner

widdowquinn commented Oct 1, 2021

I think showing the jobs that would have been run if their results didn't already exist in the db is reasonable as [DEBUG] output without --dry-run, but as [INFO] output with --dry-run.

What do you think?

@baileythegreen
Contributor Author

I think I agree, in principle, though I wonder if there is a concise way to code that. (There may be some nifty part of the logging library that does this, and I just don't know of it, yet.)
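One concise option, sketched here under assumptions, is the standard library's `Logger.log()`, which takes the level as its first argument, so the level can be chosen from the flag at call time. The `report_level` and `report_skipped` helpers and the logger name are hypothetical, not part of pyani.

```python
import logging

logger = logging.getLogger("dry_run_level_sketch")


def report_level(dry_run: bool) -> int:
    """Pick the level for 'would not be run' messages: INFO on a dry run, else DEBUG."""
    return logging.INFO if dry_run else logging.DEBUG


def report_skipped(command: str, dry_run: bool) -> None:
    """Log a command whose result is already present, at the chosen level."""
    logger.log(
        report_level(dry_run),
        "%s (result already present, would not be run)",
        command,
    )
```

Whether the message is then actually shown still depends on the logger and handler levels configured elsewhere, for example by --debug and -v.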

@widdowquinn widdowquinn added this to the 0.3.1 milestone Apr 27, 2022
@widdowquinn widdowquinn added this to To do in pyani via automation Apr 27, 2022