Issue #123: Add `anib` #338

baileythegreen · 2021-09-13T23:32:54Z

Adds anib subcommand to v3.

Closes #123.

Type of change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality not to work as expected)
This change requires a documentation update
This is a documentation update

Action Checklist

baileythegreen · 2021-09-14T16:30:35Z

I am running into an issue:

In the monolithic v2, anib and aniblastall were conjoined; we discussed separating them for v3.
pyani/scripts/average_nucleotide_identity.py exists in the v3 branch, and both it and its tests expect anib and aniblastall to be conjoined.

As yet, I am not sure how to separate the two while keeping backwards compatibility and without repeating a lot of code.

Unless I can also change pyani/scripts/average_nucleotide_identity.py.

baileythegreen · 2021-10-05T17:03:28Z

The test for get_version() fails on Linux because the version information is written to stdout, not stderr.
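A minimal sketch of one way to make a version lookup independent of which stream the tool writes to: search stdout first, then stderr. The helper name `find_version` and the regular expression are assumptions for illustration only, not the code in this PR.

```python
import re

# Hypothetical pattern: three dot-separated numbers with an optional trailing "+".
VERSION_RE = re.compile(r"\d+\.\d+\.\d+\+?")


def find_version(stdout: str, stderr: str) -> str:
    """Return the first version-like string found in either stream, or ""."""
    for stream in (stdout, stderr):
        match = VERSION_RE.search(stream)
        if match:
            return match.group()
    return ""
```

With this approach the same test can pass on platforms where the executable reports its version on different streams.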

baileythegreen · 2022-04-13T16:58:52Z

@widdowquinn I think this is ready to merge. The things being caught by codefactor are calls to subprocess.run that use shell=True, which I don't think we can do differently, and the use of global anib in average_nucleotide_identity.py, which is necessary to maintain backwards compatibility so the legacy tests pass.

baileythegreen · 2022-05-16T23:30:39Z

@widdowquinn I think this is ready to merge. The things being caught by codefactor are calls to subprocess.run that use shell=True, which I don't think we can do differently, and the use of global anib in average_nucleotide_identity.py, which is necessary to maintain backwards compatibility so the legacy tests pass.

This is once again the state of this PR. Can it be merged?

widdowquinn

If tests are no longer relevant or applicable, they should be removed or updated to make them relevant. The "unsure this is needed" message here (cc4d346 and 3233fd5) reads as though the person making the change does not know what the test does, and/or why it fails or might be important.

In this case, the intent of the tests was to supplement the corresponding *single() function (which tests generation of a single command line, given input files), to check whether the appropriate reciprocal BLAST comparison command lines were being correctly generated. This allowed some level of diagnosis when a comparison fails to establish that it is not (i) the command-line itself that fails or (ii) failure to generate the reciprocal comparison command-lines.

These are unit tests, whose coverage may be masked by a passing integration test so removal may not indicate a problem if only the coverage information is being checked. Please can you clarify whether the intent of these skipped tests was maintained elsewhere? If not, do you consider that any of the following hold: (i) the unit test is redundant; (ii) the unit test needs to be updated to reflect the new way that reciprocal comparison commands are generated; (iii) the time taken to update the unit tests would be better spent converting command-line generation to something SLURM-compatible?

If (i) then please delete the tests. If (ii) please update the tests accordingly. If (iii) please ignore - other than to note that (iii) is the case - and work on the new command-line generation.

baileythegreen · 2022-06-21T18:23:57Z

After reviewing the code and tests, the reason for the comments (left by past me) has become clear.

subcmd_anib.py uses itertools.permutations to generate the list of comparisons that may need to be run, which takes care of the reciprocity made necessary by the inherent asymmetry of the underlying ANI algorithm.

>>> list(permutations(['A', 'B', 'C'], 2))

[('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]

The resulting list of comparisons is looped over in subcmd_anib.generate_joblist(). Because this version is backed by a database, and both ('A', 'B') and ('B', 'A') appear in the comparison list as (query, subject) pairs, each call to anib.generate_blastn_commands() only needs to return a single command.
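As a rough illustration of that point, the sketch below builds one command per ordered pair and checks that every pair's reciprocal is also present. `sketch_command` is a hypothetical placeholder that returns a dummy string; it is not the signature or output of anib.generate_blastn_commands().

```python
from itertools import permutations


def sketch_command(query: str, subject: str) -> str:
    """Hypothetical stand-in: one placeholder command per (query, subject) pair."""
    return f"compare query={query} subject={subject}"


genomes = ["A", "B", "C"]

# Ordered pairs, so each genome appears as both query and subject
comparisons = list(permutations(genomes, 2))

# One command per ordered pair is enough to cover both directions
commands = [sketch_command(q, s) for q, s in comparisons]

# Every (query, subject) pair has its reciprocal in the same list
reciprocal_ok = all((s, q) in comparisons for q, s in comparisons)
```

For n genomes this yields n * (n - 1) commands, with reciprocity coming from the pair list rather than from the command generator.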

The *multiple tests were based on initial test code written before I joined the project (see here), and their intention seemed to be to take an unordered pair of input genomes (genome1, genome2) and produce the command lines for both genome1 vs genome2 and genome2 vs genome1, with each genome treated as the query and subject in turn.

I would suggest these are redundant, as matching the current expected output is functionally no different from calling the corresponding *single tests twice with the order of the input files swapped, but I may be overlooking some utility that should be preserved?
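The redundancy argument can be sketched as follows. Both functions are hypothetical stand-ins (the real test targets and their signatures are not shown in this thread): `single_command` builds one placeholder command line, and `multiple_commands` mimics the older behaviour of returning both directions for an unordered pair.

```python
from typing import List


def single_command(query: str, subject: str) -> str:
    """Hypothetical stand-in for building one command line."""
    return f"compare query={query} subject={subject}"


def multiple_commands(genome1: str, genome2: str) -> List[str]:
    """Hypothetical stand-in for the older behaviour: both directions at once."""
    return [single_command(genome1, genome2), single_command(genome2, genome1)]


# The "multiple" expectation is the "single" expectation applied twice,
# once with the inputs in each order.
forward = single_command("genome1", "genome2")
reverse = single_command("genome2", "genome1")
both = multiple_commands("genome1", "genome2")
```

Under these assumptions, a *multiple-style assertion on `both` checks nothing that two *single-style assertions on `forward` and `reverse` do not already check.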

The only possibility I can think of is related to this:

failure to generate the reciprocal comparison command-lines

though, a specific failure to produce a reciprocal command line would generally mean the reciprocal one was not generated via the same mechanism as the 'original'—and that is not the case in this PR.

baileythegreen added 24 commits July 15, 2021 20:27

Initial split of v2 ANIb code into anib.py and aniblastall.py files

b64231c

Including changing the parameter `mode` to `method`

Updated comments/logging

39bdde1

Added sysexit when adding genomes to the database fails

7e9ebe2

Added initial versions of functions and comments about needed code

70abeb9

`run_anib_jobs()`, `update_comparison_results()`

Rename fraglens to fragsizes for consistency with anib.py

9dcade6

Add add_blastdb() to pyani_orm.py

c660e63

Change default for maxmatch from None to False

bf4fa20

If this is not a boolean, it prevents sqlite from recognising duplicate entries

Add/expand code to process input genomes

c1b4be1

Up to adding them to the database

Split genome files into contiguous fragments

34f9ec3

Implement generate_joblist()

5cdd452

Implement run_anib_jobs()

d78b479

Implement update_comparisons_results() and commit to database

0d42e79

Update call to generate_joblist()

9e25141

Now passes the lists of all `fragfiles` and `fraglens`

Update name of output file in fragment_fasta_file()

1e8a47d

Update value passed for maxmatch to a boolean

b07a1dd

Necessary for sqlite to recognise duplicate values

Implement remainder of subcmd_anib()

8f4b131

Add a commented question

c76f54c

Alter generate_blastn_commands() to only take one query/subject pair

c71ce5d

Alter construct_blastn_cmdline() for a single query/subject pair

585b98f

Remove method parameter from process_blast() call

059ffdf

Change outfilename in `construct_makeblastdb_cmd()`

aed81e1

Move aniblastall-specific tests to test_aniblastall.py

0f0653f

Change call to parse_blast_tab() to not use method parameter

fc273df

Skip tests as a result of changes to how command lines are generated

cc4d346

Currently, we don't pass a list of genome pairs to `anib.py`.
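Two of the commits above (bf4fa20 and b07a1dd) switch maxmatch to a boolean so that sqlite can recognise duplicates. One plausible mechanism, assumed here rather than confirmed in this PR, is that Python's None is stored as NULL, and SQLite treats NULLs as distinct from one another in a UNIQUE constraint. The table and column layout below are hypothetical and do not reproduce the pyani schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the uniqueness rule includes the maxmatch column
conn.execute(
    "CREATE TABLE comparisons ("
    "query TEXT, subject TEXT, maxmatch BOOLEAN, "
    "UNIQUE (query, subject, maxmatch))"
)


def try_insert(maxmatch) -> bool:
    """Return True if the row was inserted, False if rejected as a duplicate."""
    try:
        conn.execute(
            "INSERT INTO comparisons VALUES (?, ?, ?)", ("A", "B", maxmatch)
        )
        return True
    except sqlite3.IntegrityError:
        return False


# None is stored as NULL; NULLs never collide in a UNIQUE constraint
none_results = [try_insert(None), try_insert(None)]

# False is stored as 0; the second identical row is rejected
bool_results = [try_insert(False), try_insert(False)]
```

In this sketch the duplicate row slips through when maxmatch is None but is caught when it is False, which is consistent with the commit messages.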

baileythegreen added the enhancement (something we'd like pyani to do that it doesn't already), method (the issue relates to how results are calculated), and VERSION_3 (issues relating to version 0.3.x of pyani) labels Sep 13, 2021

baileythegreen added 2 commits September 14, 2021 00:35

Add get_version() tests and boilerplate

0e3d471

Merge branch 'master' of https://github.com/widdowquinn/pyani into an…

f6f52ec

…ib_123

baileythegreen added 3 commits October 5, 2021 15:13

Add test_subcmd_10_aniblastall.py

13add41

Make test_subcmd_04_anim.py match other method subcommand tests

7566329

Fix names of executable variables

3c688d4

baileythegreen added 6 commits October 7, 2021 11:59

Pass both stdout and stderr to the regex version search

bc25ccc

Set indir and outdir to required in anib_parser.py

a313d2f

Add docs/subcmd_anib.rst

3ccd1d9

Add documentation for aniblastall

f423dbe

Merge branch 'master' of https://github.com/widdowquinn/pyani into an…

c252188

…ib_123

Replace f-strings in logging statements

8c40113

baileythegreen marked this pull request as ready for review December 13, 2021 01:34

baileythegreen requested a review from widdowquinn as a code owner December 13, 2021 01:34

baileythegreen changed the title ~~DRAFT: anib issue_123~~ Add anib issue_123 Dec 13, 2021

baileythegreen changed the title ~~Add anib issue_123~~ Issue #123: Add anib Dec 13, 2021

baileythegreen added 4 commits April 13, 2022 16:28

Update .gitignore

0596e77

Merge branch 'master' of https://github.com/widdowquinn/pyani into an…

03a28fa

…ib_123

Update call to add_run() to fit the function's new return value

6b7b52f

Remove f-strings from logging calls

23d951f

baileythegreen added this to the 0.3.0 milestone May 4, 2022

baileythegreen added the "PR of Supreme Importance" label (The PR Bailey really, really wants merged right now) May 11, 2022

Merge branch 'master' of https://github.com/widdowquinn/pyani into an…

8b43a50

…ib_123

baileythegreen mentioned this pull request May 13, 2022

Bug in CircleCI tests for python 3.6: caching issue #397

Closed

Merge branch 'master' of https://github.com/widdowquinn/pyani into an…

5b0cd43

…ib_123

widdowquinn requested changes Jun 7, 2022

widdowquinn and others added 3 commits June 7, 2022 18:14

update deprecated behaviours in pandas

ffba902

- the sort_index() call now needs axis as an explicit keyword - pandas.utils.testing is now pandas.testing

remove unused import

bdefef0

If an import is no longer required, it should be removed, rather than commented as "probably not required" or similar.

Merge branch 'master' of https://github.com/widdowquinn/pyani into an…

dfee675

…ib_123
