Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #123: Add anib #338

Open
wants to merge 81 commits into
base: master
Choose a base branch
from
Open

Issue #123: Add anib #338

wants to merge 81 commits into from

Conversation

baileythegreen
Copy link
Contributor

@baileythegreen baileythegreen commented Sep 13, 2021

Adds anib subcommand to v3.

Closes #123.

Type of change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality not to work as expected)
  • This change requires a documentation update
  • This is a documentation update

Action Checklist

  • Work on a single issue/concept (if there are multiple separate issues to address, please use a separate pull request for each)
  • Fork the pyani repository under your own account (please allow write access for repository maintainers)
  • Set up an appropriate development environment (please see CONTRIBUTING.md)
  • Create a new branch with a short, descriptive name
  • Work on this branch
    • style guidelines have been followed
    • new code is commented, especially in hard-to-understand areas
    • corresponding changes to documentation have been made
    • tests for the change have been added that demonstrate the fix or feature works
  • Test locally with pytest -v non-passing code will not be merged
  • Rebase against origin/master
  • Check changes with flake8 and black before submission
  • Commit branch
  • Submit pull request, describing the content and intent of the pull request
  • Request a code review
  • Continue the discussion at Pull requests section in the pyani repository

Including changing the parameter `mode` to `method`
`run_anib_jobs()`, `update_comparison_results()`
If this is not a boolean, it prevents sqlite from recognising duplicate 
entries
Up to adding them to the database
Now passes the lists of all `fragfiles` and `fraglens`
Necessary for sqlite to recognise duplicate values
Currently, we don't pass a list of genome pairs to `anib.py`.
@baileythegreen baileythegreen added enhancement something we'd like pyani to do that it doesn't already method the issue relates to how results are calculated VERSION_3 issues relating to version 0.3.x of pyani labels Sep 13, 2021
@baileythegreen
Copy link
Contributor Author

baileythegreen commented Sep 14, 2021

I am running into an issue:

  • In the monolithic v2, anib and aniblastall were conjoined; we discussed separating them for v3.
  • pyani/scripts/average_nucleotide_identity.py exists in the v3 branch, and both it, and its tests, expect anib and aniblastall to be conjoined.

As of yet, I am not sure how to separate the two, while keeping backwards-compatibility and also not repeating a lot of code.

Unless, I can also change pyani/scripts/average_nucleotide_identity.py.

@baileythegreen
Copy link
Contributor Author

Failing test for get_version() on Linux because the information is in stdout, not stderr.

@baileythegreen baileythegreen marked this pull request as ready for review December 13, 2021 01:34
@baileythegreen baileythegreen changed the title DRAFT: anib issue_123 Add anib issue_123 Dec 13, 2021
@baileythegreen baileythegreen changed the title Add anib issue_123 Issue #123: Add anib Dec 13, 2021
@baileythegreen
Copy link
Contributor Author

baileythegreen commented Apr 13, 2022

@widdowquinn I think this is ready to merge. The things being caught by codefactor are calls to subprocess.run that use shell=True, which I don't think we can do differently, and the use of global anib in average_nucleotide_identity.py, which is necessary to maintain backwards compatibility so the legacy tests pass.

@baileythegreen baileythegreen added this to the 0.3.0 milestone May 4, 2022
@baileythegreen baileythegreen added the PR of Supreme Importance The PR Bailey really, really wants merged right now label May 11, 2022
@baileythegreen
Copy link
Contributor Author

@widdowquinn I think this is ready to merge. The things being caught by codefactor are calls to subprocess.run that use shell=True, which I don't think we can do differently, and the use of global anib in average_nucleotide_identity.py, which is necessary to maintain backwards compatibility so the legacy tests pass.

This is once again the state of this PR. Can it be merged?

Copy link
Owner

@widdowquinn widdowquinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If tests are no longer relevant or applicable, they should be removed or updated to make them relevant. The "unsure this is needed" message here (cc4d346 and 3233fd5) reads as though the person making the change does not know what the test does, and/or why it fails or might be important.

In this case, the intent of the tests was to supplement the corresponding *single() function (which tests generation of a single command line, given input files), to check whether the appropriate reciprocal BLAST comparison command lines were being correctly generated. This allowed some level of diagnosis when a comparison fails to establish that it is not (i) the command-line itself that fails or (ii) failure to generate the reciprocal comparison command-lines.

These are unit tests, whose coverage may be masked by a passing integration test so removal may not indicate a problem if only the coverage information is being checked. Please can you clarify whether the intent of these skipped tests was maintained elsewhere? If not, do you consider that any of the following hold: (i) the unit test is redundant; (ii) the unit test needs to be updated to reflect the new way that reciprocal comparison commands are generated; (iii) the time taken to update the unit tests would be better spent converting command-line generation to something SLURM-compatible?

If (i) then please delete the tests. If (ii) please update the tests accordingly. If (iii) please ignore - other than to note that (iii) is the case - and work on the new command-line generation.

widdowquinn and others added 3 commits June 7, 2022 18:14
- the sort_index() call now needs axis as an explicit keyword
- pandas.utils.testing is now pandas.testing
If an import is no longer required, it should be removed, rather
than commented as "probably not required" or similar.
@baileythegreen
Copy link
Contributor Author

After reviewing the code and tests the reason for the comments (left by past me) has become clear.

subcmd_anib.py uses itertools.permutations to generate the list of comparisons that may need to be run, which takes care of the reciprocity made necessary by the inherent asymmetry of the underlying ANI algorithm.

>>> list(permutations(['A', 'B', 'C'], 2))

[('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]

The resulting list of comparisons is looped over in subcmd_anib.generate_joblist(). Because this version is baked by a database, and both ('A', 'B') and ('B', 'A') appear in the comparison list as (query, subject) pairs, each call to anib.generate_blastn_commands() only needs to return a single command.

The *multiple tests were based on initial test code written before I joined the project (see here, and their intention seemed to be to take an unordered pair of input genomes (genome1, genome2) and produce the command lines for both genome1 vs genome2 and genome2 vs genome1, with each genome treated as the query and subject in turn.

I would suggest these are redundant, as to match the current expected output is functionally no different than calling the corresponding *single tests twice with the order of the input files swapped, but I may be overlooking some utility that should be preserved?

The only possibility I can think of is related to this:

failure to generate the reciprocal comparison command-lines

though, a specific failure to produce a reciprocal command line would generally mean the reciprocal one was not generated via the same mechanism as the 'original'—and that is not the case in this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement something we'd like pyani to do that it doesn't already method the issue relates to how results are calculated PR of Supreme Importance The PR Bailey really, really wants merged right now VERSION_3 issues relating to version 0.3.x of pyani
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement anib on development branch
2 participants