Add subcommand to use dnadiff approach to calculate ANI %ID and coverage
Description:
Different methods/approaches can lead to slightly different numbers being reported. In my previous meetings with @widdowquinn, we agreed that adding a dnadiff subcommand to replicate the values for ANI %ID and coverage would be a good idea.
We previously attempted to replicate the AlignedBases and AvgIdentity values given in the .report file returned by dnadiff. However, we were unable to do so solely by parsing the delta files, because different programs (e.g., show-coords) process them differently. One option would be to run dnadiff to obtain all necessary files (.coords and .rdiff) and calculate the values from them.
Our main issue is that dnadiff returns 9 files for a single pairwise comparison. We do not want to generate that many files, especially when they are not needed. After my investigation, I managed to replicate the values for AlignedBases and AvgIdentity by running only scripts/programs that generate what we need.
To replicate the results, we need to run four commands:
1. nucmer to generate alignments with the --maxmatch parameter.
2. delta-filter to generate M-to-M alignments by calling the -m parameter.
3. show-coords to generate the .mcoords file needed to calculate AlignedBases and AvgIdentity.
4. show-diff to generate the .rdiff files needed for the AlignedBases calculation.
NOTE: Both commands 1 and 2 are different from the ones currently generated in the pyani anim subcommand.
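As a rough illustration of the four steps listed above, here is a minimal Python sketch that builds the command lines without running them. Only `--maxmatch` and `-m` come from this issue; the `-p` prefix option, the `-rclTH` and `-rH` flags, the `.mdelta` intermediate file name, and the function names are assumptions modelled on how dnadiff appears to call these tools, so check them against your MUMmer installation before relying on them.

```python
import subprocess
from pathlib import Path


def build_dnadiff_like_commands(reference, query, outdir, prefix="cmp"):
    """Return (argv, stdout_path) pairs for the four steps.

    stdout_path is None when the tool writes its own output file
    (nucmer writes <prefix>.delta); otherwise stdout should be
    redirected to that path.
    All file names and the flags other than --maxmatch and -m are
    assumptions for illustration.
    """
    stem = Path(outdir) / prefix
    delta = Path(f"{stem}.delta")      # written by nucmer itself
    mdelta = Path(f"{stem}.mdelta")    # M-to-M filtered alignments
    mcoords = Path(f"{stem}.mcoords")  # input for AlignedBases and AvgIdentity
    rdiff = Path(f"{stem}.rdiff")      # input for AlignedBases
    return [
        (["nucmer", "--maxmatch", "-p", str(stem), str(reference), str(query)], None),
        (["delta-filter", "-m", str(delta)], mdelta),
        (["show-coords", "-rclTH", str(mdelta)], mcoords),
        (["show-diff", "-rH", str(mdelta)], rdiff),
    ]


def run_commands(commands):
    """Run each command in order, redirecting stdout where a path is given."""
    for argv, stdout_path in commands:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as handle:
                subprocess.run(argv, check=True, stdout=handle)
```

Calling `run_commands(build_dnadiff_like_commands("ref.fna", "qry.fna", "out"))` would leave only the delta, filtered delta, `.mcoords` and `.rdiff` files in `out/`, rather than the 9 files dnadiff produces.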
I have used the exact process implemented by dnadiff and successfully replicated the numbers for three separate test sets. You can find all the scripts and data here.
pyani dnadiff is something probably best reserved for the ground-up rebuild in pyani-plus - let's keep development for that project, not for this v0.3.