v0.2.2 (November, 2023)

@jeremymanning jeremymanning released this 06 Nov 20:02
· 4 commits to master since this release
202296b

Bug fix release!

This release fixes a "bug" in the permutation-corrected clustering (fingerprint) computations.

In the previous implementation, if a given feature dimension had the same value across all words in the list, the permutation-corrected clustering score would come out to 1. The code has been refactored so that this "edge case" now yields a corrected clustering score of 0.5 (i.e., "chance"), which seems to make more sense: when every word has the same feature value, the data give no evidence for or against clustering along that dimension.

Details

This bugfix entailed making two changes:

  • When ranking distances along some feature dimension, take into account the possibility that some feature values may be equal
  • When computing the percentile rank of the corrected score (within the distribution of shuffled scores), we no longer use the proportion of shuffled scores that are strictly less than the observed score. Instead, each shuffled score is assigned a "point value" and the percentile rank is the mean of those values:
    • shuffled scores that are less than the observed score get 1 point
    • shuffled scores that are greater than the observed score get 0 points
    • shuffled scores that are equal to the observed score get 0.5 points
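
The point-value rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described in these notes, not the package's actual code: the function name `tie_aware_percentile_rank` and its arguments are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def tie_aware_percentile_rank(observed, shuffled):
    """Mean point value of the shuffled scores relative to the observed score.

    Hypothetical helper for illustration only (not part of the package API):
    shuffled < observed  -> 1 point
    shuffled == observed -> 0.5 points
    shuffled > observed  -> 0 points
    """
    shuffled = np.asarray(shuffled, dtype=float)
    points = np.where(
        shuffled < observed,
        1.0,
        np.where(shuffled == observed, 0.5, 0.0),
    )
    return float(points.mean())


# Edge case from this release: every shuffle gives the same score as the
# observed one (e.g., a feature that is constant across the list).
constant_case = tie_aware_percentile_rank(0.7, [0.7, 0.7, 0.7, 0.7])

# Mixed case: two shuffles below, one tie, one above -> (1 + 0.5 + 0 + 1) / 4
mixed_case = tie_aware_percentile_rank(0.5, [0.1, 0.5, 0.9, 0.3])
```

Under this rule, a list whose shuffled scores all tie with the observed score lands at exactly 0.5, matching the "chance" behavior described above. Counting only strictly smaller shuffled scores would give 0 for that same input.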

Full Changelog: v0.2.1...v0.2.2