
Numerical stability issues in floating point comparisons (chord directional hamming) #295

Open
bmcfee opened this issue Dec 3, 2018 · 7 comments

@bmcfee
Collaborator

bmcfee commented Dec 3, 2018

I ran into a funny bug just now, and wanted to get some opinions on whether this should be considered an mir_eval problem or not.

This check in the directional hamming distance function looks for disjoint time intervals by comparing the end of one interval to the start of the next.

This is fine in principle, but I have a few (reference) files stored in JSON (jams) format, and for whatever reason, deserializing them perturbs the floating point values just enough to trigger the error. For example:

In [4]: J = jams.load('/home/bmcfee/data/crema/chords/TRAIIEF149E3861F6C.jams')

In [5]: ints, labs = J.annotations[0].to_interval_values()

In [6]: ints[1, 1]
Out[6]: 2.126712

In [7]: ints[2, 0]
Out[7]: 2.1267119990000003
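
For context, the kind of check involved looks roughly like this (a sketch, not the exact mir_eval source; the helper name and error message are illustrative):

import numpy as np

def _check_intervals_disjoint(intervals):
    # Flag any interval that starts before the previous one ends.
    # With the values above, 2.126712 > 2.1267119990000003 is True,
    # so a spurious ~1 nanosecond overlap raises an error.
    if np.any(intervals[:-1, 1] > intervals[1:, 0]):
        raise ValueError('Intervals must not overlap.')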

On the one hand, this is clearly a data error, and it's reasonable for mir_eval to fail here.

On the other hand, this is not a serious data error, and it's one we can expect to pop up any time numerical data passes through text serialization -- which is basically every use case of mir_eval.

Given that, what do people think of rounding the time intervals (say to the millisecond level) prior to checking? That is, just add lines to the effect of

reference_intervals = np.around(reference_intervals, decimals=3)

before the overlap check. For my money, I see no downside to this, but I'd like to get some opinions before opening a PR.
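
To illustrate with the numbers from the example (a toy check, not mir_eval code; the array below only reproduces the two offending boundary values):

import numpy as np

# Hypothetical intervals reproducing the boundary values from the JAMS file above
ints = np.array([[0.0, 2.126712],
                 [2.1267119990000003, 4.0]])

print(np.any(ints[:-1, 1] > ints[1:, 0]))   # True: spurious overlap of about 1e-9 seconds

ints = np.around(ints, decimals=3)
print(np.any(ints[:-1, 1] > ints[1:, 0]))   # False: both boundaries round to 2.127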

@bmcfee bmcfee added the question label Dec 3, 2018
@craffel
Owner

craffel commented Dec 3, 2018

Does this cause an error or a violated assumption later on, or do things work out? In other words, should we round solely for the check, or should we round any input to the corresponding metric functions?

I feel a little odd rounding user data, though we already sort of do it elsewhere in the library. It feels like it would make more sense to round in io, and then if someone uses their own I/O, let them deal with any resulting weirdness -- but I don't feel strongly about this, since it probably won't make a big difference and rounding is probably what the user wants.

@bmcfee
Collaborator Author

bmcfee commented Dec 3, 2018

Does this cause an error or violated assumption later on, or do things work out?

It depends on the metric. Many of the metrics convert (labeled) intervals to samples at some fixed rate (e.g. 10 Hz), in which case rounding to the millisecond will not change the results.

Some of the metrics (directional hamming) use exact interval differences rather than sampling, and in that case rounding should make a negligible difference to the results.

Where it could get dicey is in something like onset detection or pitch detection, where the tolerances are already quite small (5 ms); maybe a finer rounding precision would be appropriate there (1e-4, for instance).
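
As a rough illustration of scale (toy number, not from any mir_eval code): rounding to the millisecond can move a timestamp by up to 0.5 ms, a tenth of a 5 ms window, while rounding to 1e-4 moves it by at most 0.05 ms:

import numpy as np

t = 1.2345678                      # an arbitrary event time, for illustration only
print(abs(t - np.around(t, 3)))    # <= 5e-4 s perturbation (a tenth of a 5 ms window)
print(abs(t - np.around(t, 4)))    # <= 5e-5 s perturbation (a hundredth of the window)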

Because it depends on the metric / task, I'm hesitant to recommend rolling the rounding into io, which happens upstream of the metrics.

@craffel
Owner

craffel commented Dec 3, 2018

Sorry, in that question I was referring specifically to the check in chord, not any of the other tasks. Should the chord metrics have their intervals rounded just for the check, or for the metric function itself?

@bmcfee
Collaborator Author

bmcfee commented Dec 3, 2018

Sorry, in that question I was referring specifically to the check in chord, not any of the other tasks. Should the chord metrics have their intervals rounded just for the check, or for the metric function itself?

I think it should be for both.

@craffel
Owner

craffel commented Dec 3, 2018

Because if we don't round it, things break? Or for consistency, and because it's probably the right thing to do? Or what?

@bmcfee
Collaborator Author

bmcfee commented Dec 3, 2018

Or for consistency and because it's probably the right thing to do? Or what?

Yeah, sorry -- if we round the timing to validate it, we should compute on the rounded data because it's been validated.

@craffel
Owner

craffel commented Dec 4, 2018

OK, do you think we should only fix this in chord, or are there other places we should round annotations?

@bmcfee bmcfee added this to the 0.8 milestone Mar 13, 2024
@bmcfee bmcfee removed this from the 0.8 milestone Mar 28, 2024