
Numerical stability issues in floating point comparisons (chord directional hamming) #295

Open
bmcfee opened this issue Dec 3, 2018 · 7 comments

@bmcfee
Collaborator

bmcfee commented Dec 3, 2018

I ran into a funny bug just now, and wanted to get some opinions on whether this should be considered an mir_eval problem or not.

This check in the directional hamming distance function looks for disjoint time intervals by comparing the end of one interval to the start of the next.

This is fine in principle, but I have a few (reference) files stored in JSON (jams) format, and for whatever reason, deserializing them perturbs the floating point values just enough to trigger the error. For example:

In [4]: J = jams.load('/home/bmcfee/data/crema/chords/TRAIIEF149E3861F6C.jams')

In [5]: ints, labs = J.annotations[0].to_interval_values()

In [6]: ints[1, 1]
Out[6]: 2.126712

In [7]: ints[2, 0]
Out[7]: 2.1267119990000003
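
For context, the kind of check involved looks roughly like this (a sketch, not the exact mir_eval source; the helper name and error message are illustrative):

import numpy as np

def _check_intervals_disjoint(intervals):
    # Flag any interval that starts before the previous one ends.
    # With the values above, 2.126712 > 2.1267119990000003 is True,
    # so a spurious ~1 nanosecond overlap raises an error.
    if np.any(intervals[:-1, 1] > intervals[1:, 0]):
        raise ValueError('Intervals must not overlap.')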

On the one hand, this is clearly a data error, and it's reasonable for mir_eval to fail here.

On the other hand, this is not a serious data error, and it's one we can expect to pop up any time numerical data passes through text serialization -- which is basically every use case of mir_eval.

Given that, what do people think of rounding the time intervals (say to the millisecond level) prior to checking? That is, just add lines to the effect of

reference_intervals = np.around(reference_intervals, decimals=3)

before the overlap check. For my money, I see no downside to this, but I'd like to get some opinions before opening a PR.
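
To illustrate with the numbers from the example (a toy check, not mir_eval code; the array below only reproduces the two offending boundary values):

import numpy as np

# Hypothetical intervals reproducing the boundary values from the JAMS file above
ints = np.array([[0.0, 2.126712],
                 [2.1267119990000003, 4.0]])

print(np.any(ints[:-1, 1] > ints[1:, 0]))   # True: spurious overlap of about 1e-9 seconds

ints = np.around(ints, decimals=3)
print(np.any(ints[:-1, 1] > ints[1:, 0]))   # False: both boundaries round to 2.127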

@bmcfee bmcfee added the question label Dec 3, 2018
@craffel
Owner

craffel commented Dec 3, 2018

Does this cause an error or a violated assumption later on, or do things work out? In other words, should we round solely for the check, or should we round any input to the corresponding metric functions?

I feel a little odd rounding user data, though we already sort of do it elsewhere in the library. It feels like it would make more sense to round in io, and then if someone uses their own I/O, let them deal with any resulting weirdness -- but I don't feel strongly about this, since it probably won't make a big difference and rounding is probably what the user wants.

@bmcfee
Collaborator Author

bmcfee commented Dec 3, 2018

Does this cause an error or violated assumption later on, or do things work out?

It depends on the metric. Many of the metrics convert (labeled) intervals to samples at some fixed rate (e.g. 10 Hz), in which case rounding to the millisecond will not change the results.

Some of the metrics (directional hamming) use exact interval differences rather than sampling, and in that case rounding should make a negligible difference to the results.

Where it could get dicey is in something like onset detection or pitch detection, where the tolerances are already quite small (5 ms); maybe a finer rounding precision would be appropriate there (1e-4, for instance).
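
As a rough illustration of scale (toy number, not from any mir_eval code): rounding to the millisecond can move a timestamp by up to 0.5 ms, a tenth of a 5 ms window, while rounding to 1e-4 moves it by at most 0.05 ms:

import numpy as np

t = 1.2345678                      # an arbitrary event time, for illustration only
print(abs(t - np.around(t, 3)))    # <= 5e-4 s perturbation (a tenth of a 5 ms window)
print(abs(t - np.around(t, 4)))    # <= 5e-5 s perturbation (a hundredth of the window)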

Because it depends on the metric / task, I'm hesitant to recommend rolling the rounding into io, which happens upstream of the metrics.

@craffel
Owner

craffel commented Dec 3, 2018

Sorry, in that question I was referring specifically to the check in chord, not any of the other tasks. Should the chord metrics have their intervals rounded just for the check, or for the metric function itself?

@bmcfee
Collaborator Author

bmcfee commented Dec 3, 2018

Sorry, in that question I was referring specifically to the check in chord, not any of the other tasks. Should the chord metrics have their intervals rounded just for the check, or for the metric function itself?

I think it should be for both.

@craffel
Owner

craffel commented Dec 3, 2018

Because if we don't round it, things break? Or for consistency, and because it's probably the right thing to do? Or what?

@bmcfee
Collaborator Author

bmcfee commented Dec 3, 2018

Or for consistency and because it's probably the right thing to do? Or what?

Yeah, sorry -- if we round the timing to validate it, we should compute on the rounded data because it's been validated.

@craffel
Owner

craffel commented Dec 4, 2018

OK, do you think we should only fix this in chord, or are there other places we should round annotations?

@bmcfee bmcfee added this to the 0.8 milestone Mar 13, 2024
@bmcfee bmcfee removed this from the 0.8 milestone Mar 28, 2024