"time_frequency" sonification as introduced in mir_eval 0.5 makes it hard to hear chord changes #310

Open
jonathandriedger opened this issue Jan 23, 2019 · 7 comments

@jonathandriedger

As described in #255, there was an issue with crackling sound in the time_frequency sonification function, which was fixed by adding some amplitude envelope interpolation. Although the implemented fix indeed prevents any crackling from happening, it also makes it very hard to hear, for example, the timing of chord changes in the sonification due to the very smooth transitions.

In the attached example, you can hear the original audio in the left channel and the sonification of a chord estimate in the right channel.

example.zip

Maybe we could add a switch parameter to choose between smooth transitions (no crackling, but reduced "temporal resolution") and crisp transitions (potential crackling, but clearly audible changes)?

All the best,
Jonathan

@bmcfee
Collaborator

bmcfee commented Jan 23, 2019

I've also noticed this while qualitatively testing chord models, and I agree that it's confusing to listen to.

It's been a while since I looked at this code, but I wonder how difficult it would be to add a flag to clip time-frequency sonification at zero-crossings rather than taper to zero by interpolation? That way, we can still have crisp transitions without crackle.
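The zero-crossing idea can be illustrated with a minimal sketch. It assumes a single sinusoid stored as a plain Python list, and `trim_to_last_zero_crossing` is a hypothetical helper written for this illustration, not a function in mir_eval:

```python
import math


def trim_to_last_zero_crossing(samples):
    """Silence everything after the last sign change so the tone ends near zero.

    Hypothetical helper for illustration only; not part of mir_eval.
    """
    for n in range(len(samples) - 1, 0, -1):
        # A sign change between samples n-1 and n marks a zero crossing.
        if samples[n - 1] * samples[n] <= 0:
            return list(samples[:n]) + [0.0] * (len(samples) - n)
    # No zero crossing found: return the signal unchanged.
    return list(samples)


# A 110 Hz tone at 8 kHz that would otherwise stop close to its peak.
fs = 8000
tone = [math.sin(2 * math.pi * 110.0 * n / fs) for n in range(383)]
trimmed = trim_to_last_zero_crossing(tone)
```

The trimmed tone keeps its hard onset and offset but stops at a sample close to zero, at the cost of ending slightly earlier than the requested interval.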

@jonathandriedger
Author

jonathandriedger commented Jan 23, 2019

I started wrapping my mind around the implementation a bit: I believe the problem here is that time_frequency should be capable of handling two rather different kinds of "grams":

  • those with fixed, usually comparably high time resolution and varying amplitudes across bins such as magnitude spectrograms
  • those that reflect rather long temporal intervals in each of their columns and are more binary in nature, not reflecting volume differences (as in the case of the chord sonification).

I am not 100% sure, but I believe that your suggestion of clipping the individual waveforms at the last possible zero crossing could be problematic in the first scenario, since one would somehow need to ensure that no "gaps" occur in the wave between temporally adjacent time-frequency bins.

I have a different solution, though: with the current implementation, only the long intervals are problematic. So one could simply split each of those long intervals into three new ones: a very short "attack" interval at the beginning, a very short "decay" interval at the end, and the remaining interval in the middle.

The following function implements this solution:

import numpy as np
from mir_eval import util


def prepare_gram_for_time_frequency_sonification(gram, times, max_interval_len=0.2):
    """Split long intervals into a short attack, a sustain, and a short decay part."""
    # Convert boundary times to (start, end) intervals if necessary
    if times.ndim == 1:
        times = util.boundaries_to_intervals(times)

    edge_len = max_interval_len / 3  # duration of the attack and decay parts
    mod_gram_inds = []
    mod_times = []
    for m in range(gram.shape[1]):
        start, end = times[m]
        if end - start > max_interval_len:
            # Repeat column m three times: attack, sustain, decay
            mod_gram_inds += [m, m, m]
            mod_times.append([start, start + edge_len])
            mod_times.append([start + edge_len, end - edge_len])
            mod_times.append([end - edge_len, end])
        else:
            mod_gram_inds.append(m)
            mod_times.append([start, end])
    mod_times = np.array(mod_times)
    mod_gram = gram[:, mod_gram_inds]

    return mod_gram, mod_times

(I am not a very experienced Python programmer, so please excuse the "non-Pythonic" style.)
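To make the effect of the splitting rule concrete, here is a small standalone sketch that applies it to a single interval. `split_long_interval` is a hypothetical helper used only for this illustration, and it works on plain floats rather than on a gram:

```python
def split_long_interval(start, end, max_interval_len=0.2):
    """Return attack / sustain / decay sub-intervals for a long interval.

    Hypothetical helper for illustration only; not part of mir_eval.
    """
    if end - start <= max_interval_len:
        # Short intervals are left untouched.
        return [(start, end)]
    edge_len = max_interval_len / 3  # duration of the attack and decay parts
    return [
        (start, start + edge_len),
        (start + edge_len, end - edge_len),
        (end - edge_len, end),
    ]


# A two-second chord becomes a short attack, a long sustain, and a short decay.
pieces = split_long_interval(0.0, 2.0)
# A 100 ms interval is already short enough and stays as it is.
unchanged = split_long_interval(0.0, 0.1)
```

Since the amplitude interpolation only acts across the short attack and decay pieces, the long middle piece stays at constant amplitude and the chord change remains audible.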

@craffel
Owner

craffel commented Jan 24, 2019

I didn't realize people were using it for the second use-case you had listed; in that case the interpolation doesn't really make sense. I think it makes sense to interpolate over the minimum of the interval length or some pre-defined short interval. Does that make sense?

@bmcfee
Collaborator

bmcfee commented Jan 24, 2019

> I think it makes sense to interpolate over the minimum of the interval length or some pre-defined short interval.

How about interpolating over, say, two cycles at the frequency being synthesized?

@jonathandriedger
Author

jonathandriedger commented Jan 24, 2019

@craffel The second use-case is exactly what happens when you call mir_eval.sonify.chords(...). Each column of the internally constructed gram corresponds to one interval/chord-label in the original given chord sequence and therefore, each column also corresponds to the full duration of a chord (which can potentially be VERY long, even for real-world examples).

I think your suggestion of interpolating at a fixed, potentially even frequency-dependent rate is very good! I'll see if I can come up with something.

@craffel
Owner

craffel commented Jan 24, 2019

> How about interpolating over, say, two cycles at the frequency being synthesized?

This seems fine unless there's an interval which is shorter than two cycles of the frequency. Then again if the interval is that short the user should expect it to sound clicky.

@bmcfee
Collaborator

bmcfee commented Jan 24, 2019

> This seems fine unless there's an interval which is shorter than two cycles of the frequency. Then again if the interval is that short the user should expect it to sound clicky.

Exactly: if that's the case, then you wouldn't perceive it as a tone anyway. I guess one cycle of fade-in and one of fade-out would be sufficient. If the interval is less than two cycles, this reduces nicely to a triangle window whose height is inversely proportional to the base. This would effectively blunt out any impulses due to short intervals (as opposed to being due to zero-crossing alignment), which seems like a nice property.
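One possible reading of the frequency-dependent fade is sketched below. It assumes linear ramps and a per-frequency envelope computed sample by sample; `cycle_fade_envelope` and its parameters are hypothetical names for this illustration, not mir_eval API:

```python
def cycle_fade_envelope(n_samples, frequency, fs, n_cycles=1.0):
    """Linear fade-in and fade-out, each lasting n_cycles periods of the tone.

    Hypothetical helper for illustration only; not part of mir_eval.
    """
    fade_len = n_cycles * fs / frequency  # fade length in samples
    envelope = []
    for n in range(n_samples):
        rise = n / fade_len                    # ramp up from the first sample
        fall = (n_samples - 1 - n) / fade_len  # ramp down to the last sample
        envelope.append(min(1.0, rise, fall))
    return envelope


# A 100 Hz tone at 8 kHz: one cycle is 80 samples.
long_env = cycle_fade_envelope(800, frequency=100.0, fs=8000)
short_env = cycle_fade_envelope(81, frequency=100.0, fs=8000)
```

For a long interval the envelope is flat except for one cycle at each end, so transitions stay crisp. In this particular sketch, an interval shorter than two fade lengths never reaches full amplitude, so the envelope becomes a triangle whose peak shrinks as the interval gets shorter.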
