How to compare features to estimate the similarity of two signals? #268

Open
fenomas opened this issue Oct 5, 2018 · 3 comments · May be fixed by #844

Comments

fenomas commented Oct 5, 2018

Hi, this is a question about how to apply meyda! Hope that's okay.

How might one generally use extracted features to estimate the perceptual difference between two sounds? That is, I'm trying to define an error function that returns a high value when comparing a piano note to a snare drum, a low value when comparing two different snare drums, etc.

Right now I am taking a naive, straightforward approach - I loop through both signals in frames of ~512 samples, extract various features per frame (mfcc, rms, chroma, etc.), and sum up the total difference in feature values. This sort of works as a rough baseline, but it's obviously lacking - it tends to find very large feature differences between sounds that are perceptually identical to a human.
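
In case it helps, here's roughly what that baseline looks like (just a sketch - it assumes Meyda's standalone `Meyda.extract`, the default sample rate, and two equal-length mono `Float32Array`s; the feature choice is arbitrary):

```js
// Rough sketch of the frame-by-frame baseline described above.
const Meyda = require('meyda');

const FRAME_SIZE = 512;
Meyda.bufferSize = FRAME_SIZE;

// Extract one flat numeric vector per 512-sample frame
function frameFeatures(signal) {
  const frames = [];
  for (let i = 0; i + FRAME_SIZE <= signal.length; i += FRAME_SIZE) {
    const f = Meyda.extract(['rms', 'spectralCentroid', 'mfcc'],
                            signal.subarray(i, i + FRAME_SIZE));
    frames.push([f.rms, f.spectralCentroid, ...f.mfcc]);
  }
  return frames;
}

// Sum of absolute per-frame feature differences (the naive "error function")
function naiveDistance(signalA, signalB) {
  const a = frameFeatures(signalA);
  const b = frameFeatures(signalB);
  const n = Math.min(a.length, b.length);
  let total = 0;
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < a[i].length; j++) {
      total += Math.abs(a[i][j] - b[i][j]);
    }
  }
  return total;
}
```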

Are there known ways of approaching this - e.g. combinations of features to use, or ways of calculating the error between two sets of extracted features?

Thanks!

@pulakk
Copy link

pulakk commented Nov 24, 2018

I guess you could simply normalize the MFCC features and compare them using DTW (Dynamic Time Warping), instead of using all the features. Features like rms and energy won't do much to capture the perceptual difference between two signals when they come from two distinct instruments.
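
Something like this is what I mean (a rough sketch - `dtwDistance` is just an illustrative name, and the MFCC frames are assumed to come from something like `Meyda.extract('mfcc', frame)` per 512-sample frame, normalized beforehand):

```js
// Euclidean distance between two MFCC vectors of the same length
function euclidean(u, v) {
  let s = 0;
  for (let i = 0; i < u.length; i++) s += (u[i] - v[i]) ** 2;
  return Math.sqrt(s);
}

// Classic dynamic-programming DTW over two sequences of MFCC frames
function dtwDistance(mfccsA, mfccsB) {
  const n = mfccsA.length, m = mfccsB.length;
  const INF = Number.POSITIVE_INFINITY;
  const D = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(INF));
  D[0][0] = 0;
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const cost = euclidean(mfccsA[i - 1], mfccsB[j - 1]);
      D[i][j] = cost + Math.min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1]);
    }
  }
  return D[n][m];
}
```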

hughrawlinson (Member) commented

> Hi, this is a question about how to apply meyda! Hope that's okay.

Completely okay!

For sound similarity, I would usually pick the audio features I care most about (for example, I might be looking for sounds that have similar brightness and noisiness, but not care about loudness, so I would pick spectral centroid and spectral flatness). Then I would represent each sound as a vector of those features (so [spectralCentroid, spectralFlatness]), and the Euclidean distance between two sounds' vectors gives you a dissimilarity measure in those dimensions - the smaller the distance, the more similar the sounds. Choosing your features is very important, because it determines what your similarity measurement actually measures.
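
As a rough sketch (the feature values below are made up purely for illustration):

```js
// Each sound summarized as [spectralCentroid, spectralFlatness]
// (e.g. averaged over its frames); smaller distance = more similar.
function euclideanDistance(a, b) {
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

const piano  = [1800, 0.12]; // [centroid (Hz), flatness]
const snare1 = [4200, 0.55];
const snare2 = [4000, 0.60];

euclideanDistance(piano, snare1);  // large: perceptually different
euclideanDistance(snare1, snare2); // small: perceptually similar
// Note: dimensions on very different scales (Hz vs. 0-1) should be
// normalized or weighted, otherwise one feature dominates the distance.
```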

If you would like a better similarity metric, you could build a model to weight each dimension of the vector. You might, for example, have a UI that plays two pairs of sounds and asks a user to pick which pair's "similarity" number is more accurate. From that data you could build a set of weightings for each dimension of the vector.
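
A weighted version might look something like this (again just a sketch, with made-up weights):

```js
// Weighted Euclidean distance; the weights would come from listener
// judgements (or hand-tuning), the values below are for illustration only.
function weightedDistance(a, b, weights) {
  return Math.sqrt(a.reduce((sum, ai, i) => sum + weights[i] * (ai - b[i]) ** 2, 0));
}

// [centroid (Hz), flatness]: down-weight centroid so flatness isn't drowned out
const weights = [1e-6, 1.0];
weightedDistance([1800, 0.12], [4200, 0.55], weights);
```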

Another approach, not using meyda, is to take your sounds and train a convolutional autoencoder on the signals directly. This learns an embedding of the sounds, which gives you vectors you can compare with Euclidean distance just as above, but tuned specifically to your dataset.
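
If you go that route, a tiny sketch of the idea using TensorFlow.js (not part of Meyda - layer sizes and window length here are placeholders, not a recommendation):

```js
// Tiny 1D convolutional autoencoder over fixed-length signal windows.
// The 'embedding' output is the vector you would compare with Euclidean
// distance after training the autoencoder on reconstruction loss.
const tf = require('@tensorflow/tfjs');

function buildAutoencoder(windowLength) {
  const input = tf.input({ shape: [windowLength, 1] });
  let x = tf.layers.conv1d({ filters: 16, kernelSize: 64, strides: 4, activation: 'relu' }).apply(input);
  x = tf.layers.conv1d({ filters: 32, kernelSize: 32, strides: 4, activation: 'relu' }).apply(x);
  x = tf.layers.flatten().apply(x);
  const embedding = tf.layers.dense({ units: 32, activation: 'relu', name: 'embedding' }).apply(x);
  let y = tf.layers.dense({ units: windowLength }).apply(embedding);
  y = tf.layers.reshape({ targetShape: [windowLength, 1] }).apply(y);

  const autoencoder = tf.model({ inputs: input, outputs: y });
  autoencoder.compile({ optimizer: 'adam', loss: 'meanSquaredError' });
  // Separate model mapping a signal window to its embedding vector
  const encoder = tf.model({ inputs: input, outputs: embedding });
  return { autoencoder, encoder };
}
```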

I hope one of those approaches helps!

I'm going to leave this issue open as a reminder to me to write a guide on this - it should really be part of Meyda's docs.

derekpankaew (Contributor) commented Dec 5, 2018

You might also want to check out the stream-audio-fingerprint library (linked below). They used Shazam's paper to reconstruct the audio fingerprinting algorithm Shazam uses. In other words, you can take "fingerprints" of audio at set intervals, in the form of hashes. You can then compare future hashes to see whether you're listening to a clip of the same audio.

It might be different from what you're looking for, but I thought I'd share anyway. A lot of the work there is about finding similarities in sound clips. The Shazam paper is also a very interesting read if you want to build something for similar use cases.

Package:

https://www.npmjs.com/package/stream-audio-fingerprint

Shazam Paper:

http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf
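
For intuition, the comparison step boils down to counting shared fingerprint hashes between two clips - something like this toy sketch (this isn't the package's actual API):

```js
// Toy Shazam-style comparison: what fraction of clip A's fingerprint
// hashes also appear in clip B's hashes.
function hashOverlap(hashesA, hashesB) {
  const setB = new Set(hashesB);
  let shared = 0;
  for (const h of hashesA) {
    if (setB.has(h)) shared++;
  }
  return shared / Math.max(hashesA.length, 1);
}
```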

hughrawlinson added a commit that referenced this issue May 29, 2021
This guide is unfinished, storing it here.

fix #268
hughrawlinson linked a pull request May 29, 2021 that will close this issue