Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autocorrelation feature extractor #302

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

Conversation

hughrawlinson
Copy link
Member

Would require a minor bump.

There's a weird performance thing for v8, the code looks bad but actually it's there to trigger a compiler optimization. Happy to explain if someone's interested.

It includes tests, but the tests make me nervous, would love to try on more signals.

Someone should definitely have a look at the output and compare it with other implementations. It looks v suspicious to me, particularly seeing as the last half are zeroes.

@jakubfiala
Copy link
Collaborator

Hey @hughrawlinson, really sorry for getting round to this so late. I had a look at your impl and compared it with Matlab's xcorr, and an implementation using the IFFT method @nevosegal mentioned. The results are almost exactly the same at the beginning, the small differences might be because of f32/f64.

But the zeroes definitely shouldn't be there - I'm testing with a 128-sample sine wave, whose positive xcorr should be gradually approaching zero until the last sample.

Also, I think usually developers will expect a full xcorr with both negative and positive lags - that way it can be used for linear prediction and other things.

I have an implementation of the IFFT method we could potentially use, unless we're worried about the performance implications. Happy to chat about this privately if that's easier :)

@jakubfiala
Copy link
Collaborator

thanks a lot for doing this btw :)

@hughrawlinson
Copy link
Member Author

I'd love to use your implementation! Will review a PR any time :) As long as the performance is fine for realtime, I'm sold. I had trouble implementing this in realtime, there was a v8 issue where a deop was triggered so I had to get around that to make it run fast enough.

@hughrawlinson
Copy link
Member Author

It didn't even occur to me to do negative lags! Will get on it :)

@hughrawlinson hughrawlinson changed the base branch from v5 to master January 14, 2020 03:13
@hughrawlinson
Copy link
Member Author

@jakubfiala this passes tests now. Accurate to 0.0000018530201888518416 on the sin test (once I added a zero at the start). I think since it's so close, it's reasonable to call this done (pending feedback, obviously).

The one remaining mystery I have is in the source formula:
Screenshot 2020-01-14 at 09 44 33

There were lots of equations and implementations all over the internet that I looked for - I initially wanted to implement a generic 'correlation of two signals' function, but couldn't find a formula that I could understand. There were some generic implementations that seemed parameterizable, and I didn't particularly want to let users configure which correlation equation was to be used under the hood, nor configure further lower level stuff. I found the above formula in these lecture notes at the very end. I implemented the extractor based on that, and I understand it all except the term dt. Obviously the tests pass regardless of that term, which I guess is just 1 in this case - but I would love to know what it is. I'm sure it's related to the term ΔT, further up the notes, but I'm not sure what the relationship is. Clearly, I have a lot of reading to do.

@nevosegal
Copy link
Collaborator

Sorry I didn't join the conversation earlier. The term dt just means that we're integrating this over t. It's not something you multiply against. Note that this equation is for continuous signals - it looks a bit different for discrete signals.

In any case, regarding the benchmarking Jakub did (comparing time-domain and freq-domain autocorrelation): Even if there isn't big difference in computation speed (although there should be - O(N^2) vs Nlog(N)) I still believe doing that it in the frequency domain makes more sense as we already computed all of the components needed for it. The way I understand it, we only need to perform an IFFT on the power spectrum (magnitude spectrum ^ 2) to get the autocorrelation result.

This is just an optimisation so it shouldn't block the PR - thanks a lot for working on this @hughrawlinson !!

@hughrawlinson
Copy link
Member Author

hughrawlinson commented Jan 15, 2020

Hey,

I tried implementing it based on IFFT like you suggested, using fftjs. I didn't quite get the result I expected.

Expected result:
Screenshot 2020-01-15 at 17 09 47

Actual result:
Screenshot 2020-01-15 at 17 09 55

Forgetting about the normalization errors - the second one is upside down, and cropped. I can fix the fact that the length is off by zero padding I would guess. Any ideas?

@nevosegal
Copy link
Collaborator

Oh. I think fftjs always return half of the FFT result to be more efficient, and that might not be good for this use case... I'll check it out.

@nevosegal
Copy link
Collaborator

@hughrawlinson What are you comparing the result to? Autocorrelation output from librosa?

@hughrawlinson
Copy link
Member Author

@nevosegal I'm comparing against an autocorrelation result that @jakubfiala gave me from an implementation of autocorrelation that he did.

@hughrawlinson
Copy link
Member Author

Our long nightmare is over, GitHub Copilot implemented autocorrelation for us lol

function mean(x) {
	var sum = 0;
	for (var i = 0; i < x.length; i++) {
		sum += x[i];
	}
	return sum / x.length;
}

function autocorrelation(x, y) {
	var n = x.length;
	var meanX = mean(x);
	var meanY = mean(y);
	var sum = 0;
	for (var i = 0; i < n; i++) {
		sum += (x[i] - meanX) * (y[i] - meanY);
	}
	return sum / (n - 1);
}

@hughrawlinson hughrawlinson changed the base branch from master to main October 31, 2021 10:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants