Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new algo audio2pitch #1413

Draft
wants to merge 15 commits into
base: master
Choose a base branch
from
Draft

Conversation

xaviliz
Copy link
Contributor

@xaviliz xaviliz commented May 13, 2024

This pull request add a new essntia algorithm, Audio2Pitch. It has been designed for real time pitch extraction.

  • src/algorithms/tonal/audio2pitch.cpp
  • src/algorithms/tonal/audio2pitch.h
  • test/src/unittests/tonal/test_audio2pitch.py

Before we review everything it would be nice to have an idea of which test could be useful and any suggestions you could have. Kindly.

Comment on lines 59 to 60
declareParameter("minFrequency", "minimum frequency to detect in Hz", "[10,20000]", 60.f);
declareParameter("maxFrequency", "maximum frequency to detect in Hz", "[10,20000]", 2300.f);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not use such a floating-point notation in the rest of the algorithms (use 60.0, 2300.0, etc.)

declareParameter("pitchAlgorithm", "pitch algorithm to use", "{pyin,pyin_fft}", "pyin_fft");
declareParameter("loudnessAlgorithm", "loudness algorithm to use", "{loudness,rms}", "rms");
declareParameter("weighting", "string to assign a weighting function", "{default,A,B,C,D,Z}", "default");
declareParameter("tolerance", "sets tolerance for peak detection on pitch algorithm", "[0,1]", 1.0f);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

declareParameter("maxFrequency", "maximum frequency to detect in Hz", "[10,20000]", 2300.f);
declareParameter("pitchAlgorithm", "pitch algorithm to use", "{pyin,pyin_fft}", "pyin_fft");
declareParameter("loudnessAlgorithm", "loudness algorithm to use", "{loudness,rms}", "rms");
declareParameter("weighting", "string to assign a weighting function", "{default,A,B,C,D,Z}", "default");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unclear what default does. No weighting? Then we can rename it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default PitchYinFFT apply a weighting function defined here we can think in another name though. Otherwise we can add a comment like Default PitchYinFFt weighting function ?

declareParameter("loudnessAlgorithm", "loudness algorithm to use", "{loudness,rms}", "rms");
declareParameter("weighting", "string to assign a weighting function", "{default,A,B,C,D,Z}", "default");
declareParameter("tolerance", "sets tolerance for peak detection on pitch algorithm", "[0,1]", 1.0f);
declareParameter("pitchConfidenceThreshold", "level of pitch confidence above which note ON/OFF start to be considered", "[0,1]", 0.25);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More consistent description: "above/below which note ON/OFF start to be considered"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, what do you think about level of pitch confidence above used for voiced frame detection?

declareParameter("weighting", "string to assign a weighting function", "{default,A,B,C,D,Z}", "default");
declareParameter("tolerance", "sets tolerance for peak detection on pitch algorithm", "[0,1]", 1.0f);
declareParameter("pitchConfidenceThreshold", "level of pitch confidence above which note ON/OFF start to be considered", "[0,1]", 0.25);
declareParameter("loudnessThreshold", "loudness level above which note ON/OFF start to be considered, in linear values", "[0,1]", 0.0031); // ~ -50dB
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, would it be more intuitive for a user to be able to specify dB values instead? However, we can keep using linear values for consistency with the outputs of Loudness and RMS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, decibels are easy to handle. I think we can make the conversion, it won't be an issue. Indeed we need dBs to for velocity conversion. Let me check which would be the impact.


const char* Audio2Pitch::name = "Audio2Pitch";
const char* Audio2Pitch::category = "Pitch";
const char* Audio2Pitch::description = DOC("Extractor algorithm to compute pitch with several possible pitch algorithms, specifically targeted for real-time pitch detection on saxophone signals.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, in all algorithms this DOC string starts with a sentence "This algorithm ..." summarizing what the algorithm does, which is then auto-parsed by doc generation script to create short descriptions for the algorithm index. Let's keep it similar.

throw EssentiaException("Audio2Pitch: Max frequency cannot be higher than Nyquist frequency");
}
if (_maxFrequency <= _minFrequency) {
throw EssentiaException("Audio2Pitch: Max frequency cannot be lower than min frequency");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"lower or equal"

}

if (_pitchAlgorithmName != "pyin_fft" && _pitchAlgorithmName != "pyin") {
E_INFO("Audio2Pitch: 'pitchAlgorithm' = "<<_pitchAlgorithmName<<"\n");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not to add the parameter value directly to the exception message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmm... because operands are invalid in the exception message:
../src/algorithms/tonal/audio2pitch.cpp:60:69: error: invalid operands to binary expression ('const char [39]' and 'std::string' (aka 'basic_string<char, char_traits<char>, allocator<char> >')) throw EssentiaException("Audio2Pitch: Bad 'loudnessAlgorithm' ="<<_loudnessAlgorithmName<<"\n");
Can you confirm that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I have seen how you use to do that in some other algorithm like here:
throw EssentiaException(fullName(), ": Could not push 1 value, output buffer is full");

_loudnessAlgorithm = AlgorithmFactory::create("RMS");
}
else {
E_INFO("Audio2Pitch: 'loudnessAlgorithm' = "<<_loudnessAlgorithmName<<"\n");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, we can provide the parameter value directly in the exception message.

throw EssentiaException("Audio2Pitch: Bad 'loudnessAlgorithm' parameter");
}

// switch between pyin and pyin_fft to propagate the weighting parameter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are directly using parameter strings for comparison, but isSpectral is used in other parts of the code. Is this because there could be multiple FFT-based algorithms in the future? If not, we could simplify and get rid of isSpectral.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, this was because we started including other algorithms like PitchMelodia and in the python prototype we have more algorithms based on spectral processing. I think it might be removed if we don't plan to include others.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants