Speech API: Cannot use explicit_decoding_config with encoding = ENCODING_UNSPECIFIED #7289

Open
jfradj opened this issue May 4, 2024 · 0 comments

Hello,

I want to use the Speech API to convert speech to text.


TL;DR

Using:

$explicitConfig = new Google\Cloud\Speech\V2\ExplicitDecodingConfig([
    'encoding' => Google\Cloud\Speech\V1\RecognitionConfig\AudioEncoding::ENCODING_UNSPECIFIED,
    'sample_rate_hertz' => 16000,
]); 

throws this error:

Invalid audio channel count value: 0. Values must be non-negative.

While using:

$explicitConfig = new Google\Cloud\Speech\V2\ExplicitDecodingConfig([
    'encoding' => Google\Cloud\Speech\V1\RecognitionConfig\AudioEncoding::ENCODING_UNSPECIFIED,
    'sample_rate_hertz' => 16000,
    'audio_channel_count' => 2,
]); 

throws this error:

The RecognitionConfig proto is invalid:
  * explicit_decoding_config.audio_channel_count: audio_channel_count isn't supported by the set encoding

Long and detailed version for the courageous ones :)

Environment details

  • OS: macOS Sonoma 14.3 (23D56)
  • PHP version: PHP 8.2.17
  • Package name and version: google/cloud-speech 1.18.2

Steps to reproduce

I'm working with .aac audio files (generated by Instagram).
I first tried the online GUI (https://console.cloud.google.com/speech/transcriptions) to check whether the .aac file would be supported, and it worked:
[Screenshot, 2024-05-04 at 07:41:53]

When using the GUI, after uploading the file I get the warning "Unable to automatically detect audio information. Please review your audio file and enter the relevant fields manually."
So I filled in the fields manually:

  • Encoding = ENCODING_UNSPECIFIED
  • Sample rate = 16000
  • Channel count remains empty

This worked as shown on the screenshot above.

Then I wanted to do the same thing in code, using the google/cloud-speech package.

I tried to use the auto_decoding_config option but got the following error:

Audio data does not appear to be in a supported encoding. If you believe this to be incorrect, try explicitly specifying the decoding parameters.

Which is the same behavior as the GUI.
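
For reference, an attempt with auto_decoding_config can be sketched as follows. This is only an illustration, not the exact code that was run: it assumes the package is installed via Composer, that `Google\Cloud\Speech\V2\AutoDetectDecodingConfig` is the message used for this field, and that the rest of the request is built as in the code example further down.

```php
<?php
// Sketch only: assumes google/cloud-speech is installed via Composer
// and that the autoloader lives in ./vendor.
require __DIR__ . '/vendor/autoload.php';

use Google\Cloud\Speech\V2\AutoDetectDecodingConfig;
use Google\Cloud\Speech\V2\RecognitionConfig;

// Ask the service to detect the encoding itself instead of
// passing an explicit_decoding_config.
$config = new RecognitionConfig([
    'auto_decoding_config' => new AutoDetectDecodingConfig(),
    'language_codes' => ['en-EN'], // same values as in the explicit variant
    'model' => 'latest_long',
]);
```

The resulting `$config` would be passed as the 'config' field of the RecognizeRequest, exactly as in the explicit variant.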

So I tried to use the explicit_decoding_config parameter and it failed.
See code below.

Code example

$audioFile = 'https://lookaside.fbsbx.com/ig_messaging_cdn/?asset_id=374095301647771&signature=AbxHJBUywVeA26a-1lSTIeODgXgrAsmxD7pCjaxDo7nNowZZvgE_3fC5jMA3H-9UX7AtT7vdNe3N772RgQpNbgBsvmfp3eT439xW14QykJsqVfvg0aC_GVOJ6sBLBhqDyEzDv7Vt08pCStD0dHvG7PHcL7Gp4RvddKRT_TSYVBQP3PTFPiECX9PsMK528lRG4FaYYIAXN4sBcyeIZsRK6EiiWxo_6g';

$client = new Google\Cloud\Speech\V2\Client\SpeechClient();

$content = file_get_contents($audioFile);

$explicitConfig = new Google\Cloud\Speech\V2\ExplicitDecodingConfig([
    'encoding' => Google\Cloud\Speech\V1\RecognitionConfig\AudioEncoding::ENCODING_UNSPECIFIED,
    'sample_rate_hertz' => 16000,
]);

$config = new Google\Cloud\Speech\V2\RecognitionConfig([
    'explicit_decoding_config' => $explicitConfig,
    'language_codes' => ['en-EN'],
    'model' => 'latest_long',
]);

$request = new Google\Cloud\Speech\V2\RecognizeRequest([
    'recognizer' => 'projects/{MY_PROJECT_ID}/locations/global/recognizers/_',
    'config' => $config,
    'content' => $content,
]);

$response = $client->recognize($request);
$results = $response->getResults();

foreach ($results as $result) {
    $alternatives = $result->getAlternatives();
    $mostLikely = $alternatives[0];
    $transcript = $mostLikely->getTranscript();
    $confidence = $mostLikely->getConfidence();
    printf('Transcript: %s' . PHP_EOL, $transcript);
    printf('Confidence: %s' . PHP_EOL, $confidence);
}

This code throws the following error:

Invalid audio channel count value: 0. Values must be non-negative.

And setting the audio channel count like this:

$explicitConfig = new Google\Cloud\Speech\V2\ExplicitDecodingConfig([
    'encoding' => Google\Cloud\Speech\V1\RecognitionConfig\AudioEncoding::ENCODING_UNSPECIFIED,
    'sample_rate_hertz' => 16000,
    'audio_channel_count' => 2,
]); 

throws this error:

The RecognitionConfig proto is invalid:
  * explicit_decoding_config.audio_channel_count: audio_channel_count isn't supported by the set encoding

Thanks for your help.

Regards,
Johann
