
TypeScript - incorrect type when using verbose_json as the whisper transcription response_format #702

Open
jessebs opened this issue Mar 3, 2024 · 3 comments


jessebs commented Mar 3, 2024

Confirm this is a Node library issue and not an underlying OpenAI API issue

  • This is an issue with the Node library

Describe the bug

With Whisper, when using the verbose_json response_format parameter, audio.transcriptions.create returns the Transcription type, which does not include the extra fields that verbose_json adds.

To Reproduce

See the code snippet

Code snippets

const response = await openAIClient.audio.transcriptions.create({
  model: 'whisper-1',
  file: fileStream,
  response_format: 'verbose_json',
  timestamp_granularities: ['segment']
})

console.log(response.text)
// @ts-ignore
console.log(response['language']) // language isn't part of the Transcription interface

OS

macOS

Node version

18.16.1

Library version

openai v4.28.4

jessebs added the bug label Mar 3, 2024
rattrayalex (Collaborator) commented

We hope to add support for this in the coming months.

rattrayalex added the enhancement label and removed the bug label Mar 5, 2024

dereckmezquita commented Mar 16, 2024

I just ran this today, and the response I get back does include language:

const transcription = await this.client.audio.transcriptions.create({
    file: fs.createReadStream(tempFileName),
    response_format: 'verbose_json',
    model: 'whisper-1'
});
{
  task: "transcribe",
  language: "english",
  duration: 2.0399999618530273,
  text: "Hello World and all the bunnies!",
  segments: [
    {
      id: 0,
      seek: 0,
      start: 0,
      end: 2,
      text: " Hello World and all the bunnies!",
      tokens: [ 50364, 2425, 3937, 293, 439, 264, 6702, 40549, 0, 50464 ],
      temperature: 0,
      avg_logprob: -0.5200682878494263,
      compression_ratio: 0.8421052694320679,
      no_speech_prob: 0.017731403931975365,
    }
  ],
}

I'm piggybacking off this issue: I noticed that if I set response_format to 'text', the declared return type is still a Transcription object, but I actually receive a plain string. That's fine at runtime, but it confuses TypeScript, so I have to do:

return result as unknown as string;
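For context, here is a minimal sketch of that workaround in full (the client setup and tempFileName are assumed from the snippet above):

const result = await this.client.audio.transcriptions.create({
    file: fs.createReadStream(tempFileName),
    response_format: 'text',
    model: 'whisper-1'
});

// The SDK declares the return type as Transcription, but with
// response_format: 'text' the API actually returns a plain string,
// so a double assertion is needed to satisfy the compiler.
const text = result as unknown as string;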

Question: are more tokens required/credits used if I request 'verbose_json' vs 'text'?


wrogati commented Mar 24, 2024

Hello everyone,

I encountered the same issue regarding the return object. As a temporary workaround in my project, I added an interface based on the documentation to better handle the function's return.
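A minimal sketch of what such an interface might look like, with field names taken from the verbose_json response shown above (exact names and optionality are assumptions based on the API reference, not the SDK's own types):

interface TranscriptionSegment {
  id: number;
  seek: number;
  start: number;
  end: number;
  text: string;
  tokens: number[];
  temperature: number;
  avg_logprob: number;
  compression_ratio: number;
  no_speech_prob: number;
}

interface VerboseTranscription {
  task: string;
  language: string;
  duration: number;
  text: string;
  segments: TranscriptionSegment[];
}

// Cast the SDK's narrow Transcription result to the wider shape.
const response = await openAIClient.audio.transcriptions.create({
  model: 'whisper-1',
  file: fileStream,
  response_format: 'verbose_json'
}) as unknown as VerboseTranscription;

console.log(response.language); // type-checks without @ts-ignore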

Another approach could be to use a fork of the project and implement the fix there. However, that requires staying vigilant for updates and potential conflicts with the original repository.

To address this, I've submitted a Pull Request with the correction, hoping the maintainers will integrate this fix. Let's wait and see.

@dereckmezquita Regarding the question of whether the cost differs depending on the response format: I'm not certain, but I believe it does not. Billing is based on the data generated, counted in tokens, not on the size of the response. What the documentation does make clear is that requesting verbose_json with word-level timestamp granularity increases latency.
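For reference, the word-level request that note refers to would look like this (a sketch based on the reporter's snippet; per the docs, 'word' granularity requires verbose_json and adds latency, while 'segment' does not):

const response = await openAIClient.audio.transcriptions.create({
  model: 'whisper-1',
  file: fileStream,
  response_format: 'verbose_json',
  timestamp_granularities: ['word'] // word-level timestamps incur extra latency
});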
