Timestamps are broken for whisper large with WhisperForConditionalGeneration #30433

kamilakesbi · 2024-04-23T15:32:44Z

System Info

transformers version: 4.40.0.dev0
Platform: Linux-5.4.0-166-generic-x86_64-with-glibc2.29
Python version: 3.8.10
Huggingface_hub version: 0.22.2
Safetensors version: 0.4.2
Accelerate version: 0.29.1
Accelerate config: not found
PyTorch version (GPU?): 2.2.2+cu121 (True)
Tensorflow version (GPU?): 2.13.1 (True)
Flax version (CPU?/GPU?/TPU?): 0.7.0 (cpu)
Jax version: 0.4.13
JaxLib version: 0.4.13
Using GPU in script?:
Using distributed or parallel set-up in script?:

Who can help

@kamilakesbi @sanchit-gandhi

Reproduction

Timestamps are broken for whisper-large-v3 when used with WhisperForConditionalGeneration:

Note: This issue is related to #30224

import torch
from datasets import load_dataset
from transformers import WhisperProcessor, WhisperForConditionalGeneration

device = "cuda:0" if torch.cuda.is_available() else "cpu"

processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3")
model.to(device)

# Take the first four samples (sorted by id) from the dummy LibriSpeech set.
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
speech_samples = ds.sort("id").select(range(4))[:4]["audio"]

# Convert the raw waveforms to log-mel input features.
input_speech = [x["array"] for x in speech_samples]
features = processor.feature_extractor(raw_speech=input_speech, return_tensors="pt")

input_features = features.input_features.to(device)
generate_kwargs = {}

# Request both segment-level and token-level timestamps.
generate_outputs = model.generate(
    input_features, return_timestamps=True, return_token_timestamps=True, **generate_kwargs
)
print(generate_outputs.token_timestamps)

We get:

tensor([[ 0.0000,  0.0000, 29.3000, 29.3000, 29.9800, 29.9800, 29.9800, 29.9800,
         29.9800, 29.9800, 29.9800, 29.9800, 29.9800, 29.9800, 29.9800, 29.9800,
         29.9800, 29.9800, 29.9800, 29.9800, 29.9800, 29.9800, 29.9800, 29.9800,
         29.9800, 29.9800, 29.9800, 29.9800, 29.9800, 29.9800, 29.9800, 29.9800,
         29.9800, 29.9800, 29.9800, 29.9800, 29.9800, 29.9800, 29.9800, 29.9800,
         29.9800],
        [ 0.0000,  0.0000, 29.3000, 29.3000, 29.9800, 29.9800, 29.9800, 29.9800,
         29.9800, 29.9800, 29.9800, 29.9800, 29.9800, 29.9800, 29.9800, 29.9800,
         29.9800, 29.9800, 29.9800, 29.9800, 29.9800, 29.9800, 29.9800, 29.9800,
         29.9800, 29.9800, 29.9800, 29.9800, 29.9800, 29.9800, 29.9800, 29.9800,
         29.9800, 29.9800, 29.9800, 29.9800, 29.9800, 29.9800, 29.9800, 29.9800,
         29.9800],...

Almost every token is assigned 29.98 s, and the rows shown are identical across samples, indicating that the timestamps are broken.
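A quick way to flag this failure mode programmatically is to check how many tokens in each row share the row's maximum timestamp. The sketch below is only an illustrative sanity check, not part of transformers: the helper names `saturated_fraction` and `looks_saturated`, the 0.5 threshold, and the contrasting `plausible_row` are all assumptions made for the example. It works on plain Python lists so it runs without a model.

```python
from typing import List, Sequence


def saturated_fraction(row: Sequence[float], tol: float = 1e-6) -> float:
    """Fraction of entries in `row` equal (within `tol`) to the row's maximum."""
    if not row:
        return 0.0
    peak = max(row)
    return sum(1 for t in row if abs(t - peak) <= tol) / len(row)


def looks_saturated(rows: Sequence[Sequence[float]], threshold: float = 0.5) -> List[bool]:
    """Flag each row where more than `threshold` of the tokens share the maximum timestamp.

    The threshold is an arbitrary choice for this sketch, not a documented value.
    """
    return [saturated_fraction(row) > threshold for row in rows]


# First row of the output above: 0.0 twice, 29.3 twice, then 29.98 for the other 37 tokens.
broken_row = [0.0, 0.0, 29.3, 29.3] + [29.98] * 37
# A made-up, steadily increasing row, included only for contrast.
plausible_row = [0.0, 0.0, 0.5, 0.9, 1.4, 2.0, 2.6, 3.1]

flags = looks_saturated([broken_row, plausible_row])
print(flags)  # [True, False]
```

With the reproduction script, the same check could be applied to `generate_outputs.token_timestamps.tolist()`.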

sanchit-gandhi · 2024-04-23T16:08:46Z

This notebook should be useful: https://github.com/sanchit-gandhi/codesnippets/blob/main/whisper-word-level.ipynb

While we fix this issue, we can also consider how to make the API simpler for users, since it currently requires some post-processing outside the model + processor API.

nakranivaibhav · 2024-04-26T12:47:35Z

@sanchit-gandhi is this issue open for taking?

sanchit-gandhi assigned kamilakesbi Apr 23, 2024

sanchit-gandhi added the Audio and Good Second Issue labels Apr 23, 2024

kamilakesbi mentioned this issue May 14, 2024 in "add return_token_timestamps to WhisperProcessor" #30812 (merged)

amyeroberts closed this as completed in #30812 May 20, 2024
