Dear authors, your paper reports very impressive performance on the VideoChatGPT video benchmark, but I could not find any code for reproducing that experiment. I therefore modified the MVBench evaluation code to run inference on this task, but I cannot reproduce the reported numbers.
What I got is:
The model checkpoint should be correct, because I used the same checkpoint and reproduced the reported MVBench results.
So the only remaining difference should be the prompt used at test time, but I find it hard to believe that a prompt change alone could cause such a large gap.
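For concreteness, my adapted loop is roughly the sketch below. The JSON field names and the `chat_fn` callable are placeholders for the MVBench-style code I modified, not this repo's exact interface; the open question is what, if anything, should wrap the raw question before it reaches the model.

```python
# Sketch of the adapted inference loop (placeholder interfaces, not the repo's exact code).
# Assumed input: a VideoChatGPT-style JSON list of {"video_name", "question", "answer"} records.
import json
from typing import Callable

def run_inference(chat_fn: Callable[[str, str], str], qa_file: str, out_file: str) -> None:
    """chat_fn(video_path, prompt) -> answer stands in for the MVBench-style single-turn chat call."""
    with open(qa_file) as f:
        items = json.load(f)
    results = []
    for item in items:
        # The question is passed through bare, with no system prompt or answer-format
        # instruction. If the reported numbers depend on a specific prompt wrapper,
        # this is the line where it would be applied.
        pred = chat_fn(item["video_name"], item["question"])
        results.append({"question": item["question"],
                        "answer": item["answer"],
                        "pred": pred})
    with open(out_file, "w") as f:
        json.dump(results, f, indent=2)
```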
Could you please share the prompt used when testing this benchmark, or your inference results? Extraordinary claims require extraordinary evidence.
I have checked the history: we use gpt-3.5-turbo as the evaluation judge by default. Given when the tests were run, the version was most likely gpt-3.5-turbo-1106, and the results are as follows:
completed_files: 1996
incomplete_files: 0
All evaluation completed!
Average score for correctness: 3.020541082164329
completed_files: 1996
incomplete_files: 0
All evaluation completed!
Average score for detailed orientation: 2.875250501002004
completed_files: 1996
incomplete_files: 0
All evaluation completed!
Average score for contextual understanding: 3.509018036072144
completed_files: 499
incomplete_files: 0
All evaluation completed!
Average score for temporal understanding: 2.661322645290581
completed_files: 499
incomplete_files: 0
All evaluation completed!
Average score for consistency: 2.8076152304609217
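For reference, the numbers above come from GPT-based judging of each prediction against the ground truth (gpt-3.5-turbo-1106, per the reply above). Below is a minimal sketch of such scoring; the judge prompt is a generic paraphrase and the file/field names are placeholders, not the exact evaluation script behind the reported numbers.

```python
# Minimal sketch of GPT-judge scoring (generic paraphrase, not the repo's exact eval script).
# Assumes OPENAI_API_KEY is set and predictions are a JSON list of
# {"question", "answer", "pred"} records.
import json
from openai import OpenAI

client = OpenAI()

JUDGE_SYSTEM = (
    "You evaluate video question answering. Given a question, the correct answer, "
    "and a model prediction, rate the prediction's correctness as an integer from 0 to 5. "
    "Reply with only the number."
)

def score_one(question: str, answer: str, pred: str) -> float:
    """Ask the judge model (gpt-3.5-turbo-1106) for a single 0-5 score."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        temperature=0,
        messages=[
            {"role": "system", "content": JUDGE_SYSTEM},
            {"role": "user",
             "content": f"Question: {question}\nCorrect answer: {answer}\nPrediction: {pred}"},
        ],
    )
    return float(resp.choices[0].message.content.strip())

def average_score(pred_file: str) -> float:
    """Score every record and return the mean, i.e. the 'Average score' lines above."""
    with open(pred_file) as f:
        records = json.load(f)
    scores = [score_one(r["question"], r["answer"], r["pred"]) for r in records]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    print("Average score for correctness:", average_score("videochatgpt_preds.json"))
```

Because judge scores can drift across GPT versions and judge prompts, pinning the model name (and using temperature 0) matters when comparing numbers across runs.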