Evaluation Dataset in "Response Generation" - "end-to-end models" part #130

nqchieutb01 · 2023-09-20T03:07:56Z

Does this part use MultiWOZ 2.2 or MultiWOZ 2.0 as the benchmark dataset?
I'm confused: the results in the original RewardNet, Mars, and KRLS papers are all the same as in your table, but those papers report them on the MultiWOZ 2.0 dataset. Moreover, the TOATOD paper reports its combined score on the MultiWOZ 2.2 dataset.
Is there a mistake? Can you explain this inconsistency?
Thanks!

comprehensiveMap · 2023-11-03T16:37:38Z

I have the same doubt.
