Romanian multidomain human-machine dataset and detection of machine-generated text

readerbench/ro-mgt-detection

ro-human-machine-60k

Romanian multidomain human-machine dataset (publicly available at: https://huggingface.co/datasets/readerbench/ro-human-machine-60k).

| Domain | Method | Model | Avg TTR | Aggregate (samples) |
|---|---|---|---|---|
| Books | Human | Human | 0.7447 | 11,208 |
| | Completion | RoGPT2 | 0.6615 | |
| | Completion | GPT-Neo-Ro | 0.7011 | |
| | Completion | davinci-003 | 0.6125 | |
| | Backtranslation | davinci-003 | 0.7652 | |
| | Paraphrasing | Flan-T5 | 0.8708 | |
| | Backtranslation | Opus-MT | 0.7581 | |
| | Backtranslation | mBART | 0.7379 | |
| News | Human | Human | 0.6510 | 34,560 |
| | Completion | RoGPT2 | 0.6762 | |
| | Completion | GPT-Neo-Ro | 0.6867 | |
| | Completion | davinci-003 | 0.6508 | |
| | Backtranslation | davinci-003 | 0.7798 | |
| | Paraphrasing | Flan-T5 | 0.8389 | |
| | Backtranslation | Opus-MT | 0.6589 | |
| | Backtranslation | mBART | 0.7024 | |
| Medical | Human | Human | 0.6911 | 4,456 |
| | Completion | RoGPT2 | 0.6795 | |
| | Completion | GPT-Neo-Ro | 0.6893 | |
| | Completion | davinci-003 | 0.6262 | |
| | Backtranslation | davinci-003 | 0.7510 | |
| | Paraphrasing | Flan-T5 | 0.8503 | |
| | Backtranslation | Opus-MT | 0.7490 | |
| | Backtranslation | mBART | 0.7618 | |
| Legal | Human | Human | 0.7264 | 8,000 |
| | Completion | RoGPT2 | 0.6542 | |
| | Completion | GPT-Neo-Ro | 0.6880 | |
| | Completion | davinci-003 | 0.5828 | |
| | Backtranslation | davinci-003 | 0.7987 | |
| | Paraphrasing | Flan-T5 | 0.8418 | |
| | Backtranslation | Opus-MT | 0.7231 | |
| | Backtranslation | mBART | 0.7514 | |
| RoCHI | Human | Human | 0.6234 | 872 |
| | Completion | RoGPT2 | 0.6901 | |
| | Completion | GPT-Neo-Ro | 0.5460 | |
| | Completion | davinci-003 | 0.5810 | |
| | Backtranslation | davinci-003 | 0.7514 | |
| | Paraphrasing | Flan-T5 | 0.8356 | |
| | Backtranslation | Opus-MT | 0.6032 | |
| | Backtranslation | mBART | 0.7477 | |
| Total | | | | 59,096 |
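
The Avg TTR column reports an average type-token ratio (unique tokens divided by total tokens) per domain, method, and model. The tokenization behind the reported values is not stated here, so the following is only a minimal sketch that assumes lowercased, regex-based word tokens; the function names `type_token_ratio` and `average_ttr` are illustrative and not part of this repository, and results from this sketch may differ from the numbers in the table.

```python
import re
from statistics import mean

# Assumption: a "token" is any run of Unicode word characters, lowercased.
# The tokenizer used for the table above may differ.
_TOKEN_RE = re.compile(r"\w+")


def type_token_ratio(text: str) -> float:
    """Return unique tokens / total tokens for one text (0.0 if no tokens)."""
    tokens = [t.lower() for t in _TOKEN_RE.findall(text)]
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def average_ttr(texts) -> float:
    """Average the per-text TTR over all non-blank texts (0.0 if none)."""
    ratios = [type_token_ratio(t) for t in texts if t.strip()]
    return mean(ratios) if ratios else 0.0


if __name__ == "__main__":
    sample = ["Ana are mere și Ana are pere.", "Textul este scurt."]
    print(f"Average TTR: {average_ttr(sample):.4f}")
```

Because TTR tends to fall as texts get longer, values are most comparable between groups of texts with similar lengths.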

detection of machine-generated text

MGT

paper

Cite this as: Nitu, M.; Dascalu, M. Beyond Lexical Boundaries: LLM-Generated Text Detection for Romanian Digital Libraries. Future Internet 2024, 16, 41. https://doi.org/10.3390/fi16020041