A Russian dataset with nested named entities, relations, events and linked entities.
Added: Linked entities
First version:
- Nested named entities
- Events
- Relations
No. | Entity type | No. | Entity type | No. | Entity type |
---|---|---|---|---|---|
1. | AGE | 11. | FAMILY | 21. | PENALTY |
2. | AWARD | 12. | IDEOLOGY | 22. | PERCENT |
3. | CITY | 13. | LANGUAGE | 23. | PERSON |
4. | COUNTRY | 14. | LAW | 24. | PRODUCT |
5. | CRIME | 15. | LOCATION | 25. | PROFESSION |
6. | DATE | 16. | MONEY | 26. | RELIGION |
7. | DISEASE | 17. | NATIONALITY | 27. | STATE_OR_PROV |
8. | DISTRICT | 18. | NUMBER | 28. | TIME |
9. | EVENT | 19. | ORDINAL | 29. | WORK_OF_ART |
10. | FACILITY | 20. | ORGANIZATION | | |
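
To make "nested" concrete, here is a minimal sketch of how entity spans of the types above can sit inside one another. The `Span` class, the `is_nested_in` helper, and the English example text are all hypothetical and chosen for readability; they do not reflect the dataset's file format or an actual annotation from it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical container for one annotated entity mention."""
    label: str  # one of the entity types from the table above
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def is_nested_in(inner: Span, outer: Span) -> bool:
    """True if `inner` lies inside `outer` and covers a different range."""
    same_range = (inner.start, inner.end) == (outer.start, outer.end)
    return outer.start <= inner.start and inner.end <= outer.end and not same_range


# Invented example: a CITY mention inside an ORGANIZATION mention.
text = "Moscow State University"
spans = [
    Span("ORGANIZATION", 0, len(text)),
    Span("CITY", 0, len("Moscow")),
]

nested_pairs = [
    (inner.label, outer.label)
    for inner in spans
    for outer in spans
    if is_nested_in(inner, outer)
]
print(nested_pairs)
```

A flat NER scheme would have to pick only one of the two spans; a nested scheme keeps both.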
- Biaffine model
- Pyramid model
- SpERT
- MRC model (Machine Reading Comprehension)
Word representations used with all models are fastText (fT) and pre-trained RuBERT-cased embeddings.
For more details, please see here.
Method | P | R | F1 |
---|---|---|---|
Biaffine, fT | 81.64 | 77.69 | 79.62 |
Biaffine, RuBERT, ft | 80.71 | 77.84 | 79.25 |
Pyramid, fT | 75.87 | 72.40 | 74.09 |
Pyramid, RuBERT, ft | 79.54 | 79.91 | 79.73 |
SpERT, RuBERT | 82.90 | 82.14 | 82.52 |
MRC | 85.04 | 84.95 | 84.99 |
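
As a quick sanity check on the table, the sketch below recomputes F1 from the listed precision and recall, assuming F1 is the usual harmonic mean of the two. The `f1` helper and the `rows` dictionary are illustrative names; small differences in the last digit are expected because P and R are already rounded.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) copied from the table above.
rows = {
    "Biaffine, fT": (81.64, 77.69, 79.62),
    "Biaffine, RuBERT, ft": (80.71, 77.84, 79.25),
    "Pyramid, fT": (75.87, 72.40, 74.09),
    "Pyramid, RuBERT, ft": (79.54, 79.91, 79.73),
    "SpERT, RuBERT": (82.90, 82.14, 82.52),
    "MRC": (85.04, 84.95, 84.99),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: recomputed {f1(p, r):.2f}, reported {reported:.2f}")
```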
No. | Relation type | No. | Relation type | No. | Relation type |
---|---|---|---|---|---|
1. | ABBREVIATION | 18. | HEADQUARTERED_IN | 35. | PLACE_RESIDES_IN |
2. | AGE_DIED_AT | 19. | IDEOLOGY_OF | 36. | POINT_IN_TIME |
3. | AGE_IS | 20. | INANIMATE_INVOLVED | 37. | PRICE_OF |
4. | AGENT | 21. | INCOME | 38. | PRODUCES |
5. | ALTERNATIVE_NAME | 22. | KNOWS | 39. | RELATIVE |
6. | AWARDED_WITH | 23. | LOCATED_IN | 40. | RELIGION_OF |
7. | CAUSE_OF_DEATH | 24. | MEDICAL_CONDITION | 41. | SCHOOLS_ATTENDED |
8. | CONVICTED_OF | 25. | MEMBER_OF | 42. | SIBLING |
9. | DATE_DEFUNCT_IN | 26. | ORGANIZES | 43. | SPOUSE |
10. | DATE_FOUNDED_IN | 27. | ORIGINS_FROM | 44. | START_TIME |
11. | DATE_OF_BIRTH | 28. | OWNER_OF | 45. | SUBEVENT_OF |
12. | DATE_OF_CREATION | 29. | PARENT_OF | 46. | SUBORDINATE_OF |
13. | DATE_OF_DEATH | 30. | PART_OF | 47. | TAKES_PLACE_IN |
14. | END_TIME | 31. | PARTICIPANT_IN | 48. | WORKPLACE |
15. | EXPENDITURE | 32. | PENALIZED_AS | 49. | WORKS_AS |
16. | FOUNDED_BY | 33. | PLACE_OF_BIRTH | | |
17. | HAS_CAUSE | 34. | PLACE_OF_DEATH | | |
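
A relation can be thought of as a typed, directed link between two entity mentions. The sketch below is a minimal illustration under assumptions: the `Entity` and `Relation` classes, the argument order, and the example sentence are hypothetical and not taken from the dataset. It also shows one way to detect a relation whose arguments are nested in each other.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    label: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


class Relation(NamedTuple):
    label: str    # one of the relation types from the table above
    head: Entity  # assumed order: head is the subject
    tail: Entity


def is_nested_relation(rel: Relation) -> bool:
    """True if one argument's span contains the other's."""
    a, b = rel.head, rel.tail
    return (a.start <= b.start and b.end <= a.end) or (
        b.start <= a.start and a.end <= b.end
    )


# Invented example sentence and annotations.
text = "Moscow State University was founded in 1755"
university = Entity("ORGANIZATION", 0, len("Moscow State University"))
city = Entity("CITY", 0, len("Moscow"))
year_start = text.index("1755")
year = Entity("DATE", year_start, year_start + 4)

relations = [
    Relation("HEADQUARTERED_IN", university, city),
    Relation("DATE_FOUNDED_IN", university, year),
]

nested = [r.label for r in relations if is_nested_relation(r)]
print(nested)
```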
- OpenNRE model
- IntModel
The encoders used with SpanBERT and OpenNRE are multilingual BERT (mBERT) and RuBERT.
Method | P | R | F1 |
---|---|---|---|
In-sentence relations | | | |
OpenNRE, mBERT | 81.7 | 81.6 | 81.7 |
OpenNRE, RuBERT | 85.3 | 84.6 | 84.9 |
SpanBERT, mBERT | 76.8 | 75.4 | 76.1 |
SpanBERT, RuBERT | 77.4 | 78.6 | 78.0 |
TRE | 66.4 | 68.1 | 67.2 |
In-sentence nested relations | | | |
OpenNRE, mBERT | 74.3 | 77.7 | 76.0 |
OpenNRE, RuBERT | 77.8 | 79.6 | 78.7 |
IntModel | 76.3 | 72.4 | 74.3 |
Document-level relations | | | |
OpenNRE, mBERT | 35.7 | 51.2 | 42.1 |
OpenNRE, RuBERT | 52.1 | 51.3 | 51.7 |
📓 Update 1 November 2023: this collection is now available in arekit-ss, which lets you quickly sample contexts containing most subject-object relation mentions into JSONL/CSV/SQLite with a single script, including optional language transfer 🔥 [Learn more ...]
NEREL-BIO is an extension of the NEREL dataset, introducing biomedical entity types in addition to the general-domain entities.
If you find this repository helpful, feel free to cite our papers:
[1] Loukachevitch N. et al. NEREL: a Russian information extraction dataset with rich annotation for nested entities, relations, and wikidata entity links. Language Resources and Evaluation (2023). https://doi.org/10.1007/s10579-023-09674-z
@article{loukachevitch2023nerel,
title={NEREL: a Russian information extraction dataset with rich annotation for nested entities, relations, and wikidata entity links},
author={Loukachevitch, Natalia and Artemova, Ekaterina and Batura, Tatiana and Braslavski, Pavel and Ivanov, Vladimir and Manandhar, Suresh and Pugachev, Alexander and Rozhkov, Igor and Shelmanov, Artem and Tutubalina, Elena and others},
journal={Language Resources and Evaluation},
pages={1--37},
year={2023},
publisher={Springer}
}
[2] Loukachevitch N., Artemova E., Batura T., Braslavski P., Denisov I., Ivanov V., Manandhar S., Pugachev A., Tutubalina E. NEREL: A Russian Dataset with Nested Named Entities, Relations and Events. Proceedings of RANLP. 2021. pp. 876–885.
@inproceedings{loukachevitch2021nerel,
title={{NEREL: A Russian} Dataset with Nested Named Entities, Relations and Events},
author={Loukachevitch, Natalia and Artemova, Ekaterina and Batura, Tatiana and Braslavski, Pavel and Denisov, Ilia and Ivanov, Vladimir and Manandhar, Suresh and Pugachev, Alexander and Tutubalina, Elena},
booktitle={Proceedings of RANLP},
pages={876--885},
year={2021}
}
[3] Loukachevitch N., Braslavski P., Ivanov V., Batura T., Manandhar S., Shelmanov A., Tutubalina E. Entity Linking over Nested Named Entities for Russian. Proceedings of LREC. 2022. pp. 4458–4466.
@inproceedings{nerel-el-nne,
title={{Entity Linking over Nested Named Entities for Russian}},
author={Loukachevitch, Natalia and Braslavski, Pavel and Ivanov, Vladimir and Batura, Tatiana and Manandhar, Suresh and Shelmanov, Artem and Tutubalina, Elena},
booktitle={Proceedings of LREC},
year={2022},
}