🙅 Inaccurate model coref predictions master thread #215

svlandeg · 2019-10-16T08:44:23Z

Master thread for collecting incorrect and/or problematic coreference predictions with the pretrained models. These can be interesting test cases when training the next version of the model.

petulla · 2019-11-21T16:52:16Z

updated to include article url, doh *

For this article, the model struggles with NASA's James Webb Space Telescope.

This is the mentions array:

[Mauna Kea: [Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, it, Mauna Kea, Mauna Kea], Hawaii: [Hawaii, Hawaii, Hawaii, Hawaii, Hawaii, its, Hawaii, Hawaii, Hawaii, Hawaii], Spain: [Spain, Spain, Spain, Spain, Spain, Spain], Earth: [Earth, Earth, Earth, Earth], astronomers: [astronomers, their], the world's largest telescope in Hawaii: [the world's largest telescope in Hawaii, the telescope, the telescope, it, the Webb telescope, the telescope, it, Thirty Meter Telescope, Its, the telescope on La Palma, the telescope in Spain, this telescope], the islands': [the islands', their], Meter Telescope officials: [Meter Telescope officials, their], their backup site atop a peak on the Spanish Canary island of La Palma: [their backup site atop a peak on the Spanish Canary island of La Palma, it, it, the site], La Palma: [La Palma, La Palma, La Palma, La Palma, La Palma, La Palma, La Palma, La Palma, La Palma, La Palma], Mauna Kea's: [Mauna Kea's, Mauna Kea's], Bolte, who has used existing Mauna Kea telescopes: [Bolte, who has used existing Mauna Kea telescopes, he], Bolte: [Bolte, Bolte, Bolte, The telescope group's Bolte], Webb: [Webb, Webb], Mather: [Mather, He, Mather, he], bright stars: [bright stars, them], Loeb: [astrophysicist Avi Loeb, who chairs Harvard University's astronomy department, Loeb, Loeb, he], The Native Hawaiian opponents: [The Native Hawaiian opponents, themselves, their, They], the telescope group: [the telescope group, The telescope group], protest leader Kealoha Pisciotta: [protest leader Kealoha Pisciotta, Pisciotta], Thirty Meter Telescope officials: [Thirty Meter Telescope officials, they], the Canary Islands: [the Canary Islands, the Canary Islands, the Canary Islands], Others: [Others, their, their], Jos Manuel Vilchez, an astronomer with Spain's Higher Council of Scientific Research and a former member of the scientific committee of the Astrophysics Institute of the Canary Islands: [Jos Manuel Vilchez, an astronomer with Spain's Higher Council of Scientific Research and a former member of the scientific committee of the Astrophysics Institute of the Canary Islands, We, We], Vilchez: [Vilchez, Vilchez, Vilchez, Vilchez], Native Hawaiians: [Native Hawaiians, their, they, Native Hawaiians]]

Webb is broken out as if it were a last name, when it is actually part of the telescope's name. In general, the model struggles to tell the two telescopes mentioned in the article apart.
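One way to spot this kind of split automatically is to check whether the main mention of one cluster shows up as a whole word inside a mention that was assigned to a different cluster. The sketch below is only a rough heuristic for collecting test cases: the function name and the plain-dict cluster format are assumptions made for illustration, not part of neuralcoref, and the sample data is a small subset of the array above. Nested mentions that are perfectly legitimate will also be flagged, so the output is a list of candidates for manual review.

```python
from typing import Dict, List, Tuple


def find_split_entities(clusters: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Return (cluster, other_cluster, mention) triples where the main mention
    of one cluster occurs as a whole-word sequence inside a mention that
    belongs to a different cluster."""
    hits = []
    for main in clusters:
        main_words = main.lower().split()
        n = len(main_words)
        for other, mentions in clusters.items():
            if other == main:
                continue
            for mention in mentions:
                words = mention.lower().split()
                # Slide a window over the mention looking for main_words.
                if any(words[i:i + n] == main_words
                       for i in range(len(words) - n + 1)):
                    hits.append((main, other, mention))
                    break  # report at most one hit per pair of clusters
    return hits


# Hypothetical plain-dict version of a few clusters from the output above.
sample_clusters = {
    "the world's largest telescope in Hawaii": [
        "the world's largest telescope in Hawaii",
        "the telescope",
        "the Webb telescope",
        "Thirty Meter Telescope",
    ],
    "Webb": ["Webb", "Webb"],
    "Mather": ["Mather", "He"],
}

for main, other, mention in find_split_entities(sample_clusters):
    print(f"'{main}' also appears in '{mention}' (cluster: '{other}')")
```

On this subset the only candidate reported is "Webb" inside "the Webb telescope", which is the case described above.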

I'm wondering whether a BERT span-based model might be an option for the next release. I tried the above text on it and the result is slightly better (though still imperfect). https://github.com/mandarjoshi90/coref

Atul-Anand-Jha · 2019-12-24T09:14:36Z

Hey @svlandeg,
I have observed that the live demo at https://huggingface.co/coref/ is correct in multiple cases where my local implementation of the same model fails.
To verify this, I tried resolving coreferences with both models: the latest one, neuralcoref v4.0, and neuralcoref-lg-3.0.0. Our results are far poorer than those of your live demo. I am attaching screenshots to illustrate the situation. Please have a look and respond.
Does the live demo use a different model from both of the ones mentioned above? If so, how can we use that model in our project?

Fig: our implementation results (neuralcoref v3 + neuralcoref v4)

Fig: your live demo result

EvanFabry · 2020-01-28T10:07:12Z

+1. I've noticed discrepancies between performance locally and in the demo environment. @svlandeg @thomwolf, can you comment on what exactly is currently served by the demo environment?

svlandeg · 2020-02-09T22:08:01Z

I wasn't involved with this project when the demo environment was created. However, note that it's not just the version that was trained that makes a difference, but also the specific hyperparameters used for making the predictions. So that is definitely something you can "play" with too.
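For anyone who wants to try this systematically, here is a minimal sketch of enumerating hyperparameter combinations. The parameter names (`greedyness`, `max_dist`, `max_dist_match`) are the ones listed in the neuralcoref README; the helper `parameter_grid`, the `GRID` dict, and the values in it are illustrative assumptions, not settings known to match the demo. The actual neuralcoref call is left as a comment so that the snippet runs without spaCy installed.

```python
from itertools import product
from typing import Any, Dict, Iterator, List


def parameter_grid(options: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one keyword-argument dict per combination of the option values."""
    names = list(options)
    for values in product(*(options[name] for name in names)):
        yield dict(zip(names, values))


# Arbitrary example values; adjust them to whatever range you want to explore.
GRID = {
    "greedyness": [0.4, 0.5, 0.6],
    "max_dist": [50, 100],
    "max_dist_match": [500],
}

for kwargs in parameter_grid(GRID):
    # In a real run, each combination would be applied to a freshly loaded
    # spaCy pipeline, for example:
    #     neuralcoref.add_to_pipe(nlp, **kwargs)
    # and the resulting clusters compared with what the demo shows.
    print(kwargs)
```

Loading a fresh pipeline for each combination avoids adding the component twice to the same `nlp` object.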

Atul-Anand-Jha · 2020-02-10T07:15:40Z

Thanks,
I actually tried different options for these hyperparameters, but none of the model releases uploaded here could match the demo.

aereobert · 2020-03-15T03:02:13Z

Same here.

With exactly the same sentence provided on the sample site, I tried all kinds of hyperparameter options, but I am still unable to reproduce the result. I installed spaCy and neuralcoref from source in a brand-new Docker container, so it should not be an environment problem.

On the sample page, the scores are usually around 3 to 15, whereas in my environment they are always around -2 to 2.

I am wondering how exactly to reproduce the result on the sample page.

Thank you very much!

@svlandeg @thomwolf

Resolved by compiling spaCy 2.1.0 and neuralcoref from source.

aamin3 · 2020-04-11T05:45:38Z

Hello,
can you confirm that the demo results can be achieved by compiling spaCy 2.1.0 and neuralcoref from source?

aereobert · 2020-04-15T03:33:10Z

Not exactly. I am just saying that this increased the accuracy on my side, from unusable to usable.

cfoster0 · 2020-04-17T22:52:36Z

Surprised to see the following.

On the example sentence in the README, neuralcoref predicts accurately:

But on a slight modification, where we switch sister to brother, and swap the pronouns, we get an incorrect prediction on the second sentence:

noelslice · 2020-06-18T15:00:53Z

It would be very helpful if someone could shed some light on what model and combination of package versions are used in the demo environment. Like others here I'm not able to reproduce what I see on the demo in my local setup, even when rolling spacy back to 2.1.3 and building neuralcoref from source. It feels like the model served in the demo environment is a different model or it was trained with different word embeddings. Are pretrained neuralcoref models tied to any specific spacy language model tag? I've played with the parameters like others here with some improvement but I'm still seeing systematic differences with the live demo.
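To make reports in this thread easier to compare, it may help to post the exact installed versions alongside each example. Below is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_versions` is made up for this example, and the model package name `en_core_web_lg` is only an assumption; substitute whichever spaCy model you actually load.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# "en_core_web_lg" is a placeholder for the spaCy model package in use.
for name, version in installed_versions(
        ["spacy", "neuralcoref", "en_core_web_lg"]).items():
    print(f"{name}: {version or 'not installed'}")
```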

pborysov · 2020-06-18T15:04:43Z

Totally agree!!! Online demo is an ideal starting point, but only if it is reproducible :(

Keating950 · 2020-07-27T18:56:43Z

I'm not able to share much in the way of text for confidentiality reasons, but I'm noticing that the pre-trained model seems to be gravitating toward resolving "us" to "We." It might be useful to be able to blacklist certain words (e.g. "We") as never being satisfactory coreferents.

\< It is not up to us to rectify things
\---
\> It is not up to We to rectify things

\< It is absolutely an issue, but not only to us
\---
\> It is absolutely an issue, but not only to We

neuralcoref 4.0
spacy 2.3.2
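Until something like this exists in the library, one possible workaround is to filter at substitution time instead of inside the model. The sketch below is a post-processing approach under stated assumptions: clusters are represented as a hypothetical simplified tuple of main-mention text plus character spans (not neuralcoref's own `Cluster` objects), the function name is made up, and the first sentence of the sample text is invented so that "us" has an antecedent.

```python
from typing import Iterable, List, Tuple

# Simplified stand-in for a coreference cluster:
# (main mention text, [(start_char, end_char), ...])
Cluster = Tuple[str, List[Tuple[int, int]]]


def resolve_with_blacklist(text: str,
                           clusters: Iterable[Cluster],
                           blacklist: Iterable[str]) -> str:
    """Replace each mention with its cluster's main mention, except for
    clusters whose main mention is blacklisted (compared case-insensitively)."""
    blocked = {word.lower() for word in blacklist}
    replacements = []
    for main, spans in clusters:
        if main.lower() in blocked:
            continue  # never substitute a blacklisted main mention
        for start, end in spans:
            if text[start:end] != main:
                replacements.append((start, end, main))
    # Apply from right to left so that earlier character offsets stay valid.
    for start, end, main in sorted(replacements, reverse=True):
        text = text[:start] + main + text[end:]
    return text


# The first sentence is invented for illustration; the second is from the report.
sample_text = "We disagree. It is not up to us to rectify things"
us_start = sample_text.index("us")
sample_clusters = [("We", [(0, 2), (us_start, us_start + 2)])]

print(resolve_with_blacklist(sample_text, sample_clusters, []))
print(resolve_with_blacklist(sample_text, sample_clusters, ["we"]))
```

With an empty blacklist the second sentence comes out as "It is not up to We to rectify things"; with "we" blacklisted the text is left unchanged.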

aamin3 · 2020-07-28T18:49:24Z

I agree: a more customizable blacklist (including they, it, these, who) would be wonderful. This is great tech as it is; this is just a suggestion.

Keating950 · 2020-08-09T19:21:21Z

If you're interested in this feature, I've added it in my fork of this project (https://github.com/Keating950/neuralcoref). I'm still making sure it works, so I'm all ears for any feedback and review.

aamin3 · 2020-08-09T19:48:27Z

Thanks a lot, Keating950! I see that in your fork NO_COREF_LIST = ["i", "me", "my", "you", "your"] no longer exists in *train/document.py* or in *neuralcoref.pyx*, so it seems we do not place our custom blacklist directly in the source code. Does that mean I just pass the custom blacklist each time neuralcoref is instantiated? For example: *coref = neuralcoref.NeuralCoref(nlp.vocab, greedyness=0.75, blacklist=["i", "me", "my", "you", "your", "they", "their", "it"])* I just want to verify that I am using your fork as intended. Thanks.

Keating950 · 2020-08-10T02:31:13Z

Yup, that's exactly right. I've updated the README. Feel free to open an issue on that repo if you have any other questions.

aamin3 · 2020-08-11T15:37:17Z

Thanks a lot, man.

lauwauw · 2021-05-28T20:42:16Z

Thanks for your work @Keating950! Very helpful!

Keating950 · 2021-05-30T14:42:08Z

@lauwauw Thanks! I've merged in the latest changes from this repo in light of the renewed interest.

svlandeg added the perf / accuracy label Oct 16, 2019

svlandeg mentioned this issue Oct 16, 2019

Getting the relation between the plural object and its individual #176

Closed

svlandeg pinned this issue Oct 16, 2019

This was referenced Oct 16, 2019

Ways to fix incorrect coref resolution? #198

Closed

customising dictionary for identifying proper nouns #191

Closed

Multiple Subjects in Long Sentence Does Not Work #181

Closed

svlandeg changed the title ~~Inaccurate model coref predictions master thread~~ 🙅 Inaccurate model coref predictions master thread Oct 17, 2019

svlandeg changed the title ~~🙅 Inaccurate model coref predictions master thread~~ Inaccurate model coref predictions master thread Oct 17, 2019

svlandeg changed the title ~~Inaccurate model coref predictions master thread~~ 🙅 Inaccurate model coref predictions master thread Oct 17, 2019

svlandeg mentioned this issue Dec 7, 2019

unable to get few references #229

Closed

noelslice mentioned this issue Jul 15, 2020

add SCONJ to REMOVE_POS to exclude subordinating conjunction from mention span detection #276

Merged
