
false positive #56

Open
ghost opened this issue Jul 9, 2022 · 5 comments

@ghost commented Jul 9, 2022

wtf? For some reason this message is flagged as toxic:
"who selling lup pots"
Can you fix this? I'm using the original dataset.

@anitavero (Contributor)

Thanks for reporting this example.
If you notice any pattern in the examples the models falsely flag as toxic, it would be very useful if you could share it.
To help us improve the models, the following information would be useful (one way to collect it is sketched after this list):

  • the type of model you ran
  • the data you ran it on
  • false positive / false negative examples and any patterns (grammar, topic, etc.) you noticed
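
A minimal sketch of one way to gather this, assuming the pip-installable detoxify package (the examples list is a placeholder for your own texts):

# Score suspected false positives with each released checkpoint.
import pandas as pd
from detoxify import Detoxify

# Placeholder: substitute the texts you believe are misclassified.
examples = ["who selling lup pots"]

for checkpoint in ("original", "unbiased", "multilingual"):
    # predict() accepts a list of texts and returns per-label scores.
    scores = Detoxify(checkpoint).predict(examples)
    print(checkpoint)
    print(pd.DataFrame(scores, index=examples).round(2))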

@smasterparth

Hey,
I have also come across this false-positive issue.
I was working with a model to detect offensive text in a given dataset.
For example, I had a few records containing the string "Shital", which is a name, not an offensive word. Some of these records were classified as toxic while the rest were classified as non-toxic. The same happened with records containing the word "Nishit", which is also a name.

I tried to find a pattern explaining why a record was sometimes classified as toxic and other times as non-toxic, but there was nothing to be noticed.

Let me know if there is any workaround you have come up with or are working on.

@anitavero (Contributor) commented Jul 15, 2022

It matters a lot which version of the model you use: "original", "unbiased" or "multilingual".

  • For me, "unbiased" solves the problem with "Nishit" and mildly mitigates it for "Shital" (although the score is still high).
  • "who selling lup pots" isn't flagged as toxic for me by any of the models.

import pandas as pd
from detoxify import Detoxify

input_text = ['Shital', 'Nishit', "who selling lup pots"]

# Load each of the three released checkpoints.
model_u = Detoxify('unbiased')
model_o = Detoxify('original')
model_m = Detoxify('multilingual')

# predict() accepts a list of texts and returns a dict of per-label scores.
results_u = model_u.predict(input_text)
results_o = model_o.predict(input_text)
results_m = model_m.predict(input_text)

print("Original", pd.DataFrame(results_o, index=input_text).round(2))
print("Multilingual", pd.DataFrame(results_m, index=input_text).round(2))
print("Unbiased", pd.DataFrame(results_u, index=input_text).round(2))

This outputs:

Original
                      toxicity  severe_toxicity  obscene  threat  insult  identity_attack
Shital                    0.82             0.01     0.57    0.00    0.05             0.00
Nishit                    0.71             0.04     0.52    0.01    0.39             0.24
who selling lup pots      0.00             0.00     0.00    0.00    0.00             0.00

Multilingual
                      toxicity  severe_toxicity  obscene  identity_attack  insult  threat  sexual_explicit
Shital                    0.82             0.00     0.54             0.00    0.41    0.00             0.01
Nishit                    0.87             0.01     0.82             0.00    0.14    0.00             0.02
who selling lup pots      0.01             0.00     0.00             0.00    0.00    0.00             0.00

Unbiased
                      toxicity  severe_toxicity  obscene  identity_attack  insult  threat  sexual_explicit
Shital                    0.67             0.00     0.21             0.00    0.03    0.00             0.52
Nishit                    0.06             0.00     0.01             0.00    0.01    0.00             0.00
who selling lup pots      0.01             0.00     0.00             0.00    0.00    0.00             0.00

Let us know if you find any other issues!
If you could attach model outputs similar to the above one, that would be really helpful!

@ogencoglu

@anitavero
The original model also outputs a very high false-positive toxicity score for the following text: "They had great sex!"

{'toxicity': 0.88951826, 'severe_toxicity': 0.0110040745, 'obscene': 0.4631456, 'threat': 0.0027411387, 'insult': 0.021174002, 'identity_attack': 0.0034398066}

@ogencoglu

Also for this one:
"Sucking power of this vacuum cleaner is great!"
