
These words shouldn't be considered 'positive' #3

Open
tonyjiang opened this issue Jul 12, 2013 · 7 comments

Comments

@tonyjiang

0.875 ill-mannered
0.5 brutally
0.5 boneheaded
0.5 cynically
0.5 cutthroat
0.5 dishonestly
0.5 dishonestly

I wonder where you got this list.

@jemminger
Member

Taken directly from here: https://github.com/cmaclell/Basic-Tweet-Sentiment-Analyzer

@jemminger
Member

What values do you propose they be? Just make them negative?

@agarie

agarie commented Dec 18, 2013

Maybe...

-0.5 ill-mannered
-0.40 boneheaded
-0.75 dishonestly
-0.875 brutally
-0.875 cynically
-0.875 cutthroat

I tried to compare these words with others found in sentiwords.txt. However, I'm not a native speaker, so I might be wrong.
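Corrections like these can be layered over the shipped list instead of editing it by hand. The sketch below shows one way to do that. It assumes the one-entry-per-line `score word` layout used in the lists above. `parse_entries` and `apply_overrides` are hypothetical helpers for illustration and are not part of the gem.

```python
def parse_entries(text):
    """Parse 'score word' lines into {word: score}.

    Later lines win, so a word listed twice keeps its last score.
    Blank or malformed lines are skipped.
    """
    scores = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # not a 'score word' pair
        score, word = parts
        scores[word] = float(score)
    return scores


def apply_overrides(defaults, overrides):
    """Return a new dict: the default scores with overridden words replaced."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


# Entries as reported at the top of this issue
defaults = parse_entries("""\
0.875 ill-mannered
0.5 brutally
0.5 boneheaded
0.5 cynically
0.5 cutthroat
0.5 dishonestly
""")

# Values proposed in the comment above
proposed = parse_entries("""\
-0.5 ill-mannered
-0.40 boneheaded
-0.75 dishonestly
-0.875 brutally
-0.875 cynically
-0.875 cutthroat
""")

merged = apply_overrides(defaults, proposed)
for word, score in sorted(merged.items(), key=lambda kv: kv[1]):
    print(f"{score:+.3f} {word}")
```

Keeping the overrides in a separate file leaves the default list untouched. It also makes disagreements like the one below easy to settle by editing a single line.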

@tonyjiang
Author

@agarie - 'cynically' can't be as bad as 'brutally', nor is it worse than 'dishonestly'.

@agarie

agarie commented Dec 18, 2013

@tonyjiang Interesting. I didn't know that about 'cynically' vs. 'dishonestly' (in Portuguese it's actually the reverse). Do you have any suggestions?

@edmondlafay

The real question is how the dictionaries were created.

These word lists are generally computed from large corpora of texts that have been hand-annotated as positive or negative. One of the simpler approaches, which has probably been used here, is to run a program that sums the number of times a word is seen in positive and negative texts and normalizes the result by the number of texts the word appears in. The scores therefore don't reflect the actual meaning of a word, but how people use it in texts.

The problem is that these scores depend heavily on the corpora you use. If your corpus is classic literature, your dictionaries will contain more words, and those words will be used in a more literal manner. If your corpus is a set of random tweets, people use a limited vocabulary and words carry their popular, colloquial meanings.

It is therefore best to build a dictionary that better suits your use of the gem and to treat the provided dictionaries only as defaults.
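The scoring approach described above can be sketched in a few lines. This is only a guess at the method, since the thread does not document how the original list was actually built. Each text carries a hand label of +1 or -1. A word's score is the sum of the labels of the texts containing it, divided by the number of texts it appears in, which yields a value in [-1, 1]. `build_lexicon` is a hypothetical name, and the four-sentence corpus is invented for illustration.

```python
from collections import defaultdict


def build_lexicon(labeled_texts):
    """Score words by the labels of the texts they appear in.

    labeled_texts: iterable of (text, label) pairs, label +1 (positive) or -1 (negative).
    Returns {word: score}, where score = (sum of labels of texts containing the word)
    / (number of texts containing the word), so it always falls in [-1, 1].
    """
    label_sum = defaultdict(int)  # running sum of labels per word
    doc_freq = defaultdict(int)   # number of texts each word appears in
    for text, label in labeled_texts:
        for word in set(text.lower().split()):  # count each word once per text
            label_sum[word] += label
            doc_freq[word] += 1
    return {w: label_sum[w] / doc_freq[w] for w in doc_freq}


# Toy hand-labeled corpus, invented for this example
corpus = [
    ("what a brutally honest review", 1),
    ("brutally rude staff", -1),
    ("rude and boneheaded decision", -1),
    ("great honest service", 1),
]
lexicon = build_lexicon(corpus)
for word in ("brutally", "rude", "honest"):
    print(word, lexicon[word])
```

In this toy corpus, "brutally" ends up neutral because it appears in the positive phrase "brutally honest" as often as in a negative text. That illustrates how usage, not dictionary meaning, drives these scores. Rebuilding the lexicon from texts that resemble your own input is the "rebuild a dictionary" step suggested above.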

@cromulus

cromulus commented Dec 29, 2016

Might I suggest this corpus of word sentiments?

SentiWordNet is the current best-in-class corpus of words and sentiments: http://sentiwordnet.isti.cnr.it/

It's a rather large database; perhaps it would be useful as an alternative?

Its format is slightly different: each word is scored as both positive and negative, from 0 to 1, and some words have nonzero scores for both. Perhaps we just subtract the negative score from the positive one? That part is unclear.
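Here is a rough sketch of the "subtract negative from positive" idea. It assumes the tab-separated layout of the SentiWordNet 3.0 distribution file: POS, ID, PosScore, NegScore, SynsetTerms, Gloss. In that layout, comment lines start with `#` and terms are written as `word#sense`. Check this against the actual file before relying on it. Because scores are attached to senses rather than to bare words, the sketch averages PosScore minus NegScore over every sense a word appears in. `load_sentiwordnet` is a hypothetical helper, and the sample rows are made-up placeholders, not real SentiWordNet entries.

```python
from collections import defaultdict


def load_sentiwordnet(lines):
    """Collapse SentiWordNet-style rows into one score per word.

    Assumes tab-separated rows: POS, ID, PosScore, NegScore, SynsetTerms, Gloss,
    with '#'-prefixed comment lines and terms written as 'word#sense'.
    Each row contributes PosScore - NegScore (a value in [-1, 1]) to every term
    it lists; a word's final score is the plain average over those rows.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # skip malformed rows
        pos_score, neg_score = float(cols[2]), float(cols[3])
        for term in cols[4].split():
            word = term.rsplit("#", 1)[0]  # drop the '#sense' suffix
            totals[word] += pos_score - neg_score
            counts[word] += 1
    return {w: totals[w] / counts[w] for w in totals}


# Placeholder rows in the assumed layout (IDs and scores are invented)
sample = [
    "# POS\tID\tPosScore\tNegScore\tSynsetTerms\tGloss",
    "a\t00000001\t0\t0.625\till-mannered#1 rude#2\tplaceholder gloss",
    "a\t00000002\t0.5\t0.125\trude#1\tplaceholder gloss",
]
scores = load_sentiwordnet(sample)
print(scores)
```

A plain average is only one way to merge senses. Weighting each sense by its rank, so that a word's most common meaning counts most, is another option worth comparing.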


5 participants