HaRe

HaRe (Harassment Recognizer) is a command line tool and Python library to automatically detect harassment as it happens (in real time) with the help of machine learning techniques.

1. Using HaRe as a command line tool

The monitoring tool can be started by running bin/hare monitor <pretrained_model_name>; by default it listens on localhost:11118. If you want to run it on another address or port, add <address>:<port>, for example:

$ bin/hare monitor moba localhost:1234
Loading pretrained model moba...
HaRe is listening on localhost:1234. To start monitoring the first conversation, either send an HTTP PUT request to localhost:1234/yourconvoid, or use the convenient client tool: 'bin/hare client convo yourconvoid'

You can then send HTTP PUT, POST and GET requests to this process. For convenience, HaRe includes a separate tool bin/hare client to send these requests quickly from the command line. It will use localhost:11118 by default, but this can be overridden by including the address and port somewhere in the arguments.

Start a new conversation:

  • bin/hare client convo <yourconvoid>
  • Does an HTTP PUT to address:port/yourconvoid
  • Example: bin/hare client convo myconvo does an HTTP PUT to localhost:11118/myconvo

Add an utterance:

  • bin/hare client utterance <yourconvoid> <yourspeakerid> <the rest of the words are for the utterance>
  • Does an HTTP POST to address:port/yourconvoid
  • Example: bin/hare client utterance myconvo speakera good luck everyone does an HTTP POST to localhost:11118/myconvo with this data:
{"utterance":"good luck everyone","speaker":"a"}

Check the status:

  • bin/hare client status <yourconvoid>
  • Does an HTTP GET to address:port/yourconvoid
  • Example: bin/hare client status myconvo does an HTTP GET to localhost:11118/myconvo, and will return something in this format:

TODO
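If you would rather talk to the monitor directly instead of going through bin/hare client, any HTTP client will do. Below is a minimal sketch using Python's third-party requests library; it assumes the monitor is running on the default localhost:11118 and that the endpoints and JSON payload are exactly as described above (the format of the status response is still to be documented, so the sketch just prints the raw response):

import requests

base = 'http://localhost:11118'

# Start a new conversation (equivalent to: bin/hare client convo myconvo)
requests.put(base + '/myconvo')

# Add an utterance (equivalent to: bin/hare client utterance myconvo speakera good luck everyone)
requests.post(base + '/myconvo',
              json={'utterance': 'good luck everyone', 'speaker': 'speakera'})

# Check the status (equivalent to: bin/hare client status myconvo)
response = requests.get(base + '/myconvo')
print(response.text)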

2. Using HaRe as a Python library

2.1 Basic usage

The easiest way to use HaRe is to load one of the pretrained HaRe models included with this repo in the models folder, like the one named 'moba':

from hare import load_pretrained

moba_hare = load_pretrained('moba')

You can then use this object to monitor conversations in progress. Let's start a conversation and ask HaRe to monitor it:

from hare import Conversation

convo = Conversation()
moba_hare.add_conversation(convo)

At any point in time, you can then request the current status of the conversation according to this HaRe model:

convo.add_utterance(speaker='a',content='hello')
convo.add_utterance(speaker='b',content='hi everyone')
moba_hare.get_status()

You can also add multiple utterances at once, for example a whole conversation that has already finished:

from hare import Utterance

convo.add_utterances([Utterance(speaker='a',content='good luck'),
                      Utterance(speaker='c',content='ur all n00bs')])
moba_hare.get_status()

If you add multiple conversations for HaRe to monitor, you will need to specify the conversation index when asking for the status:

second_convo = Conversation()
convo_index = moba_hare.add_conversation(second_convo)

second_convo.add_utterance(speaker='a',content='hello')
second_convo.add_utterance(speaker='b',content='hi everyone')

moba_hare.get_status(id=convo_index)

2.2 Evaluating

If you have a labeled dataset (that is, for each conversation an indication of which participants are considered toxic), HaRe can calculate to what extent its judgments match the labels. A label can range from the default 0 (not toxic at all) to 1 (maximally toxic). Let's label speaker c:

convo.label_speaker('c',0.9)

There are several evaluation metrics, depending on what is important to you (detecting ALL harassment, detecting harassment quickly, no false positives, etc.):

moba_hare.calculate_accuracy()

These metrics are calculated on the basis of all conversations the HaRe object is aware of that have at least 1 labeled participant. If you want to exclude a conversation from evaluation, simply add its index to the conversations_excluded_for_evaluation list:

moba_hare.conversations_excluded_for_evaluation = [0]

2.3 Training

At some point, you might want to do some training yourself. This can be the case, for example, because you are applying HaRe to a different domain than the pretrained models were trained on, where harassment looks slightly different, or because you want to detect something other than harassment altogether.

from hare import Hare, Utterance

new_hare = Hare()

# First training conversation: speaker 'b' is labeled as toxic
new_hare.start_conversation(conversation_id='convo_001')
new_hare.label_speaker('b',1)
new_hare.add_utterances([Utterance(speaker='a',content='good luck'),
                         Utterance(speaker='b',content='ur all n00bs')])

# Second training conversation: again, speaker 'b' is labeled as toxic
new_hare.start_conversation(conversation_id='convo_002')
new_hare.label_speaker('b',1)
new_hare.add_utterances([Utterance(speaker='a',content='hi'),
                         Utterance(speaker='b',content='SHUT UP!')])

# Train a new model on these labeled conversations
new_hare.train()

It will probably be highly effective to use so-called word embeddings during training. You can see these embeddings as a dictionary that translates from a word's characters to an estimate of its meaning. Research has shown that classifying these 'meanings' is much more successful than classifying the raw words [reference]. To use them, simply point your HaRe object to the embedding file in the embeddings folder before training:

new_hare.embedding_file = 'english_large'
new_hare.train()

Of course, these embeddings should be in the same language as the rest of your dataset.
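To make the intuition behind embeddings concrete, here is a purely illustrative toy example; it is not HaRe's internal format or API, just a demonstration that words map to vectors and that similar meanings end up close together:

# Purely illustrative: a toy 'embedding' as a word -> vector dictionary.
# Real embedding files (such as english_large above) are far larger and are
# loaded by HaRe itself; this only shows why classifying 'meanings' can work
# better than classifying raw words.
toy_embeddings = {
    'hello': [0.8, 0.1, 0.0],
    'hi':    [0.7, 0.2, 0.1],
    'n00bs': [0.0, 0.9, 0.8],
}

def similarity(word_a, word_b):
    # Cosine similarity between two word vectors
    a, b = toy_embeddings[word_a], toy_embeddings[word_b]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

print(similarity('hello', 'hi'))     # high: near-synonyms
print(similarity('hello', 'n00bs'))  # low: unrelated words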
