GitHub - EMBEDDIA/PAN2019: Code for PAN author profiling task experiments, International Conference of the CLEF Association (CLEF 2019)

Code for the experiments conducted in the papers 'Who is hot and who is not? profiling celebs on twitter' and 'Fake or not: Distinguishing between bots, males and females', submitted to the Tenth International Conference of the CLEF Association (CLEF 2019) as part of the PAN author profiling task.

Please cite these papers [bib] if you use this code.

Installation, documentation

The published results were produced in a Python 3 environment on the Linux Mint 18 Cinnamon operating system. The installation instructions assume that pip, the PyPI package installer, is available.

To get the source code, clone the repository from GitHub with 'git clone https://github.com/EMBEDDIA/PAN2019'.

Data for the bot vs male vs female classification can be downloaded from here:
https://zenodo.org/record/3692340#.YAARLNYo-Uk
Data for the celebrity classification can be downloaded from here:
https://zenodo.org/record/3885373#.YAASeNYo-Uk

Install dependencies if needed: pip install -r requirements.txt

We have added a Jupyter notebook (see gender/src/example_usage.ipynb) in order to explain specific steps in the code.

To reproduce the celebrity classification results published in the paper, run the following commands on the command line:

Read data and generate features:

python parse_data.py --num_samples 100 --train_corpus pathToTrainCorpus --train_labels pathToTrainLabels --feature_folder pathToOutputFeatureFolder --all_data

Remove the '--all_data' flag if you want to reproduce the results on the evaluation set. If the flag is removed, 3837 examples are held out from the training set and used as a validation set.
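To illustrate what holding out a validation set means, here is a minimal Python sketch. It is not the code used in parse_data.py: the function name hold_out, the random shuffling and the seed are assumptions made for illustration only, and the repository may select the 3837 examples differently.

```python
import random


def hold_out(examples, n_validation=3837, seed=0):
    """Split examples into (train, validation), holding out n_validation items.

    Hypothetical helper: random selection with a fixed seed is an assumption,
    not necessarily how parse_data.py chooses the held-out examples.
    """
    if n_validation > len(examples):
        raise ValueError("n_validation exceeds the number of examples")
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)
    held = set(indices[:n_validation])
    # Preserve the original order inside each part.
    train = [ex for i, ex in enumerate(examples) if i not in held]
    validation = [ex for i, ex in enumerate(examples) if i in held]
    return train, validation


# Toy usage: 10 examples, 3 of them held out for validation.
train, validation = hold_out(list(range(10)), n_validation=3)
print(len(train), len(validation))  # 7 3
```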

Evaluate on development set:

python evaluate.py --feature_folder pathToOutputFeatureFolder

Generate test set predictions:

python test.py --input pathToTestCorpus --output pathToResultsFolder --feature_folder pathToOutputFeatureFolder

To reproduce the bot vs male vs female classification results published in the paper, run the following commands on the command line:

Read data and generate features:

python parse_data.py --train_corpus pathToTrainCorpus --feature_folder pathToOutputFeatureFolder

Evaluate on development set:

python evaluate.py --feature_folder pathToOutputFeatureFolder

Generate test set predictions:

python test.py --input pathToTestCorpus --output pathToResultsFolder --feature_folder pathToOutputFeatureFolder
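Both pipelines above follow the same three steps (parse, evaluate, test), so the command lines can be assembled programmatically. The following is a minimal sketch, not part of the repository: the function name pipeline_commands and the task labels "celebrity" and "gender" are hypothetical, while the script names and flags are the ones listed above. The paths are the same placeholders used in this README.

```python
import shlex


def pipeline_commands(task, train_corpus, feature_folder, test_corpus,
                      results_folder, train_labels=None, num_samples=100,
                      all_data=True):
    """Return the parse, evaluate and test commands as argument lists.

    Hypothetical helper: `task` is "celebrity" or "gender" (labels chosen
    for this sketch); all flags are taken from the commands in this README.
    """
    if task not in ("celebrity", "gender"):
        raise ValueError("task must be 'celebrity' or 'gender'")
    parse = ["python", "parse_data.py"]
    if task == "celebrity":
        if train_labels is None:
            raise ValueError("the celebrity task needs train_labels")
        parse += ["--num_samples", str(num_samples)]
    parse += ["--train_corpus", train_corpus]
    if task == "celebrity":
        parse += ["--train_labels", train_labels]
    parse += ["--feature_folder", feature_folder]
    if task == "celebrity" and all_data:
        # Omit this flag to keep a validation set, as described above.
        parse.append("--all_data")
    evaluate = ["python", "evaluate.py", "--feature_folder", feature_folder]
    test = ["python", "test.py", "--input", test_corpus,
            "--output", results_folder, "--feature_folder", feature_folder]
    return [parse, evaluate, test]


# Print the bot vs male vs female commands with the README's placeholder paths.
for cmd in pipeline_commands("gender", "pathToTrainCorpus",
                             "pathToOutputFeatureFolder", "pathToTestCorpus",
                             "pathToResultsFolder"):
    print(shlex.join(cmd))
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the directory that contains the scripts, once the placeholder paths are replaced with real ones.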

Contributors to the code

Matej Martinc, Blaž Škrlj

Knowledge Technologies Department, Jožef Stefan Institute, Ljubljana
