j-ahmadkhan/Author-Profiling

Author-Profiling

This software is an implementation of an Author Profiling Model in 4 languages:

1. English
2. Arabic
3. Portuguese
4. Spanish

Authors are profiled on the basis of Gender and Region.

The Model is trained on the Twitter data of various users provided by PAN 2017. The training configurations for each language are kept in the StartupPath/Configs directory.

Various text patterns are taken into account when building the model, such as the words each language uses to express anger, happiness, tech, confusion, love etc., which one can find in StartupPath/
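The word-list idea described above can be sketched as a simple per-category count. This is a minimal illustration in Python, not the project's C# implementation: the `LEXICONS` dictionary, its example words, and the `category_counts` function are all hypothetical, and the real word lists and feature extraction may differ.

```python
import re
from collections import Counter

# Hypothetical category word lists, for illustration only.
# The project's real lists live under StartupPath and are not reproduced here.
LEXICONS = {
    "anger": {"angry", "hate", "furious"},
    "happiness": {"happy", "glad", "joy"},
    "love": {"love", "adore"},
}


def category_counts(text, lexicons=LEXICONS):
    """Count how many tokens of `text` appear in each category word list."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, words in lexicons.items():
            if token in words:
                counts[category] += 1
    # Return a plain dict with every category present, including zeros.
    return {category: counts[category] for category in lexicons}


if __name__ == "__main__":
    print(category_counts("I love this, so happy and glad!"))
```

Counts of this kind could then serve as features for a gender or region classifier; how the actual model combines them is described in the linked paper.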

Currently, due to size concerns, 20 data files for each language are included in the pan folder for test and run purposes.

Complete details of the model can be found at http://ceur-ws.org/Vol-1866/paper_52.pdf

Contribution guidelines

The Model is implemented in C# and compiled with Visual Studio 2010.

The default path for processing files is StartupPath/pan/

The default path for processed results is StartupPath/Results/

Users can also provide an input path with the -i switch and an output path with the -o switch.
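The way the two switches fall back to the default folders can be sketched as follows. This is a hedged illustration in Python rather than the project's C# code: only the `-i` and `-o` switches and the `pan` and `Results` folder names come from this README, while `parse_paths`, its parameters, and the use of `argparse` are assumptions made for the example.

```python
import argparse
from pathlib import Path


def parse_paths(argv, startup_path):
    """Resolve input and output folders from -i / -o, with README defaults.

    `startup_path` stands in for the program's StartupPath (an assumption).
    """
    parser = argparse.ArgumentParser(description="Author profiling path options")
    parser.add_argument("-i", dest="input_path", default=None,
                        help="folder containing the files to process")
    parser.add_argument("-o", dest="output_path", default=None,
                        help="folder that receives the results")
    args = parser.parse_args(argv)

    base = Path(startup_path)
    # Fall back to StartupPath/pan and StartupPath/Results when a switch is absent.
    input_path = Path(args.input_path) if args.input_path else base / "pan"
    output_path = Path(args.output_path) if args.output_path else base / "Results"
    return input_path, output_path


if __name__ == "__main__":
    print(parse_paths(["-i", "my_data"], "."))
```

With no switches, the sketch returns the two default folders; supplying either switch overrides only the corresponding path.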

The output is in JSON format.
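Because the results are JSON, they can be loaded with any standard JSON parser. The sketch below, in Python, makes no assumption about the fields inside each result; it does assume that the results folder contains files with a `.json` extension, which this README does not confirm. The `load_results` helper is hypothetical.

```python
import json
from pathlib import Path


def load_results(results_dir):
    """Parse every *.json file in `results_dir`, keyed by file name.

    Assumes result files carry a .json extension (not confirmed by the README).
    """
    results = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            results[path.name] = json.load(handle)
    return results


if __name__ == "__main__":
    for name, content in load_results("Results").items():
        print(name, type(content).__name__)
```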

This software was submitted to the CLEF PAN 2017 contest and is uploaded as is.

Anyone can use/improve/change it for academic and research purposes.

Who do I talk to?

For queries: j_ahmadkhan@yahoo.co
