Endogenous Attention Allocation

This repository contains the description and implementation of Endogenous Attention Allocation (EAA), a generative probabilistic model for natural language processing (NLP). I developed this model as part of my master's thesis, "Strategic Issue Selection and Ideological Polarization: Evidence from the Congressional Record Data," which I defended at the New Economic School, Moscow, in 2013.

Essentially, EAA modifies the Latent Dirichlet Allocation (LDA) model by incorporating document-level features. On the one hand, such features can improve the interpretability of the resulting topics. On the other, they can be of substantive interest in their own right, e.g., if one wants to understand how topics vary across individuals and over time. The model uses nonconjugate priors. The optimization procedure relies on variational approximations and yields empirical Bayes estimates of the parameters. The code uses a combination of Python and Cython, resulting in relatively fast performance. I thank Radim Řehůřek, whose gensim library provided inspiration for certain parts of the code.

I wrote the thesis during 2012-2013. EAA was developed independently of what later became known as the Structural Topic Model, a related approach that also builds on LDA. My primary goal was to understand when and why legislators in the U.S. Congress prioritized certain issues over others, as reflected in their speeches on the House or Senate floor. I conceptualized issue selection as a discrete choice problem in a random utility setting, with substantive political issues roughly corresponding to topics of speech transcripts. The model was trained and validated on speech transcripts from the Congressional Record for the 110th Congress (2007-2008). A slightly edited version of the thesis, which describes the algorithm itself, the setting, the data, and the main results, accompanies the code (eaa.pdf). The repository also contains additional scripts for preprocessing and merging the raw data.
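
To give a rough sense of the discrete choice framing, here is a minimal sketch of a multinomial logit, the textbook form that choice probabilities take in a random utility model when the utility shocks are i.i.d. Gumbel. It is an illustration only and is not taken from the code in this repository: the function name, the array shapes, and the linear dependence of utilities on document-level features are all assumptions. The actual specification is in eaa.pdf.

```python
import numpy as np


def choice_probabilities(features, coefficients):
    """Multinomial-logit probabilities of choosing each topic (hypothetical helper).

    features:     (n_docs, n_features) document-level covariates (assumed layout)
    coefficients: (n_features, n_topics) utility weights (assumed layout)
    """
    # Deterministic part of the utility of each topic for each document.
    utilities = features @ coefficients
    # Subtract the row-wise maximum for numerical stability; the softmax
    # is invariant to adding a constant to every utility in a row.
    utilities = utilities - utilities.max(axis=1, keepdims=True)
    exp_u = np.exp(utilities)
    return exp_u / exp_u.sum(axis=1, keepdims=True)


# Toy inputs: 4 documents, 3 features, 5 topics (made-up numbers).
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))
coefficients = rng.normal(size=(3, 5))
probs = choice_probabilities(features, coefficients)
```

Each row of `probs` is a probability distribution over topics for one document, so document-level features shift how much attention a speaker allocates to each issue.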

I stopped developing EAA in 2013. To run the algorithm on the Congressional Record data, execute the script run_eaa.py (first edit the paths and parameters in user_config.py as necessary). Note: with recent versions of the required Python libraries, the code may need some modifications.

