# Party Bias Clustering of US Legislators

Our project focuses on analyzing the language used by US legislators. We want to know how their language varies by party and state. We also want to know how their language changes over time.

Scripts are used roughly in the following order:

congressPdfPuller.pl | Bulk downloader for congressional record PDFs
congTextToCSV.py | Imports function from updated parsing file READMEtextToCSV.txt
listCongressNumberAndFilenameInCSV.pl | Helper tool that extracts the distinct congress numbers and dates
remove2001.py | Removes 2001 dates from congressionalRecords.csv since we're only interested in 2006+
c_processor_final.py | Preprocessor to parse congressional record PDFs into CSVs
legislators-current.json.txt | Biographical information for current legislators
legislators-historical.json.txt | Biographical information for past legislators
splitCsvByDate.pl | Splits processed CSVs by date
{tfidf calculation in tf_idf folder}
tfidfDiffs.py | Calculates Delta TF-IDF values
allYearsTFIDF_Diff.py | runs the tfidfDiffs.py script over all the congressional records
fixCongress82.py | Fixes mistake where congress = 82
parallel_script.sh | added extra formatting to fix some errors in filenames
gnuParallelList.txt | removed parallel delta tf-idf script and appended original.
cleanedTableData.html | HTML table data used to get approval/disapproval scores
htmlTableParser.pl | HTML table parser for the approval/disapproval scores; strips the percent signs
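
The Delta TF-IDF step can be illustrated with a minimal sketch. This is not the code in tfidfDiffs.py; it assumes one common formulation, in which a term's count in a document is weighted by the difference between its IDF in two groups of documents (for example, speeches from two parties). The function names, the add-one smoothing, the sign convention, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def doc_freq(docs):
    """Count, for each term, how many documents in `docs` contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def delta_tfidf(tokens, docs_a, docs_b):
    """Score each term in `tokens` as tf * (idf_b - idf_a).

    Assumed convention: a positive score means the term is more
    characteristic of group A, a negative score of group B.
    Add-one smoothing (an assumption) avoids division by zero for
    terms that are missing from one group.
    """
    df_a, df_b = doc_freq(docs_a), doc_freq(docs_b)
    n_a, n_b = len(docs_a), len(docs_b)
    scores = {}
    for term, tf in Counter(tokens).items():
        idf_a = math.log2((n_a + 1) / (df_a[term] + 1))
        idf_b = math.log2((n_b + 1) / (df_b[term] + 1))
        scores[term] = tf * (idf_b - idf_a)
    return scores


# Toy, made-up token lists standing in for two groups of speeches.
docs_a = [["health", "care", "tax"], ["health", "jobs"]]
docs_b = [["tax", "cuts", "jobs"], ["tax", "border"]]

scores = delta_tfidf(["health", "tax", "tax", "jobs"], docs_a, docs_b)
for term, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term}: {score:.3f}")
```

In this toy data, "health" appears only in group A and scores positive, "tax" is more widespread in group B and scores negative, and "jobs" is equally common in both and scores zero.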


STA160/mainProject
