nealcaren/quant-text-fall-2014

**Soci 950 - Words to Numbers: Quantitative Text Analysis**
Fall 2014
Hamilton Hall 150
Monday, Wednesday 1-2:15pm
Neal Caren neal.caren@unc.edu

There's nothing new about sociologists using text as data. Traditionally, source materials have come from primary sources (e.g., interviews) or secondary sources (e.g., newspapers), and scholars have used programs like NVivo or ATLAS.ti to help make sense of the data. Over the last decade, however, researchers from a variety of disciplines have increasingly turned to a more algorithmic analysis of texts. This new focus on the quantitative analysis of text (along with network analysis and agent-based modeling) forms the basis of computational social science. Not coincidentally, the ability of social scientists to collect text corpora has also grown over the last decade. This combination of new methods and new sources of data presents a unique opportunity for social scientists to find new answers to old questions and to start asking new questions.

The primary learning goal for this course is for students to develop the ability to apply appropriate quantitative text analysis techniques to a social scientific question. In other words, you should be able to write a publishable paper that involves the quantitative analysis of text. Specifically, I expect that by the end of the semester you will be:

  • Able to collect, store and manipulate data from text files, web pages, and web application programming interfaces (APIs);
  • Familiar with the major methods of text analysis;
  • Knowledgeable about relevant machine learning techniques;
  • Able to apply relevant analytic methods to appropriate social scientific questions.

Between most class meetings you will have to do some combination of the following. First, you will read contemporary examples of social scientific research that employ the relevant methods. Second, you will read the code of worked examples. Quite often, this code will be in the form of IPython notebooks. Third, you will have to produce some code yourself. For the first few days, this code will take advantage of the Codecademy Python MOOC. After that, you'll be writing your own code, which you will either bring to class or email to me ahead of class. Finally, we'll spend the last section of the course working on a pair of studies. For each, you and your partner are responsible for presenting your code and findings.

Half of your grade will be based on the daily homework. Homework assignments are marked with an H on the syllabus. The other half of your grade is based on the two projects. For each project, you will be expected to present your findings and hand in a well-commented IPython notebook so that someone can replicate your findings. The first project involves an analysis of political emails. The second involves contemporary newspaper data. In both cases, you will develop an interesting sociological puzzle, collect the appropriate raw data, and then analyze the data to explore your puzzle. You may work with a partner, but you can't have the same partner for both projects. Depending on the flow of the course and student interest, we might end up doing something else for one or both of these projects.

I've put together a list of online Python tutorials that are accessible to social scientists. You might find some helpful when looking for additional information about a topic.

Wednesday, 8/20 - Introductions

Monday, 8/25 - The Basics

Wednesday, 8/27 - More Python

Wednesday, 9/3 - Even more Python

  • Codecademy. A few more sections: Lists and Functions (Lists and Functions and Battleship!); Loops (Loops and Practice makes Perfect); Exam Statistics; and Advanced Topics in Python (feel free to skip the Lambdas section).
  • H - Email me your "Great job finishing" certificate from codecademy.

Monday, 9/8 - Getting Data when they want to give it to you

  • Codecademy. Take the Placekitten API course. Then, pick another API class that uses Python, such as NPR, Sunlight Foundation, or NHTSA. Take that one.
  • H - Email me your "Great job finishing" certificate from codecademy.

Wednesday, 9/10 - More on APIs

  • Sushi Bars and Yelp. Sign up to be a developer on Yelp.
  • H - Bring an IPython notebook to class that lists the coffee bars in your home town. Feel free to copy and paste from my code. Note: You might need to install oauth2.

Monday, 9/15 - Twitter API

  • Mining Twitter.
  • Get your own consumer key, consumer secret, access token, and access token secret from Twitter.
  • H - Bring an IPython notebook to class that does something cool with tweepy.

Wednesday, 9/17 - Getting Data when they may not want to give it to you

Monday, 9/22 - More web scraping

Wednesday, 9/24 - Counting words
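
As a rough illustration of what counting words can look like in Python, here is a minimal sketch. It is not taken from the course materials: the `count_words` helper, the regular expression used for tokenizing, and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercase word to its frequency.

    Hypothetical helper for illustration; the tokenizer is deliberately
    simple and keeps only runs of letters and apostrophes.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Made-up sample text, not course data.
sample = "The cat sat. The dog sat too."
counts = count_words(sample)

# Print the two most frequent words with their counts.
print(counts.most_common(2))
```

A real corpus would likely need more careful tokenization (punctuation, numbers, hyphenated words) and probably a stop-word list, but the core idea of turning words into numbers is the same.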

Monday, 9/29 - Sentiment Analysis
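
One simple approach to sentiment analysis is dictionary-based scoring: count the positive and negative words in a text and take the difference. The sketch below assumes that approach only for illustration. The tiny `POSITIVE` and `NEGATIVE` word sets and the `sentiment_score` function are hypothetical, not a published lexicon or the method used in class.

```python
import re

# Toy word lists invented for this example; a real analysis would use
# an established sentiment lexicon instead.
POSITIVE = {"good", "great", "happy"}
NEGATIVE = {"bad", "awful", "sad"}


def sentiment_score(text):
    """Return (number of positive words) minus (number of negative words)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(token in POSITIVE for token in tokens)
    negative_hits = sum(token in NEGATIVE for token in tokens)
    return positive_hits - negative_hits


# A score above zero suggests a positive tone, below zero a negative one.
print(sentiment_score("A great, happy day"))
print(sentiment_score("bad and sad but good"))
```

This kind of scoring ignores negation ("not good") and context, which is one reason to compare it against the classification methods later in the schedule.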

Wednesday, 10/1 - Classification

Monday, 10/6 - Feature Selection

Wednesday, 10/8 - Machine Learning Algorithms

Monday, 10/13 - Review

Wednesday, 10/15 - Review

Monday, 10/20 - Text clustering

Wednesday, 10/22 - Topic Models in Mallet

Monday, 10/27 - Topic Models in Python

Wednesday, 10/29 - Project I - Analyzing Political Discourse

Monday, 11/3

Wednesday, 11/5

Monday, 11/10

Wednesday, 11/12

  • Student presentations

Monday, 11/17 - Project II - Realtime Sociology

Wednesday, 11/19

Monday, 11/24

Monday, 12/1

  • TBA

Wednesday, 12/3

  • Student presentations
