Course Outline

1. Basics of Python and Text Analysis (~2 weeks)

  • First steps with Python
  • Installing and getting started with the toolset
  • First steps with text analysis
    • Simple explorations of word use
    • Sentiment analysis
    • Basic tagging
    • Counting words
    • Representations of co-occurrence
    • Collocations and n-grams
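
As a rough illustration of the "Counting words" and "Collocations and n-grams" items above, here is a minimal sketch using only the Python standard library. The helper names `tokenize` and `ngrams` and the sample sentence are assumptions made for illustration, not part of the course materials; the course toolset may use a dedicated library instead.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return list(zip(*(tokens[i:] for i in range(n))))


# Hypothetical sample text, chosen only to show repeated words and bigrams.
sample = "the cat sat on the mat and the cat slept"
tokens = tokenize(sample)

word_counts = Counter(tokens)               # frequency of each word
bigram_counts = Counter(ngrams(tokens, 2))  # frequency of each adjacent word pair

print(word_counts.most_common(2))
print(bigram_counts.most_common(1))
```

Frequent bigrams are only a first approximation of collocations; a fuller treatment would typically score pairs with an association measure rather than raw counts.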

2. Text Classification (~2 weeks)

  • Classification Methods
    • Decision Trees
    • Naive Bayes
    • Support Vector Machines
    • Neural Networks
  • Testing frameworks
  • A look at some tools with graphical user interfaces

3. Negotiating the problems of real data (~1.5 weeks)

  • Assembling raw data
    • From the web: web pages, Twitter, etc.
    • From PDFs
  • Cleaning
  • Normalizing
    • Tokenizing
    • Stemming
    • Lemmatizing
    • Regular Expressions
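
The normalizing steps listed above could be sketched roughly as follows: a regular expression for tokenizing, followed by a deliberately crude suffix stripper standing in for stemming. Both `tokenize` and `toy_stem` are hypothetical helpers written for this illustration; `toy_stem` is not the Porter algorithm or any library stemmer, and lemmatizing is omitted because it needs a dictionary.

```python
import re

# One or more letters, optionally followed by an apostrophe and more letters
# (so "don't" stays a single token).
TOKEN_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def tokenize(text):
    """Split raw text into word tokens with a regular expression."""
    return TOKEN_PATTERN.findall(text)


def toy_stem(word):
    """Lowercase a word and strip one common English suffix.

    A toy rule set for illustration only; real stemmers use many more rules.
    """
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical sample sentence.
raw = "Students counted words while counting collocations."
stems = [toy_stem(token) for token in tokenize(raw)]
print(stems)
```

Note that a rule this simple will over- and under-stem many words, which is one reason to compare stemming with lemmatizing on real data.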

4. Vector space methods (~1.5 weeks)

  • About bag-of-words approaches
  • Converting text to a vector
  • Various representations of vectors
  • Use of vector representations to measure text similarity
  • Distributed vector representations (word embeddings)
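
To make the bag-of-words and text-similarity items above concrete, here is a minimal sketch that turns two texts into word-count vectors and compares them with cosine similarity. The function names and the example strings are assumptions for illustration; whitespace splitting is used only to keep the example short, and word embeddings are not covered here.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Represent a text as a sparse vector of lowercase word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    # Only words present in both vectors contribute to the dot product.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical example texts.
doc1 = bag_of_words("students write about science")
doc2 = bag_of_words("students talk about science")
doc3 = bag_of_words("weather forecast today")

print(cosine_similarity(doc1, doc2))  # high: three of four words shared
print(cosine_similarity(doc1, doc3))  # zero: no words shared
```

Because bag-of-words vectors ignore word order, two texts with the same words in a different order receive a similarity of 1.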

5. Unsupervised methods (~1.5 weeks)

  • Text Clustering
  • Topic Modeling

6. Toward New Conceptualizations of Methods in LS (1 week)

  • A week of readings

Other Topics

  • Networks