Week 3 Tasks

  1. Grab the new notebooks for this week. They are numbered W3-1 through W3-5.
    • (Follow the instructions from last week.)
  2. Grab the new data corpora for this week
    • If you're using Tactic, you can get them from the Collections tab of the repository.
    • If you're not using Tactic, get them from this repository and put them in your corpora folder.
  3. Look at our analyses of the Titanic corpus in notebooks W3-1 and W3-2. Try to improve on them.
    • This dataset has been the focus of a Kaggle challenge. If you poke around on the internet, you might be able to find some suggestions. (I haven't poked around myself, so I'm not sure.)
  4. Try the tasks in notebook W3-5, which use a new non-text corpus.
  5. See if you can improve on our analysis of the spam (text) corpus in notebook W3-4. (If we get to this in class.)
  6. The NLTK book has a chapter on classifying text (Chapter 6, "Learning to Classify Text"). I think it's worth reading through. But you'll see that its first example is the "gender classification" of names. You can decide what you think about this. (What I think: if we treat it as an attempt to understand what people are responding to when they see a name as male or female, then it is an appropriate and interesting endeavor. But we should tread more sensitively than the authors of this chapter do.)
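
If you want a feel for what a text classifier is doing before working on the spam task or reading the NLTK chapter, here is a minimal sketch in plain Python. It is not the code from our notebooks and not the NLTK API: the function names (`tokenize`, `train_naive_bayes`, `classify`) and the four toy messages are made up for illustration, and the method is a bare-bones naive Bayes with add-one smoothing.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    """Count documents and words per label.

    labeled_docs is a list of (text, label) pairs.
    """
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of documents
    for text, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability for this text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Start from the log prior for this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing so unseen (word, label) pairs are not zero.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data invented for this example; the real spam corpus is much larger.
TOY_DOCS = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("lunch at noon tomorrow", "ham"),
    ("notes from the meeting tomorrow", "ham"),
]

model = train_naive_bayes(TOY_DOCS)
print(classify("free prize", model))
print(classify("meeting at noon", model))
```

Most ways of improving on a classifier like this come down to changing `tokenize` (better features) or swapping the model, which is roughly the kind of experimenting the tasks above ask for.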