lamthuyvo/cuny-advanced-data-journalism

Data Journalism - JOUR76006

  • CRN: JOUR76006
  • Credits: 3 credits
  • Semester: Fall 2021
  • Duration: 14 weeks
  • Instructors: Lam Thuy Vo

Communications channels and office hours:

  • Google Classroom: You will receive feedback, assignments and other information in Google Classroom.
  • Email for specific individual questions (lam.vo@journalism.cuny.edu)
  • Office hours by appointment: request via email.

Course Description

Data sets are everywhere: in public sources, like election results, budgets and census reports; in semi-public and private datasets, like hidden company information; in documents and databases, where cross-referencing people and organizations can reveal conflicts of interest; and in social media updates, images and video uploads. Data has become an invaluable resource for journalists who want to expose stories buried in the numbers and find the relevant facts to shape them into newsworthy stories. Today, whether your goal is to cover a daily beat or to do enterprise or investigative stories, you are expected to be able to use it.

In this course, you will build the skills you need to do data journalism:

  • Learn data journalism history and principles.
  • Find and acquire data using automated means (scraping!), and negotiate access to data with officials by using FOIA/FOIL.
  • Work with common data formats and different types of data, and understand what sort of data are in rows and columns.
  • Spot errors and deal with missing values and messy data.
  • Clean data, normalize it, analyze it and test your results using basic math, statistics and data journalism tools.
  • Mix data skills with on-the-ground reporting to discover newsworthy stories in data and answer questions to do accountability journalism that serves the public interest.


Most importantly, we want to focus on getting you the skills you need to find stories in data and be able to come to your editor with data-driven pitches.

How the class works

This is a hands-on course. Each lesson will focus on one or two of our expected outcomes, moving sequentially through the course. Lessons will include:

  • Lectures, discussions and updates on your reporting
  • Lab time to work and practice with real datasets and use computer-assisted tools and basic programming (Google Sheets, OpenRefine, command line, Python with Jupyter Notebooks) to obtain, clean, normalize, analyze and bulletproof data.

You will be expected to conduct the following work outside of the classroom:

  • Homework exercises
  • Readings
  • 2 assignments

All the course materials will be shared by the instructors with the students on GitHub and on Google Classroom.

Tools

Some software is already installed on your laptop; the rest you will have to install on your own. The instructors will hold an “install party” for this.

  • Google Sheets
  • Google Drive
  • Text editor: Atom or Sublime Text
  • Command line: Terminal (already on your laptop)
  • Python: includes Jupyter Notebooks, data analysis packages, package management tools, and an environment manager to create virtual environments.

Objective and outcomes

The objective of this course is to train students in the fundamental skills to do data journalism and be ready to continue their training in the Advanced Data Journalism course.

At the end of this course you will be expected to be able to:

  • Understand the principles and process of doing data journalism.
  • Do online and offline research to obtain documents and data.
  • Understand and use public record laws to negotiate access to data.
  • Know the characteristics of different file formats and types of data.
  • Check the quality of data to identify errors and missing values, and know how to solve these issues.
  • Use basic math and descriptive and inferential statistics for data analysis.
  • Organize, explore, clean and do accurate, solid analysis of different types of data by using the tools of data journalism (Google Sheets, OpenRefine, command line, Python with packages and Jupyter Notebooks).
  • Ask interesting and answerable questions of data.
  • Maintain data integrity and use best practices in data journalism for reproducible methods.
  • Combine data work with fact checking, interviewing sources and on-the-ground reporting to produce quality journalism.
  • Evaluate professional data stories (what makes a particular project successful or not?).

Class Schedule

The schedule is subject to change by the instructors, depending on how well you are progressing. Any modifications will be announced by the instructors. The basic structure of each lesson follows this rough outline:

  1. Discussion or announcements (15 mins)
  2. Lecture (1 hour)
  3. Break (15 mins)
  4. Hands-on exercise and lab time (1 hour 30 mins)

1.1 — Week 1

What’s this course about: Internet data, technology and society

Introduction and syllabus overview | What is data journalism according to you, find 4 definitions

Lecture: Introduction to the semester | Fundamentals of web languages

Hands-on: Installation guide!

At home: Homework 1: What is data journalism according to you, find 4 definitions


Data Mining

1.2 — Week 2

Introduction to Python and the command line interface

Lecture: Python in Journalism | The command-line tool, Python and the interactive shell

Hands-on: Introduction to Python basics

Reference: Chapter 1: The Programming Languages You’ll Need to Know

At home: Homework 2: Python Exercise

1.3 — Week 3

Jupyter Notebooks — how to set them up (also: APIs and data structures)

Lecture: Investigating the social web

Hands-on: Your first Jupyter Notebook

Reference: Chapter 2: Where to Get Your Data

At home: Homework 3: Jupyter Notebook Exercise

1.4 — Week 4

Data gathering via APIs (structuring data, loops, csv.writer)

Hands-on: API Call / API scraping (2 hours: one hour explanation, one hour writing)

Reference: Chapter 3: Getting Data with Code

At home: Homework 4: Download your Facebook archive, Assignment 1 pitch is due!

Read at home: https://www.buzzfeednews.com/article/lamvo/facebook-filter-bubbles-liberal-daughter-conservative-mom
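The Week 4 topics (structuring data, loops, csv.writer) fit together roughly as in the sketch below. It is a minimal illustration, not the class exercise: the response text, the field names and the helper functions `flatten` and `write_rows` are all made up, and the "API response" is hard-coded so the example runs without a network connection.

```python
import csv
import io
import json

# Hypothetical API response, hard-coded so the sketch runs offline.
SAMPLE_RESPONSE = """
{"items": [
  {"id": 1, "title": "Budget hearing", "stats": {"views": 120}},
  {"id": 2, "title": "Election results", "stats": {"views": 340}},
  {"id": 3, "title": "Census release", "stats": {}}
]}
"""


def flatten(item):
    """Turn one nested record into a flat row for a spreadsheet."""
    views = item.get("stats", {}).get("views", "")  # blank if missing
    return [item["id"], item["title"], views]


def write_rows(items, handle):
    """Write a header plus one row per item to an open file-like object."""
    writer = csv.writer(handle)
    writer.writerow(["id", "title", "views"])
    for item in items:  # the loop: one API record becomes one CSV row
        writer.writerow(flatten(item))


data = json.loads(SAMPLE_RESPONSE)
buffer = io.StringIO()  # stands in for open("output.csv", "w", newline="")
write_rows(data["items"], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

With a real API you would replace `SAMPLE_RESPONSE` with the text returned by the request and pass an opened file instead of the in-memory buffer. The flattening and writing steps stay the same.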

1.5 — Week 5

Introduction to scraping — scraping your Facebook archive

Lecture: Social data a primer — quantified selfies

Hands-on: Scraping a local directory

Reference: Chapter 4: Scraping Your Own Facebook Data

Read at home: https://gizmodo.com/people-you-may-know-a-controversial-facebook-features-1827981959

1.6 — Week 6

Scraping a Live Site

Lecture: Ethics of Scraping Live Websites

Hands-on: Scraping Wikipedia

Reference: Chapter 5: Scraping a Live Site

At home: Homework 5: Scraping review

1.7 — Week 7

FOIA for data, algorithms, schemata and other weird Internet things

Lecture: What is FOIA

Hands-on: Write a FOIA request to the FTC

At home: Homework 6: Log your FOIA request, Assignment 1 final story is due!

Read: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing


Data Analysis

2.1 — Week 8

Introduction to data exploration with Jupyter Notebooks and Pandas

Lecture: The Art of Debugging

Hands-on: Jupyter Notebooks (What’s my data edition) — data exploration

At home: Homework 7: Pandas

2.2 — Week 9

Why people suck - data problems and how to deal with them

Lecture: Why people suck — Data problems other people created for you and how you can solve them

Hands-on: Data Cleaning

At home: Homework 8: Cleaning data

2.3 — Week 10

Data aggregation — basic math in pandas (also, functions!)

Lecture: Finding stories in data

Hands-on: Pandas — filtering, sorting, basic math, frequencies and distribution (2 hours)

Reference: Chapter 9: Finding Trends in Reddit

At home: Assignment 2 pitch due!
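As a rough preview of the Week 10 hands-on topics (filtering, sorting, basic math and frequencies), here is a small pandas sketch. The table and its column names are invented for illustration and are not the class dataset.

```python
import pandas as pd

# Made-up records for illustration only.
posts = pd.DataFrame(
    {
        "subreddit": ["news", "news", "politics", "science", "politics", "news"],
        "score": [10, 250, 75, 40, 310, 5],
    }
)

# Filtering: keep only the rows that meet a condition.
popular = posts[posts["score"] >= 50]

# Sorting: highest score first.
ranked = popular.sort_values("score", ascending=False)

# Basic math: summary numbers for one column.
mean_score = posts["score"].mean()
median_score = posts["score"].median()

# Frequencies: how many posts appear per subreddit.
counts = posts["subreddit"].value_counts()

# Aggregation: total score per subreddit.
totals = posts.groupby("subreddit")["score"].sum()

print(ranked)
print(mean_score, median_score)
print(counts)
print(totals)
```

Comparing the mean with the median is a quick check for outliers: a few very high scores pull the mean up while the median barely moves.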

2.4 — Week 11

Resampling data over time (Pandas and Matplotlib)

Lecture: Understanding data with visuals

Hands-on: Resampling data and plotting it with pandas and Matplotlib

Reference: Chapter 10: Measuring the Twitter Activity of Political Actors

Read at home: https://www.thenation.com/article/sweetgreen-appropriation-rap-segregation/
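Resampling over time, the Week 11 topic, can be sketched as follows. This is a minimal example with made-up timestamps and an assumed column name (`created_at`); it counts records per day and leaves plotting as an optional last step.

```python
import pandas as pd

# Made-up timestamps for illustration only.
tweets = pd.DataFrame(
    {
        "created_at": [
            "2021-11-01 09:15", "2021-11-01 17:40", "2021-11-02 08:05",
            "2021-11-04 12:30", "2021-11-04 13:00", "2021-11-04 22:45",
        ],
        "text": ["a", "b", "c", "d", "e", "f"],
    }
)

# Resampling needs real datetimes in the index, not strings.
tweets["created_at"] = pd.to_datetime(tweets["created_at"])
tweets = tweets.set_index("created_at")

# Count tweets per calendar day ("D"); days without tweets appear as 0.
per_day = tweets["text"].resample("D").count()
print(per_day)

# Optional, requires Matplotlib: per_day.plot(kind="bar")
```

Changing `"D"` to `"W"` would group the same data by week. Choosing the interval is an editorial decision, because it determines which spikes and lulls are visible.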

2.5 — Week 12

Merging with Pandas

Lecture: When two become one - the power of merging data sets

Hands-on: Merging with Pandas
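A merge joins two tables on a shared column. The sketch below uses two invented tables and an assumed key column (`district_id`). It keeps every row from the first table and flags the rows that found no match, which is often where reporting questions start.

```python
import pandas as pd

# Made-up tables for illustration only.
schools = pd.DataFrame(
    {"district_id": [1, 2, 3], "school_count": [12, 8, 5]}
)
budgets = pd.DataFrame(
    {"district_id": [1, 2, 4], "budget": [900000, 450000, 300000]}
)

# Left merge: keep every row of `schools`, attach budget where the key matches.
# indicator=True adds a "_merge" column saying where each row came from.
merged = schools.merge(budgets, on="district_id", how="left", indicator=True)

# Rows present only in the left table have no budget on record.
unmatched = merged[merged["_merge"] == "left_only"]

print(merged)
print(unmatched)
```

Always compare row counts before and after a merge. Duplicate keys in either table can silently multiply rows and inflate any totals computed afterwards.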

2.6 — Week 13

The future

Lecture: The New Frontiers

Hands-on: spaCy text analysis

At home:

Final class! Week 14

Present your stories!

  • Presentations
  • Class evaluations

Class rules

Students-Instructors contract: The success of this course depends on the level of commitment of each student. That is, it is up to you to carry out your class work and assignments as well as to contribute to your team’s reporting project and speak up about any doubts or concerns you may have. In return, the instructors will do their best to provide a clear lesson plan, give students timely feedback and advise them so they can achieve the course’s expected outcomes.

Attendance and punctuality: We meet 15 times during the semester. You must attend every class and be on time. If you’re sick or you have an emergency, let us know via Slack or text. If you don’t show up, you will hear from us. More than one unexcused absence will lower your overall grade by 5%. Similarly, two late arrivals equal one absence.

Deadlines matter. This is of vital importance, not only for the class but also in the professional career of any journalist, because deadlines are sacred. Please carefully note this rule: 10 percent will be deducted from your assignment grade for every 24 hours that pass after a deadline without your work being turned in. The only exceptions are medical or family emergencies. Make-up work will not be offered except in extenuating circumstances.

Communicate. If you have a problem or if you have difficulties, tell us right away, not after it is too late. In journalism that’s what we do. When we have a problem we immediately tell our editor.

Be accurate and use language correctly. The value of journalistic work depends on credibility. That is why class assignments must include rigorous verification of the data and information presented. That is the basis of the profession. A story with erroneous information can earn an F grade. We expect language to be used correctly. Follow the AP Style guidelines. Aim for clarity, precision and correct spelling and grammar.

Keep up with the news. Consuming information on a daily basis leads to a healthy diet of background, helps you connect the dots and discover story ideas to work on. If you care about a topic or your subject concentration, you have to stay on top of your game.

Be a pro. Honesty, courtesy, curiosity and professionalism are the core values of a journalist. Behave like one because you are a journalist. When classmates are presenting, when we have guests or when we are working in teams, don’t multitask; focus.

Participation. It is important to maintain an attitude of openness. Class time is reserved for learning and discussing the topics of each session. It is not the time for personal calls, text messages, emails and social networks.

Diversity and inclusion. It's critical that students learn to include a diverse set of voices in their stories, something that is often glossed over when finding stories in spreadsheets and online sources. You are encouraged and expected to look for stories about and voices from communities that are underrepresented. This also applies to our classroom. It requires us all to discuss differences with respect and empathy, regarding race, gender, age, religion, sexual preference, disability, language, origin or political beliefs.

Code of honor. This class follows the guidelines of the Student Handbook of our school. Moreover, in journalism, plagiarism and the falsification of data, sources and facts are serious crimes that can lead to failing this class. You may also be subject to suspension, probation or expulsion, pending the decision of the School administration.

Assignments and Due Dates

2 story assignments (1 pitch and 1 final story per assignment): 80 points

For pitches:

Your pitch should be 1 page long (2 pages max) and include the following:

  • Byline
  • What is your story about? Tell us in 1 headline and 1 lead paragraph. This should include answers to the following questions:
    • Why is this story relevant ("So what?") and why now?
    • What is the single question your story tries to answer?
    • Why will this story resonate with your audience?
    • What else has been done on this topic? (Provide links) and how is your angle different or fresh?
  • Show us your data work! Give us access to a Google Sheet or a Jupyter Notebook!
  • Write up at least one or up to three findings from your analysis based on the dataset that was given
  • Maximum/minimum.
    • What is the maximum (best) story possible?
    • What's the minimum (fallback) story if your hypothesis doesn't prove out?

For final stories:

  • Headline
  • Dek
  • Byline
  • Text
  • The data work
    • Your code (scrapers)
    • Google Sheet or a Jupyter Notebook

For extra credit you can submit the following (3 points for each):

  • Data visualization
  • A compelling character
  • Methodology

Class participation, readings and homework: 20 points

We expect you to participate in the class and show respect for each other. This means taking part in discussions, in the hands-on drills and in other class activities, as well as completing homework.

The 8 homework assignments are individual drills to evaluate your understanding of the material taught in classes.

  • Total: 100 points.

How to file assignments

Each student will create a personal Drive homework folder name-lastname-homework to save his/her homework. Each team will create a team folder team-lastname-lastname-lastname to file teamwork. After you create your personal and team folders, share them with your instructors with editing permission.

All assignments are filed by Friday at 11:59 PM, by submitting the work via Google Classroom.

Breakdown of due dates

Details for each homework assignment will be posted in the shared Google Classroom. Details of the requirements for team reporting assignments are already posted in Google Classroom and will be explained in the first class.

Due date (11:59 PM) | Assignment or homework that is due
Week 1: August 30 | Homework 1: What is data journalism according to you, find 4 definitions
Week 2: September 6 | Homework 2: Python Exercise
Week 3: September 13 | Homework 3: Jupyter Notebook Exercise
Week 4: September 20 and September 22 | Homework 4: Download your Facebook archive (September 20); Assignment 1 pitch is due! (September 22)
Week 5: September 27 |
Week 6: October 4 | Homework 5: Scraping review
Week 7: October 25 and October 27 | Homework 6: Log your FOIA request (October 25); Assignment 1 final story is due! (October 27)
Week 8: November 1 | Homework 7: Pandas
Week 9: November 8 | Homework 8: Cleaning data
Week 10: November 15 | Assignment 2 pitch due!
Week 11: November 22 |
Week 12: November 29 |
Week 13: December 6 | Assignment 2 final story due!
Week 14: December 11 | Presentations

Grading

Rubric

The two story assignments make up 80% of your grade; each pitch and each final story counts for 20%. They are graded as follows:

2 assignments: 80 points altogether, 40 points each

Each pitch

20 pts. On time + meets all project criteria + original reporting + effective use of data

17 pts. On time + meets most if not all of project criteria + acceptable reporting + acceptable use of data

13 pts. On time + meets very little of project criteria + somewhat acceptable reporting + somewhat acceptable use of data

10 pts. Late and/or meets little of project criteria + weak reporting + weak use of data

7 pts. Late and/or does not meet project criteria + very weak reporting + very weak use of data

3 pts. Late and/or shows little to no effort

0 pts. Not submitted within 1 week of deadline

8 Homeworks: 16 points

Each homework

2 pts. Completed homework.

1 pt. Completed partially. If this happens, your instructor will leave you a short comment to help you complete the exercise. If you do it, you’ll get full points.

0 pts. Not submitted

Class participation and readings discussion. 4 points.

4 pts. Amazing participation, asks questions, comments on readings, shares ideas, works well with others.

3 pts. Most of the time participates, asks questions, shares ideas, works well with others.

2 pts. Could do better

1 pt. Not engaged most of the time

0 pts. Not engaged at all

Total points: 100 = 100%

Scale

Final course grades, according to the grading scale used in the CUNY Graduate School of Journalism:

  • 97: A+ Stellar work. Ready to be published by a professional news organization with minimal changes.
  • 93: A Excellent work. It is ready to be published professionally with some changes.
  • 90: A- Good quality work, although it needs a slightly more significant revision to be able to be published.
  • 87: B+ Solid work that shows some deficiencies that need to be solved.
  • 83: B Meets certain requirements, but lacks several important elements.
  • 80: B- Below average and needs strong overall improvements.
  • 77: C+ Poor job. It presents many problems of structure, reporting and storytelling
  • 73: C Almost unacceptable because of major overall problems.
  • 70: C- Unacceptable. Does not meet the minimum requirements of a graduate level journalism project.
  • Anything below a 70 is an F. Work has failed at every level. There is no D in CUNY’s grading scale.

Guides and tipsheets

Further reading

Your class readings will be provided in class. Find some more recommended books here:

  • “Numbers in the Newsroom: Using math and statistics in News”, 2nd Edition. By Sarah Cohen.
  • “The investigative reporter's handbook: a guide to documents, databases, and techniques”. 4th Edition. Edited by Brant Houston et al.
  • “Computer-Assisted Reporting: A practical guide”, 4th Edition. By Brant Houston.
  • “Precision Journalism: a Reporter’s Introduction to Social Science Methods”, 4th Edition. By Philip Meyer.
  • “The Functional Art: An introduction to information graphics and visualization”. By Alberto Cairo.
  • “The Curious Journalist’s Guide to Data” (online). By Jonathan Stray.
  • “Storytelling with Data”. By Cole Nussbaumer Knaflic
  • “Computer-Assisted Research: Information Strategies and Tools for Journalists”. By Nora Paul and Kathleen A. Hansen
  • “Mapping for Stories: A Computer-Assisted Reporting Guide”. By Jennifer LaFleur and Andy Lehren
  • “The Visual Display of Quantitative Information”. By Edward R. Tufte
  • “Data Points: Visualization That Means Something”. By Nathan Yau
  • “Design for Information”. By Isabel Meirelles

Instructors will also share tip sheets, stories and tutorials for specific lessons.

Coaches

You'll find all the coaches here.

Most relevant to our class:

Name | Coaching areas | Hours | Office Location | Email
Kirsti Itameri | Interactive Journalism: Design, WordPress, Illustrator, Photoshop, Social Media | Tuesdays 6:30-8:30 pm or by appointment | Newsroom | kirsti.itameri@journalism.cuny.edu
TC McCarthy | Interactive Journalism: Coding | Thursday 6-8 pm | Newsroom | tc.mccarthy@journalism.cuny.edu
Malik Singleton | Interactive Journalism: Data Storytelling, WordPress, HTML, CSS | Mondays 5:30-7:30 pm | Newsroom | malik.singleton@journalism.cuny.edu
Nicholas Wells | Interactive Journalism: Data Storytelling, HTML, CSS, R | Tuesdays 6:00-8:30 pm | Newsroom | Nicholasbwells@gmail.com
