- CRN: JOUR76006
- Credits: 3 credits
- Semester: Fall 2021
- Duration: 14 weeks
- Instructors: Lam Thuy Vo
Communications channels and office hours:
- Google Classroom: You will receive feedback, assignments and other information through Google Classroom.
- Email for specific individual questions (lam.vo@journalism.cuny.edu)
- Office hours by appointment: request via email.
Data sets are everywhere: in public sources, like election results, budgets and census reports; in semi-public and private datasets, like hidden company information; in cross-referencing people and organizations in documents and databases to discover conflicts of interest; and in social media updates, images and video uploads. Data has become an invaluable resource for journalists to expose stories buried in the numbers and to find relevant facts and shape them into newsworthy stories. Today, whether your goal is to cover a daily beat or to do enterprise or investigative stories, you are expected to be able to use it.
In this course, you will build the skills you need to do data journalism:
- The history and principles of data journalism.
- How to find and acquire data using automated means (scraping!), and how to negotiate access to data with officials by using FOIA/FOIL.
- How to work with common data formats and different types of data, and how to understand what sort of data are in rows and columns.
- How to spot errors and deal with missing values and messy data.
- How to clean data, normalize it, analyze it and test your results using basic math, statistics and data journalism tools.
- How to mix data skills with on-the-ground reporting to discover newsworthy stories in data and answer questions to do accountability journalism that serves the public interest.
Most importantly, we want to focus on getting you the skills you need to find stories in data and be able to come to your editor with data-driven pitches.
This is a hands-on course. Each lesson will focus on one or two of our expected outcomes, moving sequentially through the course. Lessons will include:
- Lectures, discussions and updates on your reporting
- Lab time to work and practice with real datasets and use computer-assisted tools and basic programming (Google Sheets, Open Refine, command line, Python with Jupyter Notebooks) to obtain, clean, normalize, analyze and bulletproof data.
You will be expected to conduct the following work outside of the classroom:
- Homework exercises
- Readings
- 2 assignments
All course materials will be shared by the instructors on GitHub and on Google Classroom.
Some software is already installed on your laptop. You will have to install the rest on your own. The instructors will host an “install party” for this.
- Google Sheets
- Google Drive
- Text editor Atom or Sublime Text
- Command line Terminal (already on your laptop)
- Python: includes Jupyter Notebooks, data analysis packages, package management tools, and an environment manager to create virtual environments.
The objective of this course is to train students in the fundamental skills to do data journalism and be ready to continue their training in the Advanced Data Journalism course.
At the end of this course you will be expected to be able to:
- Understand the principles and process of doing data journalism.
- Do online and offline research to obtain documents and data.
- Understand and use public record laws to negotiate access to data.
- Know the characteristics of different file formats and types of data.
- Check the quality of data to identify errors and missing values, and know how to resolve these issues.
- Use basic math and descriptive and inferential statistics for data analysis.
- Organize, explore, clean and accurately analyze different types of data by using the tools of data journalism (Google Sheets, Open Refine, command line, Python with packages and Jupyter Notebooks).
- Ask interesting and answerable questions of data.
- Maintain data integrity and follow data journalism best practices for reproducible methods.
- Combine data work with fact checking, interviewing sources and on-the-ground reporting to produce quality journalism.
- Evaluate professional data stories (what makes a particular project successful or not?).
The schedule is subject to change by the instructors, depending on how well you are progressing. Any modifications will be announced by the instructors. The basic structure of each lesson follows this rough outline:
- Discussion or announcements (15 mins)
- Lecture (1 hour)
- Break (15 mins)
- Hands-on exercise and lab time (1 hour 30 mins)
What’s this course about: Internet data, technology and society
Introduction and syllabus overview | What is data journalism according to you, find 4 definitions
Lecture: Introduction to the semester | Fundamentals of web languages
Hands-on: Installation guide!
At home: Homework 1: What is data journalism according to you, find 4 definitions
Introduction to Python and the command line interface
Lecture: Python in Journalism | The command-line tool, Python and the interactive shell
**Hands-on: Introduction to Python basics**
Reference: Chapter 1: The Programming Languages You’ll Need to Know
At home: Homework 2: Python Exercise
Jupyter Notebooks — how to set them up (also: APIs and data structures)
Lecture: Investigating the social web
Hands-on: Your first Jupyter Notebook
Reference: Chapter 2: Where to Get Your Data
At home: Homework 3: Jupyter Notebook Exercise
Data gathering via APIs (structuring data, loops, csv.writer)
Hands-on: API Call / API scraping (2 hours, one hour explanation, one hour writing)
Reference: Chapter 3: Getting Data with Code
At home: Homework 4: Download your Facebook archive, Assignment 1 pitch is due!
Read at home: https://www.buzzfeednews.com/article/lamvo/facebook-filter-bubbles-liberal-daughter-conservative-mom
Introduction to scraping — scraping your Facebook archive
Lecture: Social data a primer — quantified selfies
Hands-on: Scraping a local directory
Reference: Chapter 4: Scraping Your Own Facebook Data
At home: Read https://gizmodo.com/people-you-may-know-a-controversial-facebook-features-1827981959
Scraping a Live Site
Lecture: Ethics of Scraping Live Websites
Hands-on: Scraping Wikipedia
Reference: Chapter 5: Scraping a Live Site
At home: Homework 5: Scraping review
FOIA for data, algorithms, schemata and other weird Internet things
Lecture: What is FOIA
Hands-on: Write a FOIA request to the FTC
At home: Homework 6: Log your FOIA request, Assignment 1 final story is due!
Read: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
**Introduction to data exploration with Jupyter Notebooks and Pandas**
Lecture: The Art of Debugging
Hands-on: Jupyter Notebooks (What’s my data edition) — data exploration
At home: Homework 7: Pandas
**Why people suck - data problems and how to deal with them**
Lecture: Why people suck — Data problems other people created for you and how you can solve them
Hands-on: Data Cleaning
At home: Homework 8: Cleaning data
2.3 — Week 10
Data aggregation — basic math in pandas (also, functions!)
Lecture: Finding stories in data
Hands-on: Pandas — filtering, sorting, basic math, frequencies and distribution (2 hours)
Reference: Chapter 9: Finding Trends in Reddit
At home: Assignment 2 pitch due!
Resampling data over time (Pandas and Matplotlib)
Lecture: Understanding data with visuals
Hands-on: Resampling data and plotting it with pandas and Matplotlib
Reference: Chapter 10: Measuring the Twitter Activity of Political Actors
At home: Read https://www.thenation.com/article/sweetgreen-appropriation-rap-segregation/
Merging with Pandas
Lecture: When two become one - the power of merging data sets
Hands-on: Merging with Pandas
The future
Lecture: The New Frontiers
Hands-on: Spacy text analysis
At home:
Present your stories!
- Presentations
- Class evaluations
Students-Instructors contract: The success of this course depends on each student's level of commitment. It is up to you to carry out your class work and assignments, to contribute to your team’s reporting project and to speak up about any doubts or concerns you may have. In return, the instructors will do their best to provide a clear lesson plan, give students timely feedback and advise them on how to achieve the course’s expected outcomes.
Attendance and punctuality: We meet 14 times during the fall semester. You must attend every class and be on time. If you’re sick or you have an emergency, let us know via Slack or text. If you don’t show up, you will hear from us. More than one unexcused absence will reduce your overall grade by 5%. Similarly, two late arrivals equal one absence.
Deadlines matter. This is of vital importance, not only for the class but also in the professional career of any journalist, because deadlines are sacred. Please carefully note this rule: 10 percent will be deducted from your assignment grade for every 24 hours that pass after a deadline without your having turned the assignment in. The only exceptions are medical or family emergencies. Make-up work will not be offered except in extenuating circumstances.
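To make the late-work arithmetic concrete, here is a minimal sketch of how the deduction could be computed. It rests on two assumptions that the rule itself does not spell out: only completed 24-hour periods count, and each deduction is 10 percent of the score you originally earned (not compounding). The function name `apply_late_penalty` is made up for this illustration; your instructors' actual calculation may differ.

```python
def apply_late_penalty(score: float, hours_late: float) -> float:
    """Return an assignment score after the late deduction.

    Assumptions (not stated in the syllabus): only completed 24-hour
    periods are counted, and each one removes 10% of the earned score.
    """
    if hours_late <= 0:
        return score  # on time: no deduction
    full_days = int(hours_late // 24)      # completed 24-hour periods
    deduction = 0.10 * full_days * score   # 10% of earned score per period
    return max(score - deduction, 0.0)     # a score never drops below zero


# A pitch that earned 20 points but was filed 50 hours late
# has two completed 24-hour periods, so it loses 20%.
print(apply_late_penalty(20, 50))
```

Under these assumptions, a submission that is a few hours late keeps its full score, and the deduction grows by one step each time another full day passes.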
Communicate. If you have a problem or if you have difficulties, tell us right away, not after it is too late. In journalism that’s what we do. When we have a problem we immediately tell our editor.
Be accurate and use language correctly. The value of journalistic work depends on credibility. That is why class assignments must have a rigorous verification of the data and information presented. That is the basis of the profession. A story with erroneous information can earn an F grade. We expect the language to be used correctly. Follow the AP Style guidelines. Aim for clarity, precision and correct spelling and grammar.
Keep up with the news. Consuming information on a daily basis leads to a healthy diet of background, helps you connect the dots and discover story ideas to work on. If you care about a topic or your subject concentration you have to stay on top of the game.
Be a pro. Honesty, courtesy, curiosity and professionalism are the core values of a journalist. Behave like one because you are a journalist. When classmates are presenting, when we have guests or when we are working in teams, don’t multitask; focus.
Participation. It is important to maintain an attitude of openness. Class time is reserved for learning and discussing the topics of each session. It is not the time for personal calls, text messages, emails and social networks.
Diversity and inclusion. It's critical that students learn to include a diverse set of voices in their stories, something that is often glossed over when finding stories in spreadsheets and online sources. You are encouraged and expected to look for stories about and voices from communities that are underrepresented. This also applies to our classroom. It requires us all to discuss differences with respect and empathy, regarding race, gender, age, religion, sexual preference, disability, language, origin or political beliefs.
Code of honor. This class follows the guidelines of our school's Student Handbook. Moreover, in journalism, plagiarism and the falsification of data, sources and facts are serious offenses that can lead to failing this class. You may also be subject to suspension, probation or expulsion, pending the decision of the School administration.
Each pitch you submit should be 1 page long (2 pages maximum) and include the following:
- Byline
- What is your story about? Tell us in 1 headline and 1 lead paragraph. This should include answers to the following questions:
- Why is this story relevant ("So what?") and why now?
- What is the single question your story tries to answer?
- Why will this story resonate with your audience?
- What else has been done on this topic? (Provide links) and how is your angle different or fresh?
- Show us your data work! Give us access to a Google Sheet or a Jupyter Notebook!
- Write up at least one or up to three findings from your analysis based on the dataset that was given
- Maximum/minimum.
- What is the maximum (best) story possible?
- What's the minimum (fallback) story if your hypothesis doesn't prove out?
- Headline
- Dek
- Byline
- Text
- The data work
- Your code (scrapers)
- Google Sheet or a Jupyter Notebook
For extra credit you can submit the following (3 points for each):
- Data visualization
- A compelling character
- Methodology
Class participation, readings and homework: 20 points
We expect you to participate in the class and show respect for each other. This means taking part in discussions, in the hands-on drills and other class activities as well as completing homework.
The 8 homework assignments are individual drills to evaluate your understanding of the material taught in classes.
- Total: 100 points.
How to file assignments
Each student will create a personal Drive homework folder named name-lastname-homework to save their homework. Each team will create a team folder named team-lastname-lastname-lastname to file teamwork. After you create your personal and team folders, share them with your instructors with editing permission.
All assignments are filed by Friday at 11:59 PM, by submitting the work via Google Classroom.
Breakdown of due dates
Details for each homework assignment will be posted in the shared Google Classroom. Details of the requirements for team reporting assignments are already posted in Google Classroom and will be explained in the first class.
Due date (11:59PM) | Assignment or homework that is due |
Week 1: August 30 | Homework 1: What is data journalism according to you, find 4 definitions |
Week 2: September 6 | Homework 2: Python Exercise |
Week 3: September 13 | Homework 3: Jupyter Notebook Exercise |
Week 4: September 20 and September 22 | Homework 4: Download your Facebook archive (September 20); Assignment 1 pitch is due! (September 22) |
Week 5: September 27 | |
Week 6: October 4 | Homework 5: Scraping review |
Week 7: October 25 and October 27 | Homework 6: Log your FOIA request (October 25); Assignment 1 final story is due! (October 27) |
Week 8: November 1 | Homework 7: Pandas |
Week 9: November 8 | Homework 8: Cleaning data |
Week 10: November 15 | Assignment 2 pitch due! |
Week 11: November 22 | |
Week 12: November 29 | |
Week 13: December 6 | Assignment 2 final story due! |
Week 14: December 11 | Presentations |
The story pitches and final stories make up 80% of your grade, 20% each. Graded as:
2 Assignments: 80 points altogether, 40 points each
Each pitch
20 pts. On time + meets all project criteria + original reporting + effective use of data
17 pts. On time + meets most if not all of project criteria + acceptable reporting + acceptable use of data
13 pts. On time + meets very little of project criteria + somewhat acceptable reporting + somewhat acceptable use of data
10 pts. Late and/or meets little of project criteria + weak reporting + weak use of data
7 pts. Late and/or does not meet project criteria + very weak reporting + very weak use of data
3 pts. Late and/or shows little to no effort
0 pts. Not submitted within 1 week of deadline
8 Homeworks: 16 points
Each homework
2 pts. Completed homework.
1 pt. Completed partially. If this happens, your instructor will leave you a short comment to help you complete the exercise. If you do it, you’ll get full points.
0 pts. Not submitted
Class participation and readings discussion. 4 points.
4 pts. Amazing participation, asks questions, comments on readings, shares ideas, works well with others.
3 pts. Most of the time participates, asks questions, shares ideas, works well with others.
2 pts. Could do better
1 pt. Not engaged most of the time
0 pts. Not engaged at all
Total points: 100 = 100%
Final course grades, according to the grading scale used in the CUNY Graduate School of Journalism:
- 97: A+ Stellar work. Ready to be published by a professional news organization with minimal changes.
- 93: A Excellent work. It is ready to be published professionally with some changes.
- 90: A- Good quality work, although it needs a slightly more significant revision to be able to be published.
- 87: B+ Solid work that shows some deficiencies that need to be solved.
- 83: B Meets certain requirements, but lacks several important elements.
- 80: B- Below average and needs strong overall improvements.
- 77: C+ Poor work. It presents many problems of structure, reporting and storytelling.
- 73: C Almost unacceptable because of major overall problems.
- 70: C- Unacceptable. Does not meet the minimum requirements of a graduate level journalism project.
- Anything below a 70 is an F. Work has failed at every level. There is no D in CUNY’s grading scale.
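As a rough illustration of how the point totals map onto this scale, the sketch below adds up the three graded components (80 points for assignments, 16 for homework, 4 for participation) and looks up a letter. It assumes each number in the scale is the minimum total needed for that letter, which the list implies but does not state outright. The names `GRADE_FLOORS` and `letter_grade` and the sample scores are invented for this example.

```python
# Minimum total points for each letter, highest first.
# Assumption: each number in the grading scale is a floor for that letter.
GRADE_FLOORS = [
    (97, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
]


def letter_grade(total_points: float) -> str:
    """Return the letter for a total out of 100; there is no D."""
    for floor, letter in GRADE_FLOORS:
        if total_points >= floor:
            return letter
    return "F"  # anything below 70


# Made-up scores: assignments (max 80), homework (max 16), participation (max 4)
assignments, homework, participation = 68, 16, 3
total = assignments + homework + participation
print(total, letter_grade(total))
```

With these sample scores the total is 87 points, which lands on the B+ floor.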
Guides and tipsheets
- Research Guides for Reporters by The Newmark J-School Research Center
- Data-driven story resources by The Newmark J-School Research Center
- Tips for doing data stories by Miguel Paz
- The Quartz guide to bad data, by Christopher Groskopf & Quartz GitHub Contributors
- A Guide to Bulletproofing Your Data by ProPublica
- Tipsheet: Most common data formats and concepts, compiled by Miguel Paz
- Data Is Plural, curated list of useful data, compiled by Jeremy Singer-Vine from BuzzFeed (sign up for updates)
- The Quartz Directory of Essential Data, by Christopher Groskopf
- First Draft News verification resources
- The Verification Handbook, European Journalism Centre, edited by Craig Silverman
- Fact checking guides, Open News
- Finding Stories in Census Data, by Emily Alpert Reyes
- How to Use the Census Bureau’s American Community Survey like a Pro, by Paul Overberg
- Pushing Hot Buttons with Census.gov: Using census data to find facts in a world of speculation, by Ronald Campbell
- Understanding Households and Relationships in Census Data, by Anthony DeBarros
Your class readings will be provided in class. Find some more recommended books here:
- “Numbers in the Newsroom: Using math and statistics in News”, 2nd Edition. By Sarah Cohen.
- “The investigative reporter's handbook: a guide to documents, databases, and techniques”. 4th Edition. Edited by Brant Houston et al.
- “Computer-Assisted Reporting: A practical guide”, 4th Edition. By Brant Houston.
- “Precision Journalism: a Reporter’s Introduction to Social Science Methods”, 4th Edition. By Philip Meyer.
- “The Functional Art: An introduction to information graphics and visualization”. By Alberto Cairo.
- “The Curious Journalist Guide to Data” (online). By Jonathan Stray.
- “Storytelling with Data”. By Cole Nussbaumer Knaflic
- “Computer-Assisted Research: Information Strategies and Tools for Journalists”. By Nora Paul and Kathleen A. Hansen
- “Mapping for Stories: A Computer-Assisted Reporting Guide”. By Jennifer LaFleur and Andy Lehren
- “The Visual Display of Quantitative Information”. By Edward R. Tufte
- “Data Points: Visualization That Means Something”. By Nathan Yau
- “Design for Information”. By Isabel Meirelles
Instructors will also share tip sheets, stories and tutorials for specific lessons.
You'll find all the coaches here.
Most relevant to our class:
Name | Coaching areas | Hours | Office Location | Email |
Kirsti Itameri | Interactive Journalism: Design, WordPress, Illustrator, Photoshop, Social Media | Tuesdays 6:30-8:30 pm or by appointment | Newsroom | kirsti.itameri@journalism.cuny.edu |
TC McCarthy | Interactive Journalism: Coding | Thursday 6-8 pm | Newsroom | tc.mccarthy@journalism.cuny.edu |
Malik Singleton | Interactive Journalism: Data Storytelling, WordPress, HTML, CSS | Mondays 5:30-7:30 pm | Newsroom | malik.singleton@journalism.cuny.edu |
Nicholas Wells | Interactive Journalism: Data Storytelling, HTML, CSS, R | Tuesdays 6:00 - 8:30 pm | Newsroom | Nicholasbwells@gmail.com |