zufanka/DataAnalysisPython_DataHarvest2020

Data analysis with Python @ EIJC2020

About this repo

This GitHub repository was created for the Data Analysis with Python sessions (1, 2, 3) at the DataHarvest+ European Investigative Journalism Conference. Over three sessions of about an hour each, we took participants through the process of retrieving, cleaning, analysing, transforming, combining and visualising a dataset in the Jupyter environment. These were showcase sessions, but you can follow along in your own time by downloading Anaconda and our notebooks, which are all stored in this repo.

About the trainers

We are two Python-programming data journalists, Adriana and Winny. Winny works at the Dutch public news broadcaster; Adriana is for hire as a freelance data journalist. Hire the woman already!

Used in the training

  • obviously Python3
  • libraries needed: pandas, altair, requests, beautifulsoup4
  • data from Eurostat
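
If you follow along on your own machine, a quick check like the one below can tell you whether the libraries listed above are installed. It is only a minimal sketch and not part of the training notebooks: the function name missing_packages and the REQUIRED mapping are made up for this example. The one thing to watch out for is that beautifulsoup4 is imported under the name bs4.

```python
from importlib.util import find_spec

# Maps the name you install (pip/conda) to the name you import.
# Note: beautifulsoup4 is imported as "bs4".
REQUIRED = {
    "pandas": "pandas",
    "altair": "altair",
    "requests": "requests",
    "beautifulsoup4": "bs4",
}


def missing_packages(required=REQUIRED):
    """Return the install names of packages whose module cannot be found."""
    return [
        install_name
        for install_name, module_name in required.items()
        if find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Still to install:", ", ".join(missing))
    else:
        print("All set, open a notebook!")
```

Anything reported as missing can be installed with pip or conda, using the names on the left-hand side of the mapping.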

Words of encouragement


To be the noun, you have to do the verb. Wanna be a painter? Paint. Wanna be a programming data journalist? Program to make data journalistic stories... Note how there are no conditions like quality or duration. If you want to become a data-driven journalist who uses Python programming to analyse data and find stories, all you have to do is start.
While that is true, be kind to yourself and start small. Start small! (Remember that a good story doesn't necessarily need a big dataset; it needs a big insight.) Also, the problem of good taste is very real, and you either overcome it or fail. Radio and podcast maker Ira Glass said it first...
And finally, remember that most people are about average at most things. That's how the median of every skill becomes the median, right? Watch Jacob Kaplan-Moss talk about this concept.

Resources


Book Python for Everybody
Why? Friendly and free introduction to Python (also comes with a Coursera course with the same name)
https://www.py4e.com/book

Basic introduction to Python
Why? Getting basics like the syntax right before diving into the deep end is a smart thing to do
https://www.datacamp.com/courses/intro-to-python-for-data-science

Online Python course aimed at journalists
Why? Especially aimed at journalists, and Winny made it. ;)
https://datajournalism.com/watch/python-for-journalists

Article on Altair and its upsides
http://fernandoi.cl/blog/posts/altair/

Handy snippets of Python code
Why? Adriana collects all the code snippets she keeps looking up in one handy place. The list exists purely for her own use, but now that you're here you might want to take a peek at this data analysis cheatsheet too.
https://gist.github.com/zufanka/39b8a55d707b3b4a2a4d369694739561

Annotated Python code
Why? Learning to code starts by understanding code written by someone else, and editing it for your own purposes. Just like learning any other language. This repo contains commented Jupyter Notebooks to help you ease into Python programming for journalism.
https://github.com/winnydejong/next/

The second Data Journalism Handbook
Why? While the first handbook zooms in on why data journalism is important, the second promises to be even more interesting, explaining how to actually go about data journalism. Partly published already, expected to be online in full this fall.
https://datajournalism.com/read/handbook/two

About Altair vs Matplotlib

  • matplotlib is a classic; you should probably learn a little bit of this too;
  • matplotlib needs less code to create a chart;
  • altair has a clearer syntax;
  • altair is better at interactivity;
  • altair charts can be adjusted in many, many ways (also: more code ;) )

There are other libraries you can use to make charts: ggpy, bokeh, seaborn, plotly. All of them have slightly different advantages. When you're beginning, simply pick one and learn it. :)
