Reporting-Data-Using-Python

Communicating data using Python in a Jupyter notebook

Reporting and Communicating Data Using Python

A lot of people think the work of a Data Scientist, or anyone working in a data role, is to spin up an Apache Hadoop cluster and use Apache Spark to spend 12 hours fitting an XGBoost model. This may be true some of the time, but the often-quoted figure is that 80% of the time is spent preparing data.

Neither of these is the topic for today. I want to talk about communicating data. All the effort put into working with data is wasted if nobody ever uses the result. If decisions are to be made with data, it first needs to be communicated!

This will focus on Descriptive Statistics (something happened and we want to understand it), but the same ideas hold true for Predictive Statistics or Prescriptive Statistics.

These same concepts are at play regardless of the platform being used, be it Python, R, Scala, Excel, JMP, Minitab, SPSS, etc.

To discuss:

Always plot your data
Reporting confidence intervals and p-values
Some neat Jupyter tricks (interactivity, HTML, hiding code)
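
As a taste of the second item, here is a minimal sketch of reporting a mean together with a confidence interval and a p-value. It uses only the standard library and a normal (z) approximation, which is an assumption made for brevity; for a small sample like this one, a t-distribution would give a somewhat wider interval. The function name `summarize`, the sample values, and the null mean of 5.0 are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def summarize(sample, null_mean=0.0, confidence=0.95):
    """Return the sample mean, a confidence interval, and a two-sided p-value.

    Assumption: a normal (z) approximation is used for both the interval
    and the test against null_mean.
    """
    n = len(sample)
    m = mean(sample)
    se = stdev(sample) / sqrt(n)  # standard error of the mean
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    ci = (m - z_crit * se, m + z_crit * se)
    z = (m - null_mean) / se  # test statistic against the null mean
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return m, ci, p_value


# Hypothetical measurements, made up for illustration only.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5, 4.7, 5.9, 5.2]
m, (lo, hi), p = summarize(sample, null_mean=5.0)
print(f"mean = {m:.2f}, 95% CI [{lo:.2f}, {hi:.2f}], p = {p:.3f}")
```

Reporting the interval next to the point estimate lets the reader see how uncertain the number is, which the mean alone hides.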

Other things to consider:

Consistency across roles: multiple Data Scientists and Analysts should use similar (if not the same) methods, and those methods should also be available to other technical people who need to work with data
Know your audience: what makes sense for communicating with other Data Persons, Technical Persons, Business Persons, and the general public isn't always the same thing
Label and scale your axes: one of the "best" ways to lie to people with data is to misrepresent it with unlabeled axes or by messing with the scale of an axis
Colorblindness is fairly common: keep this in mind when selecting colors
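
The last two points can be sketched in a few lines of matplotlib. This is only an illustration, not necessarily what the notebook does: the data, labels, and variable names are hypothetical, and the Okabe-Ito palette is one commonly used colorblind-friendly choice among several.

```python
import matplotlib.pyplot as plt

# Okabe-Ito palette, a commonly used colorblind-friendly set of colors.
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#F0E442",
             "#0072B2", "#D55E00", "#CC79A7", "#000000"]

# Hypothetical monthly figures, made up for illustration only.
months = ["Jan", "Feb", "Mar", "Apr"]
region_a = [120, 135, 128, 150]
region_b = [100, 110, 125, 130]

fig, ax = plt.subplots(figsize=(6, 4))

# Vary marker and line style as well as color, so the two series stay
# distinguishable even when the colors are hard to tell apart.
ax.plot(months, region_a, marker="o", color=OKABE_ITO[4], label="Region A")
ax.plot(months, region_b, marker="s", linestyle="--",
        color=OKABE_ITO[5], label="Region B")

# Label both axes and give the chart a title.
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")
ax.set_title("Monthly units sold by region")

# Start the y-axis at zero so differences are not visually exaggerated.
ax.set_ylim(bottom=0)
ax.legend()
fig.tight_layout()
```

In a Jupyter notebook the figure is displayed when the cell finishes. Whether a zero baseline is appropriate depends on the data; the point is to choose the scale deliberately and make it visible.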
