.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Berkeley Data Science Modules: Introduction to Data Science in Python\n",
    "<img src=\"https://data.berkeley.edu/sites/default/files/styles/openberkeley_brand_widgets_rectangle/public/john_and_ani_community_photo_dsc_7214.jpg?itok=6Hjv1irR\" style=\"width: 500px; height: 350px;\"/>\n",
    "\n",
    "\n",
    "### Table of Contents\n",
    "<a href='#section 0'>Welcome to Jupyter Notebooks!</a>\n",
    "\n",
    "1.  <a href='#section 1'>The Python Programming Language</a>\n",
    "\n",
    "    a. <a href='#subsection 1a'>Expressions</a> and <a href='#subsection error'>Errors</a>\n",
    "\n",
    "    b. <a href='#subsection 1b'>Names</a>\n",
    "\n",
    "    c. <a href='#subsection 1c'>Functions</a>\n",
    "\n",
    "    d. <a href='#subsection 1d'>Sequences</a>\n",
    "<br><br>\n",
    "2. <a href='#section 2'>Tables</a>\n",
    "\n",
    "    a. <a href='#subsection 2a'>Attributes</a>\n",
    "\n",
    "    b. <a href='#subsection 2b'>Transformations</a><br><br>\n",
    "\n",
    "3. <a href='#section 3'>Coming Soon...</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The Jupyter  Notebook <a id='section 0'></a>\n",
    "\n",
    "Welcome to the Jupyter Notebook! **Notebooks** are documents that can contain text, code, visualizations, and more. \n",
    "\n",
    "A notebook is composed of rectangular sections called **cells**. There are 2 kinds of cells: markdown and code. A **markdown cell**, such as this one, contains text. A **code cell** contains code in Python, a programming language that we will be using for the remainder of this module. You can select any cell by clicking it once. After a cell is selected, you can navigate the notebook using the up and down arrow keys.\n",
    "\n",
    "To run a code cell once it's been selected, \n",
    "- press Shift-Enter, or\n",
    "- click the Run button in the toolbar at the top of the screen. \n",
    "\n",
    "If a code cell is running, you will see an asterisk (\\*) appear in the square brackets to the left of the cell. Once the cell has finished running, a number will replace the asterisk and any output from the code will appear under the cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# run this cell\n",
    "print(\"Hello World!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You'll notice that many code cells contain lines of blue text that start with a `#`. These are *comments*. Comments often contain helpful information about what the code does or what you are supposed to do in the cell. The leading `#` tells the computer to ignore them."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Editing\n",
    "\n",
    "You can edit a Markdown cell by clicking it twice. Text in Markdown cells is written in [**Markdown**](https://daringfireball.net/projects/markdown/), a formatting syntax for plain text, so you may see some funky symbols when you edit a text cell. \n",
    "\n",
    "Once you've made your changes, you can exit text editing mode by running the cell. Edit the next cell to fix the misspelling."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Go Baers!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Code cells can be edited any time after they are highlighted. Try editing the next code cell to print your name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# edit the code to print your name\n",
    "print(\"Hello: my name is NAME\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Saving and Loading\n",
    "\n",
    "Your notebook can record all of your text and code edits, as well as any graphs you generate or calculations you make. You can save the notebook in its current state by clicking Control-S, clicking the floppy disc icon in the toolbar at the top of the page, or by going to the File menu and selecting \"Save and Checkpoint\".\n",
    "\n",
    "The next time you open the notebook, it will look the same as when you last saved it.\n",
    "\n",
    "**Note:** after loading a notebook you will see all the outputs (graphs, computations, etc) from your last session, but you won't be able to use any variables you assigned or functions you defined. You can get the functions and variables back by re-running the cells where they were defined- the easiest way is to highlight the cell where you left off work, then go to the Cell menu at the top of the screen and click \"Run all above\". You can also use this menu to run all cells in the notebook by clicking \"Run all\"."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Completing the Notebooks\n",
    "\n",
    "As you navigate the notebooks, you'll see cells with bold, all-capitalized headings that need to be filled in to complete the notebook. There are two types:\n",
    "<div class=\"alert alert-warning\">\n",
    "<b>EXERCISE</b> cells require you to write code to solve a problem, or write short responses related to analyzing a graph or the result of a computation\n",
    "</div>\n",
    "\n",
    "<div class=\"alert alert-info\">\n",
    "<b>PRACTICE</b> cells provide spaces to try out new coding skills at your own pace, unrelated to the case study. Since each coding skill taught in these notebooks is necessary for analyzing the cases, practice cells are a good way to get comfortable before applying those skills to real data.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before we begin, we'll need a few extra tools to conduct our analysis. Run the next cell to load some code packages that we'll use later. \n",
    "\n",
    "Note: this cell MUST be run in order for most of the rest of the notebook to work."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# dependencies: THIS CELL MUST BE RUN\n",
    "from datascience import *\n",
    "import numpy as np\n",
    "import math\n",
    "import matplotlib.pyplot as plt\n",
    "plt.style.use('fivethirtyeight')\n",
    "import ipywidgets as widgets\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1. Python <a id='section 1'></a>\n",
    "\n",
    "**Python** is  programming language- a way for us to communicate with the computer and give it instructions. \n",
    "\n",
    "Just like any language, Python has a *vocabulary* made up of words it can understand, and a *syntax* giving the rules for how to structure communication.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Errors <a id=\"subsection error\"></a>\n",
    "\n",
    "Python is a language, and like natural human languages, it has rules.  It differs from natural language in two important ways:\n",
    "1. The rules are *simple*.  You can learn most of them in a few weeks and gain reasonable proficiency with the language in a semester.\n",
    "2. The rules are *rigid*.  If you're proficient in a natural language, you can understand a non-proficient speaker, glossing over small mistakes.  A computer running Python code is not smart enough to do that.\n",
    "\n",
    "Whenever you write code, you will often accientally break some of these rules. When you run a code cell that doesn't follow every rule exactly, Python will produce an **error message**.\n",
    "\n",
    "Errors are *normal*; experienced programmers make many errors every day. Errors are also *not dangerous*; you will not break your computer by making an error (in fact, errors are a big part of how you learn a coding language). An error is nothing more than a message from the computer saying it doesn't understand you and asking you to rewrite your command.\n",
    "\n",
    "We have made an error in the next cell.  Run it and see what happens."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"This line is missing something.\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You should see something like this (minus our annotations):\n",
    "\n",
    "<img src=\"images/error.jpg\"/>\n",
    "\n",
    "The last line of the error output attempts to tell you what went wrong.  The *syntax* of a language is its structure, and this `SyntaxError` tells you that you have created an illegal structure.  \"`EOF`\" means \"end of file,\" so the message is saying Python expected you to write something more (in this case, a right parenthesis) before finishing the cell.\n",
    "\n",
    "There's a lot of terminology in programming languages, but you don't need to know it all in order to program effectively. If you see a cryptic message like this, you can often get by without deciphering it.  (Of course, if you're frustrated, you can usually find out by searching for the error message online or posting on the Piazza.)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "### 1a. Data <a id='subsection 1a'></a>\n",
    "**Data** is information- the \"stuff\" we manipulate to make and test hypotheses. \n",
    "\n",
    "Almost all data you will work with broadly falls into two types: numbers and text. *Numerical data* shows up green in code cells and can be positive, negative, or include a decimal."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Numerical data\n",
    "\n",
    "4\n",
    "\n",
    "87623000983\n",
    "\n",
    "-667\n",
    "\n",
    "3.14159"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Text data (also called *strings*) shows up red in code cells. Strings are enclosed in double or single quotes. Note that numbers can appear in strings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Strings\n",
    "\"a\"\n",
    "\n",
    "\"Hi there!\"\n",
    "\n",
    "\"We hold these truths to be self-evident that all men are created equal.\"\n",
    "\n",
    "# this is a string, NOT numerical data\n",
    "\"3.14159\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1a. Expressions <a id='subsection 1a'></a>\n",
    "\n",
    "A bit of communication in Python is called an **expression**. It tells the computer what to do with the data we give it.\n",
    "\n",
    "Here's an example of an expression."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# an expression\n",
    "14 + 20"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "When you run the cell, the computer **evaluates** the expression and prints the result. Note that only the last line in a code cell will be printed, unless you explicitly tell the computer you want to print the result."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# more expressions. what gets printed and what doesn't?\n",
    "100 / 10\n",
    "\n",
    "print(4.3 + 10.98)\n",
    "\n",
    "33 - 9 * (40000 + 1)\n",
    "\n",
    "884"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Many basic arithmetic operations are built in to Python, like `*` (multiplication), `+` (addition), `-` (subtraction), and `/` (division). There are many others, which you can find information about [here](http://www.inferentialthinking.com/chapters/03/1/expressions.html). \n",
    "\n",
    "The computer evaluates arithmetic according to the PEMDAS order of operations (just like you probably learned in middle school): anything in parentheses is done first, followed by exponents, then multiplication and division, and finally addition and subtraction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# before you run this cell, can you say what it should print?\n",
    "4 - 2 * (1 + 6 / 3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "<b>PRACTICE:</b> If you're new to python and coding, one of the best ways to get comfortable is to practice. Try writing and running different expressions in the cell below using numbers and the arithmetic operators `*` (multiplication), `+` (addition), `-` (subtraction), and `/` (division). See if you can generate different error messages and figure out what they mean.\n",
    "    </div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional: try out different arithmetic operations\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1b. Names <a id='subsection 1b'></a>\n",
    "Sometimes, the values you work with can get cumbersome- maybe the expression that gives the value is very complicated, or maybe the value itself is long. In these cases it's useful to give the value a **name**.\n",
    "\n",
    "We can name values using what's called an *assignment* statement."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# assigns 442 to x\n",
    "x = 442"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The assignment statement has three parts. On the left is the *name* (`x`). On the right is the *value* (442). The *equals sign* in the middle tells the computer to assign the value to the name.\n",
    "\n",
    "You'll notice that when you run the cell with the assignment, it doesn't print anything. But, if we try to access `x` again in the future, it will have the value we assigned it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# print the value of x\n",
    "x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also assign names to expressions. The computer will compute the expression and assign the name to the result of the computation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y = 50 * 2 + 1\n",
    "y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can then use these name as if they were numbers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x - 42"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x + y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "<b>PRACTICE:</b>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional: experiment with assigning names and doing arithmetic operations with named variables\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1c. Functions <a id='subsection 1c'></a>\n",
    "We've seen that values can have names (often called **variables**), but operations may also have names. A named operation is called a **function**. Python has some functions built into it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# a built-in function \n",
    "round"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Functions get used in *call expressions*, where a function is named and given values to operate on inside a set of parentheses. The `round` function returns the number it was given, rounded to the nearest whole number."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# a call expression using round\n",
    "round(1988.74699)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A function may also be called on more than one value (called *arguments*). For instance, the `min` function takes however many arguments you'd like and returns the smallest. Multiple arguments are separated by commas."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "min(9, -34, 0, 99)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "<b>PRACTICE</b>\n",
    "<ul>\n",
    "    <li>The `abs` function takes one argument (just like `round`)</li>\n",
    "    <li>The `max` function takes one or more arguments (just like `min`)</li>\n",
    "</ul>\n",
    "\n",
    "\n",
    "Try calling `abs` and `max` in the cell below. What does each function do?\n",
    "\n",
    "Also try calling each function *incorrectly*, such as with the wrong number of arguments. What kinds of error messages do you see?\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# replace the ... with calls to abs and max\n",
    "..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Dot Notation\n",
    "Python has a lot of [built-in functions](https://docs.python.org/3/library/functions.html) (that is, functions that are already named and defined in Python), but even more functions are stored in collections called *modules*. Earlier, we imported the `math` module so we could use it later. Once a module is imported, you can use its functions by typing the name of the module, then the name of the function you want from it, separated with a `.`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# a call expression with the factorial function from the math module\n",
    "math.factorial(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-warning\">\n",
    "**EXERCISE:**  `math` also has a function called `sqrt` that takes one argument and returns the square root. Call `sqrt` on 16 in the next cell.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# use math.sqrt to get the square root of 16\n",
    "..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Tables <a id='section 2'></a>\n",
    "\n",
    "The last section covered four basic concepts of python: data, expressions, names, and functions. In this next section, we'll see just how much we can do to examine and manipulate our data with only these minimal Python skills."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Tables** are fundamental ways of organizing and displaying data. Run the next cell to load the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ratings = Table.read_table(\"data/imdb_ratings.csv\")\n",
    "ratings"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This table is organized into **columns**: one for each *category* of information collected:\n",
    "\n",
    "You can also think about the table in terms of its **rows**. Each row represents all the information collected about a particular instance, which can be a person, location, action, or other unit. \n",
    "\n",
    "What do the rows in this table represent?\n",
    "\n",
    "By default only the first ten rows are shown. Can you see how many rows there are in total?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2a. Table Attributes <a id='subsection 2a'></a>\n",
    "\n",
    "Every table has **attributes** that give information about the table, like the number of rows and the number of columns. Table attributes are accessed using the dot method. But, since an attribute doesn't perform an operation on the table, there are no parentheses (like there would be in a call expression).\n",
    "\n",
    "Attributes you'll use frequently include `num_rows` and `num_columns`, which give the number of rows and columns in the table, respectively."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# get the number of columns\n",
    "ratings.num_columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "<b>PRACTICE:</b> Use `num_rows` to get the number of rows in our table.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# get the number of rows in the table\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2b. Table Transformation <a id='subsection 2b'></a>\n",
    "\n",
    "Not all of our columns are relevant to every question we want to ask. We can save computational resources and avoid confusion by *transforming* our table before we start work.\n",
    "\n",
    "#### Subsetting columns with `select` and `drop`\n",
    "The `select` function is used to get a table containing only particular columns. `select` is called on a table using dot notation and takes one or more arguments: the name or names of the column or columns you want."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# make a new table with only selected columns\n",
    "ratings.select(\"Votes\", \"Title\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If instead you need all columns except a few, the `drop` function can get rid of specified columns. `drop` works very similarly to `select`: call it on the table using dot notation, then give it the name or names of what you want to drop."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# drop a column\n",
    "ratings.drop(\"Decade\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-warning\">\n",
    "<b>EXERCISE:</b> Pick two columns from our table. Create a new table containing only those two columns two different ways: once using `select` and once using `drop`. \n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# use select\n",
    "..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# use drop\n",
    "..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Filtering rows with `where`\n",
    "Some analysis questions only deal with a subset of rows.\n",
    "\n",
    "The **`where`** function allows us to choose certain rows based on two arguments:\n",
    "- A column label\n",
    "- A condition that each row should match, called the _predicate_ \n",
    "\n",
    "In other words, we call the `where` function like so: `table_name.where(column_name, predicate)`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# get a subset of rows\n",
    "ratings.where(\"Decade\", are.equal_to(1950))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are many types of predicates, but some of the more common ones are:\n",
    "\n",
    "|Predicate|Example|Result|\n",
    "|-|-|-|\n",
    "|`are.equal_to`|`are.equal_to(50)`|Find rows with values equal to 50|\n",
    "|`are.not_equal_to`|`are.not_equal_to(50)`|Find rows with values not equal to 50|\n",
    "|`are.above`|`are.above(50)`|Find rows with values above (and not equal to) 50|\n",
    "|`are.above_or_equal_to`|`are.above_or_equal_to(50)`|Find rows with values above 50 or equal to 50|\n",
    "|`are.below`|`are.below(50)`|Find rows with values below 50|\n",
    "|`are.between`|`are.between(2, 10)`|Find rows with values above or equal to 2 and below 10|\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# example 2: get a subset of rows\n",
    "ratings.where(\"Rank\", are.above(8.7))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-warning\">\n",
    "<b> EXERCISE:</b> Describe what happened in each of the two examples above. Which rows were filtered out? Give an example where we would want to use those filters for analysis.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**YOUR RESPONSE HERE**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Coming up... <a id='section 3'></a>\n",
    "\n",
    "Knowing these few basic concepts about Python and Tables will help you interact with data in upcoming parts of the module. Here's a preview of the kinds of visualizations and operations coming up:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# make a bar plot of the max rank per decade\n",
    "ratings.select(\"Rank\", \"Decade\").group(\"Decade\", max).barh(\"Decade\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# make many bar plots showing feature averages per decade\n",
    "def avg_by_decade(feature):\n",
    "    return ratings.select(feature, \"Decade\").group(\"Decade\", np.average).barh(\"Decade\")\n",
    "\n",
    "# create the slider sfor the widget\n",
    "buttons = widgets.ToggleButtons(options=[\"Rank\", \"Votes\"])\n",
    "\n",
    "# create the widget to view plots for different parameter values\n",
    "display(widgets.interactive(avg_by_decade, feature=buttons))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load the Capital bike-sharing data set\n",
    "bikes = Table.read_table(\"data/day_renamed.csv\")\n",
    "bikes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# look at the correlation between temperature and casual rider numbers\n",
    "bikes.select(\"casual\", \"temp\").scatter(\"temp\", fit_line=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# compare scatter plots for casual and registered riders for different predictor variables\n",
    "def scatter_bikes(predictor, response, fit_line):\n",
    "    if response == \"both\":\n",
    "        b = bikes.select(\"registered\", \"casual\", predictor)\n",
    "    else:\n",
    "        b = bikes.select(response, predictor)\n",
    "    return b.scatter(predictor, fit_line=fit_line)\n",
    "\n",
    "# create the slider sfor the widget\n",
    "predict_widget = widgets.Dropdown(options=[\"humidity\", \"windspeed\", \"temp\"],\n",
    "                         value=\"humidity\")\n",
    "response_widget = widgets.Dropdown(options=[\"casual\", \"registered\", \"both\"],\n",
    "                         value=\"casual\")\n",
    "fitline_widget = widgets.Dropdown(options=[True, False],\n",
    "                          value=False)\n",
    "\n",
    "# create the widget to view plots for different parameter values\n",
    "display(widgets.interactive(scatter_bikes, predictor=predict_widget, response=response_widget, fit_line=fitline_widget))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### References\n",
    "\n",
    "- Sections of \"Intro to Jupyter\", \"Table Transformation\" adapted from materials by Kelly Chen and Ashley Chien in [UC Berkeley Data Science Modules core resources](http://github.com/ds-modules/core-resources)\n",
    "- \"A Note on Errors\" subsection and \"error\" image adapted from materials by Chris Hench and Mariah Rogers for the Medieval Studies 250: Text Analysis for Graduate Medievalists [data science module](https://github.com/ds-modules/MEDST-250).\n",
    "- Rocket Fuel data and discussion questions adapted from materials by Zsolt Katona and Brian Bell, BerkeleyHaas Case Series"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Author: Keeley Takimoto"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}