<!DOCTYPE html>
<html>
<head>
<title>Practice Data Science</title>
<link rel="stylesheet" href="Stylesheets/TFN_SiteStylesheet.css">
<link rel="stylesheet" href="Stylesheets/TFN_DropDownMenu.css">
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.7.0/css/all.css" integrity="sha384-lZN37f5QGtY3VHgisS14W3ExzMWZxybE1SJSEsQp9S+oqd12jhcu+A56Ebc1zFSJ" crossorigin="anonymous">
</head>
<body>
<div class="dropdown">
<button class="dropbtn">
<i class="fas fa-bars"></i>
</button>
<div class="dropdown-content">
<a href="TFN_WelcomePage.html">Home</a>
<a href="TFN_AboutUsPage.html">About Us</a>
<a href="TFN_DataSciencePage.html">What is Data Science?</a>
<a href="TFN_BSProjects.html">Brown Scholars' Projects</a>
<a href="TFN_PracticePage.html">Practice Data Science</a>
<a href="TFN_ResourcesPage.html">Resources</a>
</div>
</div>
<header>
<h1>Practice Data Science</h1>
</header>
<div class = "parallax"> </div>
<h3>Instructions to practice working with a data set:</h3>
<ol>
<li>Choose a data set from the list of CSV files below, or go to <a href="https://www.kaggle.com/datasets" target="_blank">Kaggle</a> and choose a different one.</li>
<li>Download your chosen CSV file.</li>
<li>Download the Jupyter notebook that corresponds with your data set. Make sure that both the CSV and Jupyter notebook files are in <strong>the same directory</strong> on your computer.</li>
<li>Launch the Jupyter notebook (through Anaconda) and work through the exercises.</li>
<li>If you get stuck, search Google, check our resource page, or look at a completed Jupyter notebook (below).</li>
</ol>
<p>Note: You will need a Kaggle account to download a data set from Kaggle. You will also need Python installed on your computer. We recommend <a href="https://www.anaconda.com/download/#macos" target="_blank">Anaconda</a>.</p>
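<p>Step 3 matters because, by default, a notebook looks for bare file names in its own folder. The sketch below illustrates this under stated assumptions: <span class="code">example.csv</span> and its columns are made-up stand-ins for a downloaded data set, and the temporary folder stands in for wherever you saved your files.</p>

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for a downloaded data set; the columns are made up.
folder = tempfile.mkdtemp()
csv_path = os.path.join(folder, "example.csv")
with open(csv_path, "w") as f:
    f.write("region,price\nAlbany,1.33\nBoston,1.52\n")

# Switching into the folder imitates opening a notebook saved next to the CSV.
original_folder = os.getcwd()
os.chdir(folder)

# Because the CSV is in the current folder, the bare file name is enough.
df = pd.read_csv("example.csv")
print(df.head())

# Return to the starting folder so the sketch leaves no side effects.
os.chdir(original_folder)
```

<p>If the CSV were in a different folder, the same call would raise a <span class="code">FileNotFoundError</span> unless you passed the full path.</p>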
<div class = "parallax"> </div>
<h3>CSV File Downloads</h3>
<ul>
<li><a href="JupyterNB+CSV/all-ages.csv" download>All College Majors</a></li>
<li><a href="JupyterNB+CSV/avocado.csv" download>Avocado Prices</a></li>
<li><a href="JupyterNB+CSV/babynames.csv" download>NYC Baby Names</a></li>
<li><a href="JupyterNB+CSV/popularWebsites.csv" download>Popular Websites</a></li>
<li><a href="JupyterNB+CSV/stateElections.csv" download>State Legislative Election Results</a></li>
<li><a href="JupyterNB+CSV/women-stem.csv" download>Women STEM Majors</a></li>
</ul>
<p></p>
<div class = "parallax"> </div>
<h3>Jupyter Notebooks</h3>
<ul>
<li>Located in <a href="https://github.com/SGunal/JupyterNotebooks" target="_blank">this GitHub repository</a>
<ul>
<li>Click the green 'Clone or Download' button on the right-hand side.</li>
<li>Download as a ZIP file or copy the link and clone the repository to your computer.</li>
</ul>
</li>
</ul>
<p></p>
<div class = "parallax"> </div>
<h3>Extra Resources</h3>
<ul>
<li><a href="https://pandas.pydata.org/pandas-docs/stable/tutorials.html" target="_blank">Pandas Tutorials</a></li>
<li><a href="https://pandas.pydata.org/pandas-docs/stable/visualization.html" target="_blank">Pandas Graphing</a></li>
<li><a href="https://matplotlib.org/contents.html" target="_blank">Matplotlib Documentation</a></li>
</ul>
<p></p>
<div class = "parallax"> </div>
<h3>Useful Commands</h3>
<ul>
<li>For Cleaning Data
<ul>
<li><span class="code">df.info()</span> <i class="fas fa-arrow-right"></i> Displays general info about a dataframe (column names, number of non-null entries per column, data types). Use it to get a general sense of your data.</li>
<li><span class="code">df.isnull().sum()</span> <i class="fas fa-arrow-right"></i> Lists the total number of null values in each column of a dataframe.</li>
<li><span class="code">df["ColumnName"].value_counts()</span> <i class="fas fa-arrow-right"></i> Lists all unique values and how many times they occur in a specific column</li>
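<li><span class="code">df["ColumnName"] = df["ColumnName"].replace(to_replace, value)</span> <i class="fas fa-arrow-right"></i> Replaces one or more values (to_replace) with another value (value) and stores the result back in the column. Recent pandas versions warn when <span class="code">inplace=True</span> is used on a single column, because the dataframe may not be updated.</li>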
<li><span class="code">df["ColumnName"].unique()</span> <i class="fas fa-arrow-right"></i> Returns an array of all unique values in a specific column. Use it to spot values that need to be renamed, such as inconsistent spellings.</li>
</ul></li>
<li>For Visualizing Data
<ul>
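<li><span class="code">df.groupby(["Column1", "Column2"])[["Column3"]]</span> <i class="fas fa-arrow-right"></i> Groups rows by the listed columns and selects Column3. Follow it with an aggregation such as <span class="code">.mean()</span> or <span class="code">.sum()</span> to get a result you can plot.</li>
<li><span class="code">df.plot()</span> <i class="fas fa-arrow-right"></i> Plots the data in a dataframe (a line plot by default).</li>
<li><span class="code">plt.plot(x, y)</span> <i class="fas fa-arrow-right"></i> Creates a plot using matplotlib. Call <span class="code">plt.show()</span> afterward to display it.</li>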
</ul></li>
</ul>
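<p>The commands above can be tried together on a small example. This is only a sketch: the dataframe, its column names (<span class="code">region</span>, <span class="code">type</span>, <span class="code">price</span>), and its values are invented for illustration and do not come from any of the CSV files on this page.</p>

```python
import pandas as pd

# Hypothetical data: column names and values are invented for illustration.
df = pd.DataFrame({
    "region": ["Albany", "albany", "Boston", "Boston", None],
    "type": ["organic", "organic", "conventional", "organic", "organic"],
    "price": [1.0, 2.0, 1.0, 2.0, 3.0],
})

# Overview: column names, non-null counts, and data types.
df.info()

# Count missing values per column ("region" has one).
null_counts = df.isnull().sum()
print(null_counts)

# Spot inconsistent spellings: both "Albany" and "albany" appear.
print(df["region"].unique())

# Fix the spelling and assign the result back to the column.
df["region"] = df["region"].replace("albany", "Albany")

# Count how often each value occurs (missing values are skipped by default).
counts = df["region"].value_counts()
print(counts)

# Group by two columns, select "price", and aggregate with the mean.
mean_price = df.groupby(["region", "type"])[["price"]].mean()
print(mean_price)
```

<p>The grouped result can then be passed to <span class="code">mean_price.plot()</span> to visualize it. Rows with a missing <span class="code">region</span> are left out of the groups by default.</p>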
<h2>Have Fun!</h2>
</body>