<!DOCTYPE html>

<html>
  <head>
    <title>Practice Data Science</title>
    <link rel="stylesheet" href="Stylesheets/TFN_SiteStylesheet.css">
    <link rel="stylesheet" href="Stylesheets/TFN_DropDownMenu.css">
    <link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.7.0/css/all.css" integrity="sha384-lZN37f5QGtY3VHgisS14W3ExzMWZxybE1SJSEsQp9S+oqd12jhcu+A56Ebc1zFSJ" crossorigin="anonymous">
  </head>

  <body>

    <div class="dropdown">
      <button class="dropbtn">
        <i class="fas fa-bars"></i>
      </button>
      <div class="dropdown-content">
        <a href="TFN_WelcomePage.html">Home</a>
        <a href="TFN_AboutUsPage.html">About Us</a>
        <a href="TFN_DataSciencePage.html">What is Data Science?</a>
        <a href="TFN_BSProjects.html">Brown Scholars' Projects</a>
        <a href="TFN_PracticePage.html">Practice Data Science</a>
        <a href="TFN_ResourcesPage.html">Resources</a>
      </div>
    </div>

    <header>
      <h1>Practice Data Science</h1>
    </header>

    <div class = "parallax"> </div>

    <h3>Instructions to practice working with a data set:</h3>
      <ol>
        <li>Choose a data set from the list of CSV files below, or go to <a href="https://www.kaggle.com/datasets" target="_blank">Kaggle</a> and choose a different one.</li>
        <li>Download your chosen CSV file.</li>
        <li>Download the Jupyter notebook that corresponds with your data set. Make sure that both the CSV and Jupyter notebook files are in <strong>the same directory</strong> on your computer.</li>
        <li>Launch the Jupyter notebook (through Anaconda) and work through the exercises.</li>
        <li>If you get stuck, search Google, check our Resources page, or look at a completed Jupyter notebook (below).</li>
      </ol>
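      <p>The "same directory" step can be checked from inside the notebook. The snippet below is a minimal sketch, assuming you downloaded <span class="code">avocado.csv</span> and that the notebook's working directory is the folder it was opened from (usually the case, but not guaranteed). Swap in the name of your own CSV file.</p>
<pre>
```python
from pathlib import Path

# Assumed file name for illustration; use the name of the CSV you downloaded.
csv_name = "avocado.csv"

# Jupyter normally runs a notebook with its own folder as the working directory.
notebook_dir = Path.cwd()
csv_path = notebook_dir / csv_name

if csv_path.exists():
    print(f"Found {csv_name} in {notebook_dir}")
else:
    print(f"{csv_name} is not in {notebook_dir}; move it next to the notebook.")
```
</pre>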

      <p>Note: You will need a Kaggle account to download a data set from Kaggle. You will also need Python installed on your computer. We recommend <a href="https://www.anaconda.com/download/#macos" target="_blank">Anaconda</a>.</p>

    <div class = "parallax"> </div>

    <h3>CSV File Downloads</h3>
      <ul>
        <li><a href="JupyterNB+CSV/all-ages.csv" download>All College Majors</a></li>
        <li><a href="JupyterNB+CSV/avocado.csv" download>Avocado Prices</a></li>
        <li><a href="JupyterNB+CSV/babynames.csv" download>NYC Baby Names</a></li>
        <li><a href="JupyterNB+CSV/popularWebsites.csv" download>Popular Websites</a></li>
        <li><a href="JupyterNB+CSV/stateElections.csv" download>State Legislative Election Results</a></li>
        <li><a href="JupyterNB+CSV/women-stem.csv" download>Women STEM Majors</a></li>
      </ul>
    <p></p>

    <div class = "parallax"> </div>

    <h3>Jupyter Notebooks</h3>
      <ul>
        <li>Located in <a href="https://github.com/SGunal/JupyterNotebooks" target="_blank">this GitHub repository</a>
          <ul>
            <li>Click the green 'Clone or Download' button on the right-hand side.</li>
            <li>Download the repository as a ZIP file, or copy the link and clone the repository to your computer.</li>
          </ul>
        </li>
      </ul>
    <p></p>

    <div class = "parallax"> </div>
  
    <h3>Extra Resources</h3>
      <ul>
        <li><a href="https://pandas.pydata.org/pandas-docs/stable/tutorials.html" target="_blank">Pandas Tutorials</a></li>
        <li><a href="https://pandas.pydata.org/pandas-docs/stable/visualization.html" target="_blank">Pandas Graphing</a></li>
        <li><a href="https://matplotlib.org/contents.html" target="_blank">Matplotlib Documentation</a></li>
      </ul>
      <p></p>

    <div class = "parallax"> </div>
  
    <h3>Useful Commands</h3>
      <ul>
        <li>For Cleaning Data
          <ul>
            <li><span class="code">df.info()</span> <i class="fas fa-arrow-right"></i> Displays general info about a DataFrame (column names, number of non-null entries per column, data types). Use it to get a general sense of your data.</li>
            <li><span class="code">df.isnull().sum()</span> <i class="fas fa-arrow-right"></i> Lists the total number of null values in each column of a DataFrame.</li>
            <li><span class="code">df["ColumnName"].value_counts()</span> <i class="fas fa-arrow-right"></i> Lists all unique values and how many times they occur in a specific column</li>
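            <li><span class="code">df["ColumnName"] = df["ColumnName"].replace(to_replace, value)</span> <i class="fas fa-arrow-right"></i> Replaces one or more values (to_replace) with another value (value) and assigns the result back to the column. Assigning back is more reliable than <span class="code">inplace=True</span> on a selected column, which recent pandas versions warn about.</li>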
            <li><span class="code">df["ColumnName"].unique()</span> <i class="fas fa-arrow-right"></i> Returns an array of all unique values in a specific column. Can be used to see if you need to replace the names of any values.</li>
          </ul></li>
        <li>For Visualizing Data
          <ul>
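            <li><span class="code">df.groupby(["Column1", "Column2"])[["Column3"]]</span> <i class="fas fa-arrow-right"></i> Groups rows by the values in Column1 and Column2 and selects Column3. Follow it with an aggregation such as <span class="code">.mean()</span> or <span class="code">.sum()</span> to get one result per group.</li>
            <li><span class="code">df.plot()</span> <i class="fas fa-arrow-right"></i> Plots the data in a DataFrame (a line plot by default).</li>
            <li><span class="code">plt.plot(x, y)</span> <i class="fas fa-arrow-right"></i> Creates a plot of y against x using matplotlib. Call <span class="code">plt.show()</span> afterward to display it.</li>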
          </ul></li>
      </ul>
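      <p>The cleaning and grouping commands above can be tried together on a tiny table. This is only a sketch: the data and the column names (<span class="code">region</span>, <span class="code">type</span>, <span class="code">price</span>) are made up for illustration and do not come from any of the CSV files on this page. In your own notebook you would load a real file with <span class="code">pd.read_csv</span> instead.</p>
<pre>
```python
import io
import pandas as pd

# Made-up sample data (an assumption for illustration, not a real data set).
csv_text = """region,type,price
Albany,organic,1.50
Albany,conventional,1.10
albany,organic,
Boston,organic,1.70
"""
df = pd.read_csv(io.StringIO(csv_text))

# Get a general sense of the data: columns, non-null counts, data types.
df.info()

# Count the missing values in each column.
null_counts = df.isnull().sum()

# See every value in a column and how often it occurs.
region_counts = df["region"].value_counts()

# Fix an inconsistent label, assigning the result back to the column.
df["region"] = df["region"].replace("albany", "Albany")
regions = sorted(df["region"].unique())

# Group by two columns, select a third, and aggregate it.
mean_price = df.groupby(["region", "type"])[["price"]].mean()
print(mean_price)
```
</pre>
      <p>Here <span class="code">value_counts()</span> is what reveals that "albany" and "Albany" are being counted separately, which is the kind of problem <span class="code">replace</span> is meant to fix before grouping.</p>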

  <h2>Have Fun!</h2>

  </body>
</html>