
raviolli77/dataScience-UCSBProjectGroup-Syllabus


Winter Quarter Project Group - Data Science at UCSB

Contributors: Raul Eulogio, David A. Campos, Jason Freeberg, Nathan Fritter

In Memory of...

The efforts of this quarter and the work done are dedicated to the memory of:

  • Fernando Regino (1993-2013)
  • Bernardino De Jesus (1993-2016)
  • Ivan Garcia Vergara (1991-2018)
  • Erik Alonso (1991-2009)
  • Jorge Zarate (1990-2008)

"When the lights shut off
And it's my turn to settle down
My main concern
Promise that you will sing about me" - Kendrick Lamar

Thank you to everyone who participated this quarter

Abstract

This repository serves as an itinerary for the Winter Quarter Project Groups of the Data Science at UCSB organization, providing a weekly overview as well as the resources used in the weekly meetings.


Lesson Plan

Week 2: Introductions

  • Who are you?
    • Name
    • Major
    • Year
    • Where are you from?
  • Why are you here?
    • What are you trying to accomplish in life?
    • What are you trying to accomplish here?
    • What are you trying to learn?
    • What project(s) are you working on today?
    • What recent failure have you had?
    • Strengths & weaknesses as they relate to data science or in general?

Storm:

  • Goal of this group is to ultimately get projects finished and published
  • WHY
    • We found that it is by working on projects that you actually get to learn and begin to understand how to do data science
  • Brainstorm on data science ideas
    • Write them on a piece of paper
    • Go to the front of the group and present it
    • Have people walk up to you/you walk up to people, persuade people to be in your group

Collide:

  • Form teams
  • Mix up grade levels/experience
  • Discuss weaknesses, technologies, expertise, talent
    • Pick R or Python
  • Establish Communication channels
    • Facebook
    • GroupMe
    • Slack
    • GitHub
    • Phone
    • Gmail/Email

Homework:

  • Find an interesting project online/from inertia7.com
    • Read through contents
  • Catch up on your R/Python skills with DataCamp
  • Get to know each other
  • Become familiar with GitHub/create an account (for those at a more beginner level/those who weren't here, we'll go into more detail in a later meeting)

Links to resources discussed in meeting:

Week 3: Why do a Data Science Project?

Some preliminaries

  • Does everyone in your team have:

    • Slack account/channel within the dsprojectgroup Slack?
    • GitHub account?
    • R, Python, SQL set up on their machine? (Whatever y'all plan on using)
      • Speak about versions for language and packages/modules. Especially in Python:
        • Speak to me after if you need more clarification
        • If you can answer these questions then you're fine: Do you know what a virtual environment is? And do you know what it is used for?
          • If you don't, have your team speak to me after.
      • Which interface will your team be using, i.e. RStudio or Jupyter Notebook for R?
  • Introduce the concept of Stand Ups

    • Structure of an effective Stand Up:
      • What did I accomplish last meeting?
      • What will I do today?
      • What obstacles are impeding my progress? (Blockers)
  • Document everything in your Slack channel

    • If you used a site to review R, Python, HTML, etc., post it within your group's channel
    • Read a cool article relating to your project; document it on Slack
    • This will become important when citing sources and creating documentation for your project, and it's just a good habit to develop, since people deserve credit for helping you!
  • Trello

    • Nathan will introduce the interface and how to integrate it into your workflow
    • We might create a markdown file explaining it in more detail if people do not understand how to use it right away (but it is pretty easy to use).
    • Resources:

What is a Data Science Project?

  • How to do a Data Science Project?

    • Steps of a Data Science project:
      • Getting Data
        • UCI Machine Learning Repository
        • Kaggle datasets
      • Cleaning data/sanity checks
      • Exploratory Analysis
        • Trends in response and predictor variables
      • Modeling (choosing supervised vs. unsupervised learning)
      • Model Validation
      • Sharing Results
        • Inertia7.com
        • GitHub repo with a nice README.md
        • Jupyter/RMarkdown Notebook
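To make the steps above concrete, here's a minimal sketch of the first five (sharing results is up to you) in plain Python, standard library only so there's nothing to install. The inline CSV is made-up, Iris-style data with one deliberately incomplete row, not real measurements. The nearest-centroid classifier and the "hold out the last row of each class" split are simple stand-ins for whatever model and validation scheme your team actually picks, and all function names are hypothetical. A real project would more likely pull data from UCI or Kaggle and use pandas and scikit-learn.

```python
import csv
import io
import statistics
from collections import defaultdict

# Step 1: getting data. Made-up, Iris-style rows stand in for a real
# download from the UCI Machine Learning Repository or Kaggle.
RAW_CSV = """petal_length,petal_width,species
1.4,0.2,setosa
1.3,0.2,setosa
1.5,0.3,setosa
1.6,,setosa
4.5,1.5,versicolor
4.1,1.3,versicolor
4.7,1.4,versicolor
4.0,1.2,versicolor
5.8,2.2,virginica
6.0,2.5,virginica
5.5,1.9,virginica
5.9,2.1,virginica
1.4,0.1,setosa
"""

FEATURES = ("petal_length", "petal_width")


def load_rows(text):
    """Step 1: read CSV text into a list of dicts (values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Step 2: sanity checks. Drop incomplete rows, convert numbers to float."""
    cleaned = []
    for row in rows:
        if any(value in ("", None) for value in row.values()):
            continue  # skip rows with missing values rather than guessing
        cleaned.append({**{f: float(row[f]) for f in FEATURES},
                        "species": row["species"]})
    return cleaned


def summarize(rows):
    """Step 3: exploratory analysis. Mean of each predictor per response class."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["species"]].append(row)
    return {species: {f: statistics.mean(r[f] for r in group) for f in FEATURES}
            for species, group in groups.items()}


def fit_centroids(train_rows):
    """Step 4: modeling. A nearest-centroid classifier is 'fit' by storing the
    per-class feature means, so it reuses the exploratory summary."""
    return summarize(train_rows)


def predict(centroids, row):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(species):
        return sum((row[f] - centroids[species][f]) ** 2 for f in FEATURES)
    return min(centroids, key=sq_dist)


def holdout_split(rows):
    """Hold out the last row of each class as a small, deterministic test set."""
    last_index = {row["species"]: i for i, row in enumerate(rows)}
    test_idx = set(last_index.values())
    train = [r for i, r in enumerate(rows) if i not in test_idx]
    test = [rows[i] for i in sorted(test_idx)]
    return train, test


def accuracy(centroids, rows):
    """Step 5: model validation on rows the model never saw during fitting."""
    correct = sum(predict(centroids, r) == r["species"] for r in rows)
    return correct / len(rows)


if __name__ == "__main__":
    rows = clean(load_rows(RAW_CSV))
    print("rows after cleaning:", len(rows))
    print("class means:", summarize(rows))
    train, test = holdout_split(rows)
    model = fit_centroids(train)
    print("hold-out accuracy:", accuracy(model, test))
```

Each function maps to one step, which is also a decent way to split up the work (and the scripts) within a team.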

If you don't think you can do a project on your own right off the bat, try doing a project from Inertia7!

Here are some of my own repos where I have projects that aren't published on Inertia7:

Discuss what their project can look like given the structure of what they just hacked

  • Fill in the Steps of a Data Science Project

Homework: For this section, we can be lenient as to when this gets done. We expect more advanced groups to be able to do this on their own; newer groups can wait until the next meeting to have me or other members help with the process.

Week 4: Project Iteration/GitHub

Some Preliminaries:

  • Are people interested in a Python Hackathon?

    • If so, when and where works best?
  • Has your team created a GitHub Repo for your project within the organizational GitHub (Source: https://github.com/UCSB-dataScience-ProjectGroup)?

    • Does it have a README explaining the Steps of a Data Science Project?
    • Did you all agree on which language versions/interface you will be using?
    • Did you reach a conclusion on what models/approach you will take?
      • If not, give us an overview of what you plan to do; by the end of this meeting the project should be more or less decided

Team Resources

  • Has your team...
    • Been in contact through Slack?
    • Been doing Stand Ups?
    • Been addressing issues with your project or any preliminary practice for it?
    • Asked for help?

GitHub Crash Course

Here we're giving a quick overview of how GitHub works. It is meant as a rudimentary guide for those of you who are new to GitHub. We could spend an entire day going over the GitHub workflow, but for now we're concerned with just getting your feet wet and soon creating a repo for your project if you haven't already.

NOTE: One can spend an entire day learning git, so we'll leave that out for this iteration. We will provide resources for git below!

  • Step 1:

    • Create a GitHub account (Should go without saying, but you'd be surprised.)
  • Step 2:

    • You should create a myProject folder where you keep all your projects. This will help with organization later on, when you'll be doing a shit load of projects, and when publishing them!
    • Create a folder for your project where you will include things like, but not limited to:
      • README file - This file will be other people's introduction to your project, so make it pretty and easy to follow! (in .md format). I use Sublime Text to create and edit README files (there's a plethora of text editors like Notepad++, Atom, etc.; really it's all personal preference)
      • Script files - These files will be in the language you are doing your project in, so an R, Python, or SQL file (.R, .py, or .sql)
      • Data file (not sure what the proper name for this is; will edit later) - This file is where your data is stored if you are using a static data source. Typically it can be:
        • .csv file
        • .txt file
        • .JSON file
        • .db file
      • Image folder - For organizational purposes we usually create an image folder which is where we store all images produced in the project if we plan on hosting them or making them viewable without having to run/save the code. Inside this folder you will find static image files like:
        • .png files (favored for statistical graphics)
        • .jpeg
        • .gif
      • Once you get more acquainted with GitHub there will be more files that you will add, but for this example these will do
  • Step 3:

    • Once you have the folder for your project and all the files you wish to include in the repo, go to the main page of GitHub and click the green button that says New repository
    • Add the repo name; we usually name our repos as follows:
      • statisticalModel_DataSetDescription, e.g.:
        • classification_IrisFlowersR
        • regression_bostonHousingR
    • Add a description: give a brief overview of what your project will be about to give people context, e.g.:
      • A collection of alternate R markdown templates
      • Repo for a quick ggplot2 tutorial for Exploratory Analysis using Jupyter Notebook and R script
    • Leave it as public: Make it accessible to everyone
    • Initialize with a README - ALWAYS initialize with a README: this acts as an instructional overview for your project
      • You typically include steps that were required but can't be expressed in your code (e.g. creating a plotly account, or the steps needed if there are multiple scripts in your project)
      • A brief overview of your data set and statistical models used in the project
        • This will help later on if you plan to publish on inertia7!
      • Updates made to your project since its last iteration
      • Look at the inertia7 READMEs for some concrete examples
  • Step 4: Since you will be working in a team, you have to be familiar with branches. Branches are separate versions of the project, so they are a good way for your group to work on the project without fucking up the master branch

  • (Master Branch: This is the version the world will see and use, so make sure that this branch is the best iteration/is deployable)

    • Create a branch and name it something like ravi_branch
    • Each person on your team should have a branch that shows their iteration of the project, in case you get ahead or want to test something out you haven't discussed with your teammates yet.
  • Step 5: Say you and your group are in agreement that your branch is the version you want on the master branch, the next step is creating a Pull Request.

  • (Pull Request: Allows people to review any changes made in a project, make modifications before the master branch changes, and overall help a team work efficiently)

    • Go into the branch you want to merge, so ravi_branch in our example
    • Click New Pull Request
      • Here you will see the two branches being compared: the base will typically be the master branch and the compare branch will be ravi_branch in our example.
      • Add a description of some of the changes you made!
      • GitHub will give you an overview of the changes made in files
      • Once you have reviewed everything click Create pull request
      • This is where other teammates will be notified of you wanting to merge your branch and the master branch
      • If everyone is in agreement you click Merge pull request
      • Then, click Confirm merge and the master branch will now have the same contents as ravi_branch
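The local side of Steps 2 and 3 can also be scripted. The sketch below is one way to do it, assuming Python 3 and the folder layout from Step 2: it builds a repo name with the statisticalModel_DataSetDescription convention and creates the project folder with a README skeleton (sections based on what Step 3 says a README should cover), a script stub, and an images folder. The function names and the script.py/script.R stub name are our own hypothetical choices, not anything GitHub requires; the repo on GitHub itself is still created with the New repository button.

```python
import tempfile
from pathlib import Path

# README skeleton based on what Step 3 says a README should cover.
README_TEMPLATE = """# {name}

## Overview
Brief description of the data set and the statistical models used.

## Setup
Steps that can't be expressed in code (e.g. creating a plotly account,
or the order to run scripts in if there are several).

## Updates
Changes made since the last iteration.
"""


def repo_name(model_type, dataset_description):
    """statisticalModel_DataSetDescription, e.g. classification_IrisFlowersR."""
    return f"{model_type}_{dataset_description}"


def scaffold_project(projects_dir, model_type, dataset_description, script_ext=".py"):
    """Create <projects_dir>/<repo name>/ with README.md, a script stub and images/.

    Existing files are left untouched, so it is safe to run more than once.
    """
    project = Path(projects_dir) / repo_name(model_type, dataset_description)
    (project / "images").mkdir(parents=True, exist_ok=True)
    readme = project / "README.md"
    if not readme.exists():
        readme.write_text(README_TEMPLATE.format(name=project.name))
    (project / f"script{script_ext}").touch(exist_ok=True)
    return project


if __name__ == "__main__":
    # Demo in a throwaway directory; point projects_dir at your real myProject folder.
    with tempfile.TemporaryDirectory() as tmp:
        project = scaffold_project(Path(tmp) / "myProject", "classification", "IrisFlowersR", ".R")
        print(project.name, sorted(p.name for p in project.iterdir()))
```

The README is only written when it doesn't exist yet, so re-running the script never clobbers documentation your team has already filled in.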

That's a quick and rough tutorial on working in GitHub. It doesn't go over everything, but it should give context on how to work as a team using GitHub and branches. I have provided sources that go into more detail and definitely explain it better, so I would suggest reading up on them!

Homework:

  • Will depend on conversations we have on Wednesday to see where your team is at
  • Have a repo within the organizational GitHub by the end of today!
  • Create branches for each teammate
  • Set up a meeting time outside of Wednesday

Links to resources discussed in meeting (NOTE (2/14): Moved GitHub-related resources to Recommended Resources for entire quarter):

Week 5: Project Iteration

Some Preliminaries:

  • Python Hackathon (Workshop)

    • Steps needed to be taken before we can start/set up the hackathon:
      • Install Python 3.x
      • Use a virtual environment for your project if it will be in Python
    • Fill out the Google survey sent last night:
      • We need to gauge date, time, and funds to make sure it will run smoothly
  • Rewards!!!

    • HG Data Hackathon
      • Date proposition: April 21st from 2pm to 10pm
        • Most likely broken into 5-6 teams, each paired with an HG Data engineer
    • Spoke with Jason
      • Informal presentation of projects with congratulatory refreshments
        • Reward for Best Data Visualization
        • Reward for Best insight/best modeling
        • Reward for Best presentation
    • Jun Seo can speak about presenting projects to the library staff!
  • Major issues to address for today:

    • Does every team have a requirements.txt for their project?
    • Some READMEs need more detail (I will be doing informal interviews with each group today)
    • By today your team should have decided on its algorithms, methods, and Python version.
    • Branches for team members

Depending on attendance, we want today to really show us the early iteration of your project, so:

  • Have a script with the modules you will be using

  • Data set attached to your repo

  • Algorithms you will use
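For the virtual environment and requirements.txt items, the usual route is `python3 -m venv env`, activating the environment, installing your packages, and then `pip freeze > requirements.txt`. As a standard-library alternative, here's a sketch (assuming Python 3.8+ for importlib.metadata) that checks whether you're actually inside a virtual environment and pins only the packages you list. The helper names are hypothetical, and the package list in the demo is a placeholder for your project's own modules.

```python
import sys
from importlib import metadata
from pathlib import Path


def in_virtualenv():
    """True inside a venv: there, sys.prefix points at the environment while
    sys.base_prefix still points at the base Python installation."""
    return sys.prefix != sys.base_prefix


def pin_requirements(packages, path="requirements.txt"):
    """Write 'name==version' lines for the listed installed packages.

    Packages that aren't installed are reported and skipped. Returns the lines.
    """
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            print(f"warning: {name} is not installed here", file=sys.stderr)
    Path(path).write_text("".join(line + "\n" for line in lines))
    return lines


if __name__ == "__main__":
    if not in_virtualenv():
        print("warning: not inside a virtual environment", file=sys.stderr)
    # Placeholder package list; use the modules your project actually imports.
    print(pin_requirements(["numpy", "pandas"]))
```

Pinning only what you import keeps the file shorter than a full `pip freeze`, at the cost of not recording indirect dependencies.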

Week 6: Project Iteration/Blockers

Some Preliminaries:

  • Python Hackathon (Workshop)
    • Confirmed Date: 2/25/2017 at 10 a.m.
    • Buy shirts to rep!
      • Contact me after to get them from the other officer. I can take Venmo!
  • Rewards (reiterating because a lot of people were MIA)!!!
    • HG Data Hackathon
      • Date proposition: April 21st from 2pm to 10pm
        • Most likely broken into 5-6 teams, each paired with an HG Data engineer
    • Informal presentation of projects with congratulatory refreshments near end of this quarter
      • Reward for Best Data Visualization
      • Reward for Best insight/best modeling
      • Reward for Best presentation
    • The informal presentation can serve as prep for the presentation to the Library faculty
      • Most likely scheduled at the start of next quarter (Ask Jun-Seo if you have any questions)
    • Project will be posted in the newest iteration of int7x (inertia7)!
  • Team Management
    • A word from me regarding teams
    • We need teams to start applying Stand Ups now (Mandatory)
      • Must be done before starting your sessions and immediately when your team finishes the meet-up.
      • Will demonstrate again, with more feedback given to teams

Today will serve as an important catch-up day for many teams, since midterm season was (is) around.

  • I will go around to teams and ask about their project's:
    • repository
    • code
    • README

Today will be focused mostly on iterating on projects.

Week 7: I didn't prep this week

Carry on. Nothing to see here.

Week 8: Presentation/Flex Day

For this week I decided we are going to do a surprise project presentation.

Announcements: Thank you to everyone who participated in the Python Workshop

I will need every team to do the following:

  • Update all files in their GitHub repo under ProjectGroupWinter2017:
    • README.md
    • script.py
    • All appropriate data files (i.e. csv files, txt files, etc.)
    • Images (inside images folder) that were produced for this project
  • Be prepared to pitch your idea to me.
    • Sell that shit.
    • Why is your project relevant to Data Science and the data community as a whole?
  • (Not 100%) I would like to see some scripts/notebooks being run during the presentation, but due to time constraints we might only use what's on GitHub.

Each group presentation should be no longer than 15 minutes.
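Before presenting, a team can sanity-check its repo against the list above with a small script. This is a hypothetical helper, assuming the layout used in these notes (a README.md at the top level, an images folder, and scripts/data files anywhere in the repo). The accepted file extensions are our own assumption based on the file types mentioned in Week 4, and requirements.txt is deliberately not counted as a data file.

```python
import tempfile
from pathlib import Path

# Assumed extensions, based on the file types mentioned in the Week 4 notes.
SCRIPT_EXTS = {".py", ".r", ".sql", ".ipynb", ".rmd"}
DATA_EXTS = {".csv", ".txt", ".json", ".db"}
NOT_DATA = {"requirements.txt"}  # a .txt file, but not a data set


def presentation_checklist(repo_dir):
    """Return the checklist items missing from repo_dir (empty list = ready)."""
    repo = Path(repo_dir)
    files = [p for p in repo.rglob("*")
             if p.is_file() and ".git" not in p.relative_to(repo).parts]
    missing = []
    if not (repo / "README.md").is_file():
        missing.append("README.md")
    if not any(p.suffix.lower() in SCRIPT_EXTS for p in files):
        missing.append("script (.py, .R, .sql, or notebook)")
    if not any(p.suffix.lower() in DATA_EXTS and p.name not in NOT_DATA for p in files):
        missing.append("data file (.csv, .txt, .json, or .db)")
    images = repo / "images"
    if not images.is_dir() or not any(images.iterdir()):
        missing.append("images/ folder with at least one image")
    return missing


if __name__ == "__main__":
    # Demo on a throwaway repo that only has a README and a script so far.
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        (repo / "README.md").write_text("# demo\n")
        (repo / "script.py").write_text("print('hello')\n")
        print(presentation_checklist(repo))
```

Run it from your repo with `presentation_checklist(".")`; an empty list means every item on the checklist is at least present (it can't tell you whether your plots are pretty).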

Week 9: Quarter Wrap-Up

Final thoughts on quarter

  • Thank You's
  • Dedications
  • Food for thought for next quarter

Some Preliminaries:

FACTOR PI sale


Only $1 apiece! Go show some support for our friends at the Female Actuarial Association. Find the event link here.

  • Location: SRB
  • Date: March 14, 2017
  • Time: 11AM - 3PM

Farmer's Data Talk


The Org. wants a packed house for the Farmer's Insurance Data Talk so let's all make it out! Facebook event link Here

  • Location: UCen SB Harbor Room
  • Date: March 9, 2017 (So tomorrow)
  • Time: 6PM - 8PM
  • Will NOT be focused on actuarial topics (it will focus on Natural Language Processing, so it's highly relevant to our group)

HG Data Hackathon

  • Location: HG Data Offices
  • Date: April 21st
  • More on this later
  • Will most likely work on a tutorial with Calvin during Spring Break to help prep

Chapman Data Fest

  • Location: Chapman University
  • Date: April 21st as well
  • Team of 5 to attend
  • NOTE: Jason wants the people attending the Chapman Data Fest to be of different class levels (i.e. freshman, sophomore, junior, senior, and super senior)
  • Let me know if you're interested in this event! Link for Event Here

Library presentations

We have confirmed date!

  • Location: Same location so here
  • Time: April 26th at 7pm
  • Need y'all to use today to prep and keep track of progress!
    • Make GitHub repos pretty
    • Code readable
    • Write nice docs
    • Make plots pretty with titles, axis labels, and legends

Let's really flex for this. Everyone worked hard!

We would like your team to use inertia7 to present your projects, so this is a good segue into the next section.

inertia7 User Testing

We know dead week and finals are fast approaching but we were wondering if anyone would be interested in User-testing the new iteration of inertia7 to give constructive criticism.

  • Doesn't have to be publishing a project. Can just play with the app
  • If interested, talk to me or David
  • Follow Link to apply for credentials

Wrap-Up

Things needed by the end of this meeting:

  • Updated Scripts
  • Updated README's
  • Add any appropriate images
  • Create plotly account to publish plotly graphs (if applicable)
  • To-do list detailing what is still needed for your project
  • Keep in contact with partners over break.
  • If you're bored during break work on the project!

IMPORTANT TO NOTE: Since finals are approaching, your group needs to set this up in its repo, because there will be a 3-week gap. I need to know where your team is at and the context for it. You CAN'T leave until your team shows me the repo and an outline of what is done and what isn't.

Three weeks is a long time, and if there's no structure as to where you're at, you will forget and it will be hard to pick back up.

For those of you who feel you are ready to iterate on the presentation part of your project talk to me by the end of today's meeting.

Again thank you for a wonderful quarter and hope to see you all again next quarter!

Recommended Resources for entire quarter:

About

Data Science Project Group repo for Winter 2017
