Data Science Tools 1

  • Course: Data Science Project 1 COMP-4447-1
  • Class time: Mon, Wed 07:00 PM - 08:50 PM | Engineering & Computer Science | Room 410
  • Instructor: Pooran Singh Negi, pooran.negi@du.edu webpage
  • GTA: Mitchell Wright
  • Office: 470
  • Office Hours: Tue, Thu, 2:00 p.m. - 4:00 p.m. Email for 1-on-1 help.

Books

Other books

- Think Bayes

Optional material

More to come

Course Description

It is recommended that you consult this GitHub page often for material related to this course. You should check your e-mail periodically for messages. Assignments will be uploaded here and to Canvas.

The main objective of Data Science Tools 1 is to learn various tools for performing data analysis. The focus in Tools 1 is data cleanup, summarization, and visualization. It is more like a hacking skill set, but our primary focus will be on the scientific Python and Linux ecosystem. We’ll use Jupyter Notebook/Lab in class and for homework. This should make our learning interactive.

For the final project, students will work through individual or team projects applying coursework to the data lifecycle within a particular domain. The focus will also be on best data science/software engineering practices and reproducible work.

Please select a project by January 20th as per your preference. Groups of 2 to 3 students are allowed, but the project work must justify the team size. There will be a homework asking for the details of your final project, and we’ll provide feedback about its feasibility. Final projects can be based on initial capstone work; please let us know if this is the case, as we need to go over the details.

Syllabus

This syllabus is subject to change at the discretion of the instructor.

  • Jupyter Notebook for reproducible workflow.
  • Data science and EDA.
  • Git tools workflow.
  • Data science at the command prompt: Linux command line, bash, basic awk and sed.
  • Data collection and ingestion (web scraping and reading datasets + pandas).
  • Data cleanup and imputation + pandas.
  • Data summarization and visualization + pandas (groupby, apply, aggregate, etc.).
  • Go over some topics as per students’ demands.
  • More to come.

    Linux command line and scientific Python (primarily numpy, matplotlib, requests, seaborn, basic pandas) will be used throughout the course.

Grading

There will be coding/analysis homework assignments, a midterm, and a final project. We’ll drop your lowest assignment grade.
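As a rough illustration of the drop-the-lowest rule, here is a minimal sketch. It assumes every homework carries equal weight and is scored on the same scale, which the syllabus does not state; the function name `homework_average` and the example scores are made up for illustration.

```python
def homework_average(scores):
    """Average the homework scores after dropping the single lowest one.

    Assumption: all assignments are weighted equally.
    """
    if not scores:
        raise ValueError("need at least one homework score")
    if len(scores) == 1:
        # Dropping the only score would leave nothing to average,
        # so this sketch keeps it (an assumption, not course policy).
        return float(scores[0])
    kept = sorted(scores)[1:]  # remove exactly one lowest score
    return sum(kept) / len(kept)


if __name__ == "__main__":
    example_scores = [80, 95, 60, 100]  # made-up scores
    print(homework_average(example_scores))
```

With the made-up scores above, the 60 is dropped and the remaining three scores are averaged.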

There will be a final presentation of the final project. You will be required to submit a final project report in the jupyter notebook format.

Dates

Coding homework | 50%
Midterm, 13 Feb, in class | 25%
Final project presentation, 15 minutes, 13 March, in class | 15%
Final project report, due 15 March (please refer to the final report format above for submission guidelines) | 20%

Final course grading rubric

Grade range: [(‘A’, >=93), (‘A_minus’, >=89), (‘B_plus’, >=85), (‘B’, >=81), (‘B_minus’, >=77), (‘C_plus’, >=73), (‘C’, >=69), (‘C_minus’, >=65), (‘D_plus’, >=61), (‘D’, >=57), (‘D_minus’, >=53), (‘F’, <53)]
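The rubric above can be read as a lookup from a final score to a letter. The sketch below copies the lower bounds from the rubric; the names `GRADE_BOUNDS` and `letter_grade` are illustrative, and it assumes every bound, including the one for D_plus, is inclusive.

```python
# Lower bounds copied from the rubric, highest grade first.
# Assumption: the D_plus bound is inclusive (>= 61), like the other bounds.
GRADE_BOUNDS = [
    ("A", 93), ("A_minus", 89), ("B_plus", 85), ("B", 81),
    ("B_minus", 77), ("C_plus", 73), ("C", 69), ("C_minus", 65),
    ("D_plus", 61), ("D", 57), ("D_minus", 53),
]


def letter_grade(score):
    """Return the first letter whose lower bound the score meets."""
    for letter, lower_bound in GRADE_BOUNDS:
        if score >= lower_bound:
            return letter
    return "F"  # anything below 53


if __name__ == "__main__":
    for score in (95, 88.5, 53, 40):
        print(score, letter_grade(score))
```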

Honor code

All members of the University of Denver community are expected to uphold the values of Integrity, Respect, and Responsibility. These values embody the standards of conduct for students, faculty, staff, and administrators as members of the University community. Our institutional values are defined as:

Integrity: acting in an honest and ethical manner;

Respect: honoring differences in people, ideas, experiences, and opinions;

Responsibility: accepting ownership for one’s own behavior and conduct.

Please respect DU Honor Yourself, Honor the Code

Students with Disabilities

Students with recognized disabilities will be provided reasonable accommodations, appropriate to the course, upon documentation of the disability with a Student Accommodation Form from the Disability Services Program. To receive these accommodations, you must request the specific accommodations by submitting them to the instructor in writing by the end of the first week of classes. Visit the CAMPUS LIFE & INCLUSIVE EXCELLENCE webpage for details.

Withdrawal Policy

Please see the registrar’s calendar for academic deadlines. We’ll strictly follow the deadlines.

Data set for Projects

We need to know your project/dataset before we approve it for the final project.

More to come.

Software Installation

Python

We want everybody to have the same experience using computational tools in Data Science Tools 1. Please follow the steps for your operating system.

Windows-based installation

Please install Windows Subsystem for Linux (WSL) on Windows 10. Follow the instructions in the post Using Windows Subsystem for Linux for Data Science by Hugo Ferreira to install Linux. Ignore the part about installing Anaconda.

You can also watch this video to see installation of Windows 10 Bash & Linux Subsystem Setup.

Linux/Mac users should already have a bash command prompt

You can run echo $0 to check your current shell. Change to the bash shell using chsh -s /bin/bash

Once you are at the Linux/Mac bash command prompt, please follow the instructions below.

Python3 installation

Please follow the instructions here to install python3 if it is not installed on your system. This link also covers Windows Subsystem for Linux (WSL) for Windows 10 (Windows 10 Creators or Anniversary Update). I am using Python 3.5.2. Hopefully any version of Python 3 should work.

Creating a virtual environment and installing packages for Data Science Tools 1

Run the following commands from the command prompt.
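To confirm which interpreter you are actually running, a small check like the following may help. It is only a sketch: the helper name `is_python3` is made up, and it tests for any Python 3 release rather than a specific minor version, matching the hope above that any Python 3 works.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple is some Python 3 release."""
    return version_info[0] == 3


if __name__ == "__main__":
    print("Running Python {}.{}.{}".format(*sys.version_info[:3]))
    if not is_python3():
        sys.exit("Please run the course material with python3.")
```

Save it to a file and run it with python3; if it exits with the message, the command you used points at a Python 2 interpreter.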

  • run apt-get install python3-venv (you may need to prefix it with sudo)
  • Using the command line (cd command), go to the folder where you want to keep the Python files and notebooks related to this course.
  • run python3 -m venv /path/to/new/virtual/environment
    • e.g. I ran python3 -m venv dst1_env
  • To activate your environment, run source /path/to/new/virtual/environment/bin/activate
    • e.g. from this course directory I run source dst1_env/bin/activate
  • run python3 -m pip install --upgrade pip. Note that there are two dashes in the upgrade option.
  • run wget https://raw.githubusercontent.com/psnegi/data_science_tools1/master/requirements.txt
  • run pip install -r requirements.txt
  • run jupyter notebook or jupyter lab.
  • In the browser you should see your current files.
  • Click on the notebook you want to run.
  • Click on the RISE slideshow extension in the notebook if you want to see the notebook as a slideshow.

To deactivate the Python virtual environment, run deactivate
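After working through the steps above, you can sanity-check the setup from Python itself. This is a minimal sketch, not part of the course material: the helper names are made up, and the package list only covers the libraries named in the syllabus (with the HTTP library under its import name, requests), not everything requirements.txt may install.

```python
import importlib.util
import sys

# Packages named in the syllabus; requirements.txt may list more (not checked here).
COURSE_PACKAGES = ["numpy", "matplotlib", "requests", "seaborn", "pandas"]


def in_virtualenv():
    """Return True when the interpreter runs inside a venv."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def missing_packages(names=COURSE_PACKAGES):
    """Return the top-level packages from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
    print("Missing packages:", missing_packages() or "none")
```

Run it once with the environment activated and once after deactivate to see the difference.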

Python learning resources

You can also go to my python for reproducible research GitHub repository and start by running the pythonBasic.ipynb notebook. I will go over the basics of Python and Jupyter Notebook.

data analysis tools in python

  • more to come

Notebooks

Jan 7

Jan 9

Homeworks

No late homework will be accepted.

HW no. | Description and link | Due date
1 | Complete the questions in this notebook | Friday, 18 Jan, 11:59 p.m.

Course Activity

7 Jan
  • Reading/Coding assignments: Install jupyter environment; Python Virtual Environments.
  • Class activity: Mitchell covered the Jupyter introduction notebook and also helped with installation. Covered the Jupyter introduction and data science notebooks.

9 Jan
  • Reading/Coding assignments: Resources to learn git; "I don’t like notebooks" - Joel Grus video, provided by Laura Atkinson.
  • Class activity: We’ll also go over data science. It may be time consuming to wait for a notebook to start via Binder every time, so you can run the notebooks locally:
    • Go to the folder for this course on your computer and run git clone https://github.com/psnegi/data_science_tools1.git.
    • Run the command ls. You should see the data_science_tools1 folder. Activate your virtual environment.
    • Navigate to the course directory using cd data_science_tools1. Change to the notebook directory using cd notebooks.
    • Now run jupyter notebook. You should see all the notebooks in a browser window. Click on the notebook you want to run.
    • To run a cell in the notebook, press Alt+Enter or Ctrl+Enter.
    • Whenever new content is posted, you must run git pull origin master from the data_science_tools1 directory to make sure you have the latest content.
    • Don’t worry about the git commands above. We’ll start git in the next class. Please start with the git notebook.
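The clone-once, pull-afterwards routine described for 9 Jan can be sketched as a small Python helper. This is an illustration only: the function name `sync_command` is made up, it assumes git is installed and on your PATH, and it treats a data_science_tools1 folder containing a .git directory as an existing clone.

```python
import os
import subprocess

REPO_URL = "https://github.com/psnegi/data_science_tools1.git"
REPO_DIR = "data_science_tools1"


def sync_command(parent_dir="."):
    """Return (args, cwd) for the git command that brings the repo up to date."""
    repo_path = os.path.join(parent_dir, REPO_DIR)
    if os.path.isdir(os.path.join(repo_path, ".git")):
        # Already cloned: fetch the latest course content.
        return ["git", "pull", "origin", "master"], repo_path
    # Not cloned yet: clone into the chosen course folder.
    return ["git", "clone", REPO_URL], parent_dir


if __name__ == "__main__":
    args, cwd = sync_command()
    subprocess.run(args, cwd=cwd, check=True)
```

Run it from the folder where you keep the course files; the first run clones the repository and later runs pull new content.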
