.
├── Makefile # run `make help` to see make targets
├── README.md # this readme file
├── requirements.txt # virtualenv requirements file
├── lectures # lecture notebooks
├── preparation # course preparation notebooks
└── source # sources, e.g., images for notebooks
Please consider the following instructions and the material in this repository carefully. The repository content is designed to make participation in Learning from Big Data as easy and enjoyable for you as possible.
- Python 3.8
- `virtualenv`
Optional:
- node (for `plotly`)
- `graphviz` (install with `brew install graphviz`)
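The requirements listed above can be checked before running the setup. The following is a minimal sketch, not part of the repository: the function name `check_prerequisites` is hypothetical, it treats "Python 3.8" as "3.8 or newer" (an assumption), and it assumes that graphviz is detectable through its `dot` executable.

```python
import importlib.util
import shutil
import sys


def check_prerequisites():
    """Return a dict mapping each prerequisite to True (found) or False."""
    return {
        # Required. Assumption: any Python version from 3.8 upwards is fine.
        "python>=3.8": sys.version_info >= (3, 8),
        # Required: virtualenv, either as an importable module or a CLI tool.
        "virtualenv": (
            importlib.util.find_spec("virtualenv") is not None
            or shutil.which("virtualenv") is not None
        ),
        # Optional: node (for plotly).
        "node (optional)": shutil.which("node") is not None,
        # Optional: graphviz. Assumption: `dot` is on the PATH when installed.
        "graphviz (optional)": shutil.which("dot") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        status = "ok     " if found else "MISSING"
        print(f"{status}  {name}")
```

A missing optional entry only matters if you plan to use the corresponding feature.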
Please familiarize yourselves with `virtualenv` (or a similar tool such as `conda`). Some background information can be found in the virtualenv docs or here.
In the lectures, we will use Jupyter notebooks to illustrate implementation-related key points. The notebooks will be published in this repository well ahead of the lecture. Please make sure that you can execute the notebooks before joining the class so you can easily follow the coding parts in the lectures.
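One way to confirm that the notebooks execute on your machine is to run them non-interactively. The sketch below is an illustration under assumptions, not a script shipped with this repository: it assumes the `jupyter` command from the course environment is on your `PATH`, that the notebooks live in `lectures`, and it writes executed copies to a hypothetical `executed` directory. The helper names are made up for this example.

```python
import subprocess
from pathlib import Path


def find_notebooks(root="lectures"):
    """List notebooks under `root`, skipping Jupyter checkpoint copies."""
    return sorted(
        path
        for path in Path(root).rglob("*.ipynb")
        if ".ipynb_checkpoints" not in path.parts
    )


def nbconvert_command(notebook, output_dir="executed"):
    """Build the `jupyter nbconvert` call that executes one notebook."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--output-dir", str(output_dir),  # hypothetical output location
        str(notebook),
    ]


def run_all(root="lectures"):
    """Execute every notebook and return the ones that failed."""
    failed = []
    for notebook in find_notebooks(root):
        result = subprocess.run(nbconvert_command(notebook))
        if result.returncode != 0:
            failed.append(notebook)
    return failed


if __name__ == "__main__":
    for notebook in run_all():
        print("failed:", notebook)
```

Writing the executed copies to a separate directory leaves the original notebooks unchanged.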
For the homework assignments, use an IDE of your choice; which one works best depends on personal preference. A very popular choice is PyCharm (JetBrains offers a free pro license for students). If you are familiar with coding, this should be easy to manage. Other people prefer Spyder, JupyterLab, or Google Colab. Do some research to figure out which IDE suits your background and preferences best.
The Makefile included in this repository is purely for convenience (e.g., setting up the virtual environment, launching a notebook server). It should work on Linux and Mac OS X systems.
$ make help
Make targets:
build        create virtualenv and install packages
build-lab    `build` + lab extensions
freeze       persist installed packaged to requirements.txt
clean        remove *.pyc files and __pycache__ directory
distclean    remove virtual environment
run          run jupyter lab
Check the Makefile for more details.
- Open a terminal and navigate to the path that you want to clone the repository to
- Clone the repository
$ git clone git@github.com:sbstn-gbl/learning-from-big-data.git
- Navigate to repository path, create virtual environment and install required modules with
$ cd learning-from-big-data && make build
or `make build-lab` to include `jupyterlab` dependencies.
- Start a notebook server with
$ make run
If `make` does not work on your computer, run the steps included in the Makefile manually. You only need to do this setup once.
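If `make` is unavailable, the manual steps might look roughly like the sketch below. This is an assumption about what the targets do, based only on their descriptions in `make help`; the Makefile itself is the authoritative source. The environment directory `.venv` and the function names are hypothetical, and the last command assumes that `requirements.txt` installs JupyterLab.

```python
import os
import subprocess
import sys
from pathlib import Path


def manual_setup_commands(env_dir=".venv", requirements="requirements.txt"):
    """Build the commands for a manual setup without running them."""
    # virtualenv places executables in `Scripts` on Windows, `bin` elsewhere.
    bin_dir = Path(env_dir) / ("Scripts" if os.name == "nt" else "bin")
    return [
        # Roughly `make build`: create the environment ...
        [sys.executable, "-m", "virtualenv", str(env_dir)],
        # ... and install the packages listed in requirements.txt.
        [str(bin_dir / "pip"), "install", "-r", str(requirements)],
        # Roughly `make run`: start JupyterLab from the new environment.
        [str(bin_dir / "jupyter"), "lab"],
    ]


def run_manual_setup(dry_run=True, **kwargs):
    """Print each command; execute it only when dry_run is False."""
    for command in manual_setup_commands(**kwargs):
        print("$", " ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run by default: shows the commands without changing anything.
    run_manual_setup()
```

Run it from the repository root. The dry run lets you compare the printed commands with the Makefile before executing anything.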
Please solve the following three pre-course assignments before the first lecture.
Use textbooks or online resources to fill gaps in your skills. The pre-course assignments will prepare you for the materials covered in Learning from Big Data and help you assess whether you are ready for this course. The pre-course assignments are challenging. If you find them too challenging, you should consider enrolling in this course in the following year. If you are not sure, feel free to contact one of the teachers before starting this course.
Please also study the material covered in the following online courses:
- Lecture 05-1: Logistic Regression (Motivation)
- Lecture 05-2: Missing Data
- Lecture 05-3: Logistic Regression w/ MBGD
- Lecture 06-1: Decision Trees
- Lecture 06-2: Representations
- Lecture 06-3: Dimensionality Reduction
- Lecture 06-4: (Extra) Entropy
- Lecture 07-1: AdaBoost
- Lecture 07-2: Gradient Boosting
- Lecture 07-3: Overfitting
- Tutorial Assignment 2 (Part 1)
- Tutorial Assignment 2 (Part 2)
- Lecture 09-1: NN Activation Functions
- Lecture 09-2: NN and PyTorch Autograd
- Lecture 09-3: NN Backpropagation
- Lecture 09-4: NN Example Implementation
- Lecture 09-5: NN Generalization
- Lecture 11-1: NN TensorBoard
- Lecture 11-2: NN Optimizer
- Lecture 11-3: NN Initialization
- Lecture 13-1: Example for Endogeneity
- Tutorial Assignment 3