Materials for the Paris-Saclay Center for Data Science Python workshop
Data science is gaining attention and is having an impact on many scientific fields and applications. It encompasses a large number of topics, such as data mining, data wrangling, data visualisation, pattern recognition, and machine learning.
This workshop intends to give an introduction to some of these topics using Python and the PyData ecosystem. It is not a course on deep learning.
You can run the notebooks on Binder:
Goal: introduce the PyData ecosystem to manipulate, explore, and visualize data.
- Introduction to the basics of numpy, pandas, matplotlib, and Dask.
Goal: introduce the basics of machine learning using the scikit-learn library.
- Get familiar with general principles of machine learning;
- Apply these principles with the scikit-learn library to some toy and real-world data examples.
The course uses Python 3 and some data analysis packages such as NumPy, pandas, scikit-learn, matplotlib, and Dask.
This step is only necessary if you don't have conda installed:
- download the Miniconda installer for your OS here
- run the installer following the instructions here depending on your OS.
# Clone this repo
git clone https://github.com/lesteve/2020-sed-intro-datascience
cd 2020-sed-intro-datascience
# Create a conda environment with the required packages for this tutorial:
conda env create -f environment.yml
# Activate your conda environment
conda activate intro-datascience
We strongly recommend running the check_env.py script located at the root of this repository to make sure the necessary packages are installed:
python check_env.py
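For readers curious about what an environment check of this kind involves, here is a minimal sketch. It is not the contents of check_env.py, which are not reproduced in this README. The function name `check_packages` and the `REQUIRED` list are hypothetical, and the import names are assumed from the packages listed above (scikit-learn is imported as `sklearn`).

```python
import importlib
import sys

# Import names assumed from the packages listed in this README.
# Note that scikit-learn is imported under the name "sklearn".
REQUIRED = ["numpy", "pandas", "sklearn", "matplotlib", "dask"]


def check_packages(names):
    """Try to import each name.

    Returns a dict mapping the import name to its version string,
    "unknown" if the module defines no __version__, or None if the
    import fails.
    """
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None
        else:
            results[name] = getattr(module, "__version__", "unknown")
    return results


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}")
    for name, version in check_packages(REQUIRED).items():
        status = "OK" if version is not None else "MISSING"
        print(f"[{status}] {name} {version or ''}")
```

If a package is reported as missing, the likely cause is that the conda environment created above has not been activated in the current shell.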