Data Science, Machine Learning, and its Applications

Updating Notebooks

This documentation describes two ways to start working remotely: in the cloud through Google Colab, or locally on your own machine.

Running in the cloud through Google Colab + GitHub

This method requires nothing installed on your computer other than your favourite browser.

First, open Google Colaboratory at https://colab.research.google.com/ . It opens a dialog for choosing a notebook.
Click on the GitHub tab and paste the URL of this repository, https://github.com/benjaminocampo/DataCuration.git , into the text field. Then click the search button to list the notebooks saved in this repository.
You'll see a list of all the notebooks in the repository. Click on the one you want to update.
Then you can start running the notebook!
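Colab can also open a GitHub notebook directly from a URL of the form `https://colab.research.google.com/github/<owner>/<repo>/blob/<branch>/<path>`. The following is a minimal sketch that builds such a link. The branch name `main` and the notebook path `notebooks/example.ipynb` are hypothetical placeholders, so replace them with the real values from the repository.

```python
from urllib.parse import quote

COLAB_GITHUB_PREFIX = "https://colab.research.google.com/github"


def colab_url(owner: str, repo: str, branch: str, notebook_path: str) -> str:
    """Build a Colab link that opens a notebook stored in a GitHub repository."""
    # Percent-encode the path so spaces or accents in file names survive in the URL.
    safe_path = quote(notebook_path.lstrip("/"))
    return f"{COLAB_GITHUB_PREFIX}/{owner}/{repo}/blob/{branch}/{safe_path}"


# Hypothetical branch and notebook name, for illustration only.
print(colab_url("benjaminocampo", "DataCuration", "main", "notebooks/example.ipynb"))
```

Pasting the printed link into the browser should skip the search dialog, provided the branch and path exist.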

Running locally through Conda + Jupyter + VirtualEnv + Git

Conda

First, install conda, a package and environment manager commonly used for Python. We recommend its minimal version, Miniconda. See its installation guides:

Installing on Linux: https://conda.io/projects/conda/en/latest/user-guide/install/linux.html
Installing on Windows: https://conda.io/projects/conda/en/latest/user-guide/install/windows.html

Jupyter

Once you've installed conda, you can use it to install JupyterLab. Open a terminal window or Anaconda Prompt and run:

conda install -c conda-forge jupyterlab

Alternatively, you can install the classic Jupyter Notebook with:

conda install -c conda-forge notebook

(Note: If you're using Windows, you can use Windows PowerShell or Anaconda Prompt. Since command-line usage differs between Windows and Linux, here's a list of the most important Windows commands in case you're unfamiliar with them: https://www.thomas-krenn.com/en/wiki/Cmd_commands_under_Windows).

Git

How you install git depends on your operating system:

Installing on Linux (on Debian-based distributions):
```
sudo apt install git-all
```
Installing on Windows: Download the .exe installer from https://git-scm.com/download/win , run it, and follow the steps.

To check that the installation succeeded, open a terminal or Anaconda Prompt and run:

git --version

If it shows your current git version, you can move on to the next step!
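The same check can be run for every tool installed so far. The sketch below is an illustration, not part of the repository: it looks up `conda`, `git`, and `jupyter` on your PATH and prints the first line each one reports for `--version`. The helper name `tool_version` is made up for this example.

```python
import shutil
import subprocess
from typing import Optional


def tool_version(command: str) -> Optional[str]:
    """Return the first line of `<command> --version`, or None if unavailable."""
    executable = shutil.which(command)  # None when the command is not on PATH
    if executable is None:
        return None
    try:
        result = subprocess.run(
            [executable, "--version"],
            capture_output=True,
            text=True,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Some tools print their version to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


for tool in ("conda", "git", "jupyter"):
    print(f"{tool}: {tool_version(tool) or 'not found on PATH'}")
```

A tool reported as not found usually means it is not installed or that the terminal was opened before the installation finished.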

Cloning this repository

After installing git, you can clone this repository to have a local copy of it. Open a terminal in the directory where you want to save it and run:

git clone https://github.com/benjaminocampo/DataCuration.git

Setting up a Virtual Environment

To install the packages needed to run the notebooks, we recommend creating a virtual environment, so they aren't installed system-wide.

Check the environment.yml file, which lists the dependencies needed to run the notebooks in this repository correctly.

name: diplodatos-datacuration
channels:
  - conda-forge
dependencies:
  - numpy
  - pandas
  - matplotlib
  - statsmodels
  - seaborn=0.11
  - missingno
  - scikit-learn
  - geopandas
  - requests

That means the environment will be named diplodatos-datacuration and its packages will come from the conda-forge channel: seaborn pinned to version 0.11, plus the newest available versions of numpy, pandas, matplotlib, statsmodels, missingno, scikit-learn, geopandas, and requests.
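To make the difference between pinned and unpinned entries concrete, here is a small sketch that splits each dependency line into a name and an optional version. It only handles the plain `name=version` form used in this file (not operators such as `>=` or `==`), and the `Dependency` and `parse_dependency` names are invented for illustration. In conda, a single `=` is a fuzzy match, so `seaborn=0.11` accepts any 0.11.x release.

```python
from typing import NamedTuple, Optional


class Dependency(NamedTuple):
    name: str
    version: Optional[str]  # None means no pin: conda picks the newest compatible version


def parse_dependency(spec: str) -> Dependency:
    """Split a simple conda spec such as 'seaborn=0.11' into name and version."""
    name, separator, version = spec.strip().partition("=")
    return Dependency(name.strip(), version.strip() if separator else None)


# The dependency list copied from environment.yml.
SPECS = [
    "numpy", "pandas", "matplotlib", "statsmodels", "seaborn=0.11",
    "missingno", "scikit-learn", "geopandas", "requests",
]

for dep in map(parse_dependency, SPECS):
    print(f"{dep.name}: {dep.version or 'newest available'}")
```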

The steps to create a virtual environment with these dependencies are the following:

Open your terminal or Anaconda Prompt in the directory where you cloned the repository.
Create the environment from the environment.yml file with:
```
conda env create -f environment.yml
```
(Note: This step might take some time).
Activate the environment to make the dependencies available:
```
conda activate diplodatos-datacuration
```
The active environment is also displayed at the start of your prompt in (parentheses) or [brackets], like this:
```
(diplodatos-datacuration)$
```
If ipykernel is not installed on your system, run:
```
conda install -c anaconda ipykernel
```
Then, add the active environment to jupyter so it's recognized as a new kernel:
```
ipython kernel install --user --name=diplodatos-datacuration
```
Start JupyterLab or Jupyter Notebook with:
```
jupyter lab
```
or
```
jupyter notebook
```
The previous step should have opened a tab in your browser with the application. Open the Jupyter notebook you're working on.
Make sure Jupyter is using the kernel you just set up by choosing Kernel -> Change Kernel.
You're ready to do science!
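As an optional sanity check, you can run the following sketch in the first cell of a notebook to confirm the kernel sees every package from environment.yml. It is an illustration under one assumption worth noting: the import name equals the package name for every entry except scikit-learn, which is imported as `sklearn`. The helper name `missing_packages` is made up for this example.

```python
import importlib.util
import sys

# Package name in environment.yml -> module name used in `import`.
IMPORT_NAMES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "statsmodels": "statsmodels",
    "seaborn": "seaborn",
    "missingno": "missingno",
    "scikit-learn": "sklearn",  # the import name differs from the package name
    "geopandas": "geopandas",
    "requests": "requests",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the packages whose modules cannot be found by this interpreter."""
    return [
        package
        for package, module in import_names.items()
        if importlib.util.find_spec(module) is None
    ]


# The interpreter path shows which environment the kernel is running from.
print("Interpreter:", sys.executable)
missing = missing_packages()
print("Missing packages:", ", ".join(missing) if missing else "none")
```

If the interpreter path does not point inside the diplodatos-datacuration environment, or packages are reported missing, switch the kernel through Kernel -> Change Kernel and run the cell again.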
