
dvillalobos/MSDS


Welcome to my Git Page

by Duubar Villalobos Jimenez

Data Scientist

dvillalobos.github.io

Linkedin Youtube

Repository: MSDS

MSDS stands for Master's in Data Science.

Last Update: 07/05/2019


Getting Started


  • The work presented in this repository represents my solutions to the assignments and projects from all the classes in my master's program.

  • Many of these assignments were created using a variety of software; my preferred tools are described in the Prerequisites section.

Cloning this repository

These instructions will get you a copy of all the projects on your local machine for development and testing purposes. See the individual deployment notes of each project in order to run it on your system.

$ git clone https://github.com/dvillalobos/MSDS.git

Prerequisites


In this section, I briefly describe the software needed and how to install it.



  • My preferred Operating System for the whole master's program was Lubuntu.

  • My approach was to create an "Ultimate Operating System" that requires little memory and can be expanded to handle most of the requirements of the assignments and projects related to Data Science.

  • To make the installation process easy to replicate, I wrote a script that automatically installs a Data Science Server from scratch. If you are interested, you can find my simple guide here.

  • A great advantage of this setup is that I could keep a personal server at home and connect to it remotely using SSH (Secure Shell) to do my assignments. This saved me the money and resources I would otherwise have spent on a cloud offering such as Amazon AWS, Google Cloud, Microsoft Azure, IBM Cloud or DigitalOcean, just to mention a few.

  • A disadvantage of this approach is that I had "limited" resources and could not scale up if needed; for my assignments, however, I never needed to.

The current specs of my Home Server are:

  • 64 GB Memory.
  • 8 Cores Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz.
  • 1 TB Solid State Drive.
  • 4 TB Hard Disk Drive.
  • GPU: FirePro V5700 by ATI (limited capability).

I then connected to it remotely from a Chromebook with 4 GB of memory.
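
As an illustration of this kind of remote workflow, here is a minimal sketch of how an SSH local port forward could be assembled, so that a web service running on the server (for example RStudio Server, whose default port is 8787) becomes reachable from the laptop's browser. The function name, the user name and the host name are placeholders for illustration only; whether my server uses the default port is an assumption.

```python
import shlex


def build_ssh_tunnel_command(user, host, remote_port=8787, local_port=8787):
    """Return the argument list for an SSH local port forward.

    -N : do not run a remote command, only forward ports
    -L : forward local_port on this machine to remote_port on the server
    """
    forward = f"{local_port}:localhost:{remote_port}"
    return ["ssh", "-N", "-L", forward, f"{user}@{host}"]


if __name__ == "__main__":
    # "user" and "homeserver.example.org" are placeholders, not real values.
    command = build_ssh_tunnel_command("user", "homeserver.example.org")
    print(shlex.join(command))
```

Running the printed command in a terminal and then opening http://localhost:8787 in the browser would, under these assumptions, show the service as if it were running locally.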



  • My number one choice for Data Science projects is R. In my case, I installed RStudio Server, which allows me to connect using any popular web browser (Firefox or Chrome). In my experience, R is more suitable for Data Science projects because of its strong statistical capabilities and extensive set of libraries, alongside its Open Source community, which is great!

  • Some of the tools that I have employed in R are Plot.ly, Shiny and tidy, just to mention a few. The list is long, and some projects include Random Forest, XGBoost and Neural Networks for predictive analysis, for example.


  • My second choice for Data Science special projects is Python 3. In my case, I installed Spyder and Atom with Hydrogen for local programming. The beauty of Python is that I was able to use a text editor on my Chromebook connected through SSH, run the code from the terminal and explore the results in a web browser. With Python 3, I've been able to build beautiful dashboards and websites that work together with diverse tools such as JavaScript, HTML and NoSQL database management, for example.

  • With Python 3, I was able to create custom containers built in Docker on diverse Linux systems, which I then uploaded to the cloud and ran on Amazon AWS EC2. Some examples run Flask and JavaScript for website interaction, with geospatial data manipulation employing Dash and MongoDB alongside plot.ly.
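
To give a flavor of how a Python back end can hand geospatial data to JavaScript, here is a minimal sketch of an endpoint that returns GeoJSON. It is not code from any project in this repository: it uses a plain WSGI callable from the standard library world instead of Flask (Flask applications are themselves WSGI applications), and the `/points` route, the `make_feature` helper and the sample point are all hypothetical.

```python
import json


def make_feature(name, longitude, latitude):
    """Build one GeoJSON Point feature (GeoJSON orders coordinates lon, lat)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [longitude, latitude]},
        "properties": {"name": name},
    }


# Hypothetical sample data; not taken from any project in this repository.
FEATURES = [make_feature("New York City", -74.0060, 40.7128)]


def app(environ, start_response):
    """WSGI application: serve the features as GeoJSON at /points."""
    if environ.get("PATH_INFO") == "/points":
        body = json.dumps({"type": "FeatureCollection", "features": FEATURES})
        status = "200 OK"
    else:
        body = json.dumps({"error": "not found"})
        status = "404 Not Found"
    payload = body.encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(payload))),
    ]
    start_response(status, headers)
    return [payload]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` would serve the data at http://localhost:8000/points, where a JavaScript map library could fetch it.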


  • Sometimes it's easier just to write a small presentation using a notebook; with Jupyter Notebooks I was able to employ diverse kernels from one simple location.

  • The beauty of writing your code in a Jupyter notebook is that you can upload it to GitHub and then create and share outstanding results employing the nbviewer online tool. I highly recommend this approach too.
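
As a small sketch of the sharing step, the helper below builds the nbviewer address for a notebook stored in a GitHub repository. The `nbviewer_url` function, the `master` branch default and the notebook path are assumptions for illustration; the path shown is a placeholder, not a real file in this repository.

```python
from urllib.parse import quote


def nbviewer_url(user, repo, notebook_path, branch="master"):
    """Build the nbviewer address for a notebook stored in a GitHub repository."""
    # quote() escapes spaces and similar characters but keeps "/" separators.
    path = quote(notebook_path.lstrip("/"))
    return f"https://nbviewer.org/github/{user}/{repo}/blob/{branch}/{path}"


if __name__ == "__main__":
    # "example/notebook.ipynb" is a placeholder path, not a real file in MSDS.
    print(nbviewer_url("dvillalobos", "MSDS", "example/notebook.ipynb"))
```

The resulting link can be pasted anywhere a rendered, read-only view of the notebook is more convenient than the raw file.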



  • Other tools that I've set up on my Home Server are MongoDB (NoSQL database framework), MariaDB (SQL database framework) and Neo4j (graph database), in order to practice and experiment with calculations in systems where the data is present or non-present.
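
When several database services share one machine, a quick reachability check can be handy. The sketch below tries a TCP connection to the default port of each service (27017 for MongoDB, 3306 for MariaDB, 7687 for Neo4j's Bolt protocol). The function names are hypothetical, and a given installation, including mine, may be configured with different ports.

```python
import socket

# Default ports of each service; a given installation may use different ones.
DEFAULT_PORTS = {
    "MongoDB": 27017,
    "MariaDB": 3306,
    "Neo4j (Bolt)": 7687,
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_services(host="localhost", ports=DEFAULT_PORTS):
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, reachable in check_services().items():
        print(f"{name}: {'reachable' if reachable else 'not reachable'}")
```

An open port only shows that something is listening there; it does not confirm which service it is or that a login would succeed.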

Credits


  • Credit the author: If you're planning on learning Data Science, feel free to take my code as a reference, but make sure you do your part. I've invested a considerable amount of time searching, thinking, creating and crediting all of these solutions, and I would ask you to add a reference in case some of my solutions inspire you to create further solutions.


Icons made by Flaticon. Flaticon is licensed by Creative Commons BY 3.0.

