cybergeekgyan/Data-Science-Portfolio

Data Science

Resources

A list of topics ranging from beginner to advanced level.

A recommended, though not required, track to follow is:

  • Data Analyst
  • Data Science
  • Machine Learning
  • Deep Learning
  • Data Engineering

Then you can proceed to any of the topics below, if you know what you want to concentrate on:

  • Natural Language Processing
  • Computer Vision
  • Reinforcement Learning
  • Recommendation Systems

DATA SCIENCE TOPICS

--- Deep Learning Tools ---

  • TensorFlow and PyTorch are the two most popular open-source libraries for Deep Learning.

    • TensorFlow was developed by Google and is used in their speech recognition system, Google Photos, Gmail, Google Search and much more.
    • Companies using TensorFlow include Airbnb, Airbus, eBay, Intel, Uber and dozens more.
  • PyTorch is just as powerful. It was created at Facebook's AI research lab and is also developed by researchers at Nvidia and leading universities such as Stanford, Oxford, and ParisTech.

    • Companies using PyTorch include Twitter, Salesforce and Facebook.

So which is better and for what?

  • Both of these libraries are relatively young: TensorFlow was open-sourced in 2015, and PyTorch was first released in 2016.
  • We will use the most cutting-edge Deep Learning models and techniques.

--- More Tools ---

  • Theano is another open-source deep learning library. Its functionality is very similar to TensorFlow's, but we will still cover it.

  • Keras is a high-level library for implementing Deep Learning models. It acts as a wrapper for Theano and TensorFlow.

  • Scikit-learn is the most practical Machine Learning library. We will mainly use it:

    • to evaluate the performance of our models with the most relevant technique, k-Fold Cross Validation
    • to improve our models with effective Parameter Tuning
    • to preprocess our data, so that our models can learn in the best conditions
  • We will also use:

    • NumPy to do heavy numerical computation and manipulate high-dimensional arrays,
    • Matplotlib to plot insightful charts, and
    • Pandas to import and manipulate datasets efficiently.
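
The three scikit-learn uses listed above (preprocessing, k-Fold Cross Validation, and Parameter Tuning) can be sketched together in one short script. This is a minimal illustration, not code from the notebooks in this repository: the iris dataset, the logistic regression model, and the candidate values for `C` are all assumptions chosen only to keep the example small.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small dataset bundled with scikit-learn (an illustrative choice).
X, y = load_iris(return_X_y=True)

# Preprocessing and model in one pipeline, so the scaler is refit
# on the training part of every fold and never sees the held-out part.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# k-Fold Cross Validation: 5 shuffled folds with a fixed seed.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print("fold accuracies:", np.round(scores, 3))
print("mean accuracy:", scores.mean())

# Parameter Tuning: grid search over the regularization strength C.
# make_pipeline names each step after its lowercased class name.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(model, param_grid, cv=cv)
search.fit(X, y)
print("best parameters:", search.best_params_)
print("best CV accuracy:", search.best_score_)
```

Putting the scaler inside the pipeline is a deliberate choice here: it keeps the preprocessing step from leaking information about the validation fold into training.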

Repository

This repository contains a portfolio of data science projects completed for academic, self-learning, and professional purposes, presented in the form of Jupyter Notebooks.

Tools

  • Python: NumPy, Pandas, Seaborn, Matplotlib
  • Machine Learning: scikit-learn, TensorFlow, Keras