📚 Data Science and Machine Learning are among the fastest-growing fields in technology. This repository aims to build strong, professional analytics skills for organizing, storing, and manipulating large amounts of data. 🏁
- AWS Certified Data Analytics - Specialty (DAS): 824GVEPCQFQQ1DS5
- AWS Certified Machine Learning - Specialty (MLS-C01):
- AWS Certified Database - Specialty (DBS-C01):
- Document as Code

```shell
npx create-docusaurus@latest docs classic --typescript
yarn add @docusaurus/theme-search-algolia tailwindcss postcss autoprefixer
```
| 📆 | Lessons / Tasks Done ⏰ | Reference Links 🔗 |
|---|---|---|
| 01 | 🎓 AWS Certified Data Analytics - Specialty (DAS) (Collecting Streaming Data, Data Collection and Getting Data, Amazon Elastic MapReduce (EMR), Using Redshift & Redshift Maintenance & Operations, AWS Glue, Athena, and QuickSight, Elasticsearch, AWS Security Services) ✅ | A Cloud Guru - DAS & ACG Practice Exam & Udemy Practice Exam |
| 02 | 🎓 AWS Certified Machine Learning - Specialty (MLS-C01) (Data Preparation, Data Analysis and Visualization, Modeling, Algorithms, Evaluation and Optimization, Implementation and Operations) ☑️ | A Cloud Guru - MLS-C01 & ACG Practice Exam & Udemy Practice Exam |
| 03 | 🎓 AWS Certified Database - Specialty (DBS-C01) (Relational Database Service, Amazon Aurora / DynamoDB / DocumentDB / Redshift, Migrating Data to Databases, Monitoring & Optimization) ☑️ | A Cloud Guru - DBS-C01 & ACG Practice Exam & Udemy Practice Exam |
| 04 | 🛠 Reproducible Local Development for Data Science and Machine Learning projects | Data Science |
| 05 | 👨‍💻 Python Project: Spotify Data Analysis using Python | Project |
| 06 | 📚 Statistics (Descriptive statistics: Mean, Median, Mode, Variance, & Standard Deviation) | Statistics for Data Science with Python |
| 07 | Tableau Project: Sales Insights - Data Analysis using Tableau & SQL | Project |
| 08 | 🚀 Project: Data Analysis using Python | Project |
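As a quick illustration of the descriptive statistics named in lesson 06 (mean, median, mode, variance, and standard deviation), here is a minimal sketch using only Python's standard-library `statistics` module; the sample scores are made up:

```python
# Descriptive statistics on a small made-up sample, using the
# standard-library statistics module (no third-party packages).
import statistics

scores = [70, 85, 85, 90, 95]  # illustrative sample data

mean = statistics.mean(scores)         # arithmetic average -> 85
median = statistics.median(scores)     # middle value of the sorted data -> 85
mode = statistics.mode(scores)         # most frequent value -> 85
variance = statistics.variance(scores) # sample variance (n - 1 denominator)
stdev = statistics.stdev(scores)       # sample standard deviation

print(mean, median, mode, variance, round(stdev, 3))
```

Note that `statistics.variance` and `statistics.stdev` compute the *sample* statistics (dividing by n - 1); use `pvariance`/`pstdev` for the population versions.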
🛠 Production-grade project structure for successful data-science or machine-learning projects 🚀

```
├── Makefile             <- Makefile with convenience commands like `make data` or `make train`
├── README.md            🤝 Explain your project and its structure for better collaboration.
├── config/
│   └── logging.config.ini
├── data                 🔍 Where all your raw and processed data files are stored.
│   ├── external         <- Data from third-party sources.
│   ├── interim          <- Intermediate data that has been transformed.
│   ├── processed        <- The final, canonical data sets for modeling.
│   └── raw              <- The original, unprocessed, immutable data dump.
│
├── docs                 📓 A default docusaurus | mkdocs project; see docusaurus.io | mkdocs.org for details
│
├── models               🧠 Store your trained and serialized models for easy access and versioning.
│
├── notebooks            💻 Jupyter notebooks for exploration and visualization.
│   ├── data_exploration.ipynb
│   ├── data_preprocessing.ipynb
│   ├── model_training.ipynb
│   └── model_evaluation.ipynb
│
├── pyproject.toml       <- Project configuration file with package metadata for analytics
│                           and configuration for tools like black
│
├── references           <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports              📊 Generated analysis (reports, charts, and plots) as HTML, PDF, LaTeX.
│   └── figures          <- Generated graphics and figures to be used in reporting
│
├── requirements.txt     🛠 The requirements file for reproducing the analysis environment.
│
├── setup.cfg            <- Configuration file for flake8
│
├── src                  💾 Source code for data processing, feature engineering, and model training.
│   ├── data/
│   │   └── data_preprocessing.py
│   ├── features/
│   │   └── feature_engineering.py
│   ├── models/
│   │   └── model.py
│   └── utils/
│       └── helper_functions.py
├── tests/
│   ├── test_data_preprocessing.py
│   ├── test_feature_engineering.py
│   └── test_model.py
├── setup.py             🛠 A Python script to make the project installable.
├── Dockerfile
├── docker-compose.yml
├── .gitignore
└── analytics            🧩 Source code for use in this project.
    │
    ├── __init__.py      <- Makes analytics a Python module
    │
    ├── data             <- Scripts to download, preprocess, or generate data
    │   └── make_dataset.py
    │
    ├── features         <- Scripts to turn raw data into features for modeling
    │   └── build_features.py
    │
    ├── models           <- Scripts to train models and then use trained models to make predictions.
    │   ├── predict_model.py
    │   └── train_model.py
    │
    └── visualization    <- Scripts to create exploratory and results-oriented visualizations
        └── visualize.py
```
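To show how the pieces of the tree fit together, here is a minimal sketch of what a script like `analytics/data/make_dataset.py` could contain: read the immutable raw dump from `data/raw`, drop incomplete rows, and write the canonical result to `data/processed`. The paths, the cleaning rule, and the CSV format are all illustrative assumptions, not a prescribed implementation:

```python
# Hypothetical sketch of analytics/data/make_dataset.py: raw -> processed.
# Default paths mirror the project tree; adjust to your own data layout.
import csv
from pathlib import Path

RAW = Path("data/raw/dataset.csv")
PROCESSED = Path("data/processed/dataset.csv")

def make_dataset(raw_path: Path = RAW, out_path: Path = PROCESSED) -> int:
    """Drop rows with any missing field and write the cleaned file.

    Assumes the raw CSV has a header and at least one complete row.
    Returns the number of rows kept.
    """
    with raw_path.open(newline="") as f:
        # Keep only rows where every field is non-empty (illustrative rule).
        rows = [r for r in csv.DictReader(f)
                if all(v.strip() for v in r.values())]
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Keeping the script idempotent (raw in, processed out, no mutation of `data/raw`) is what makes `make data`-style Makefile targets reproducible.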
- Datasets: Amazon Datasets & Kaggle Datasets
- DataHub
- KDNuggets & Towards Data Science & Kaggle Winner’s Blog
- Statistics: Simply Statistics
- TensorFlow & Keras
- Artificial Intelligence: DeepMind Blog
- Top algorithms that every data scientist should have in their toolbox:
- Linear regression
- Logistic regression
- Principal component analysis (PCA)
- Decision trees
- Random forests
- CART algorithm
- Naive Bayes
- K-nearest neighbors (KNN)
- Support vector machines (SVM)
- K-means clustering
- Neural networks
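For a taste of the first algorithm on the list, here is a minimal from-scratch sketch of simple (one-variable) linear regression using the closed-form least-squares solution; the toy data is made up:

```python
# Simple linear regression fitted in closed form (ordinary least squares),
# in plain Python with no third-party dependencies.
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data generated from y = 2x + 1, so the fit recovers slope 2, intercept 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

In practice you would reach for `scikit-learn`'s `LinearRegression` or `numpy.polyfit`, but the closed form above is the same math they apply in the one-feature case.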