
The log of my 60 days of Data Engineering challenge - to keep myself accountable


Zahidul-Islam/60-days-of-data-engineering


60 Days of Data Engineering

Inspired by #100DaysOfCode, I've decided to challenge myself to become a Data Engineer by studying and building Data/ML pipelines for 10-12 hours every day for the next 60 days. The challenge started today, September 3, and should be finished by November 4, 2019. My focus will be on ML/DL pipelines and the Data Engineering tools around them, such as Kubeflow, Apache Airflow, Apache Spark, Apache Kafka, and TensorFlow. I will document my progress on GitHub and post daily logs on LinkedIn.

Day 5: September 7, 2019

Today's Progress: Today was not a productive day. I only finished the Week 1 content of the Natural Language Processing with TensorFlow course.
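Assuming Week 1 covers the usual first step of turning text into numbers, here is a dependency-free sketch of that idea: build a word index ordered by frequency, map sentences to integer sequences, and pad them to a fixed length. Keras provides a `Tokenizer` class and a `pad_sequences` helper for this job; the functions below are hypothetical stand-ins, not their exact behavior (punctuation handling and tie-breaking between equally frequent words may differ).

```python
from collections import Counter


def fit_word_index(sentences, oov_token="<OOV>"):
    """Map each word to an integer, most frequent first; index 0 is reserved for padding."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    word_index = {oov_token: 1}  # unseen words will map to index 1
    for i, word in enumerate(ordered, start=2):
        word_index[word] = i
    return word_index


def texts_to_sequences(sentences, word_index, oov_token="<OOV>"):
    """Replace each word with its index, falling back to the OOV index for unknown words."""
    oov = word_index[oov_token]
    return [[word_index.get(w, oov) for w in s.lower().split()] for s in sentences]


def pad_sequences(seqs, maxlen):
    """Left-pad with zeros, or drop words from the front, so every sequence has length maxlen."""
    padded = []
    for seq in seqs:
        seq = seq[-maxlen:]  # keep the last maxlen tokens
        padded.append([0] * (maxlen - len(seq)) + seq)
    return padded
```

For example, after fitting on "I love my dog", "I love my cat", and "You love my dog", the unseen word in "I love my fish" falls back to the OOV index 1.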

Thoughts: I am excited and looking forward to starting the Insight Data Engineering Fellows Program on September 9, 2019.

Day 4: September 6, 2019

Today's Progress: Today I finished the Introduction to TensorFlow for Artificial Intelligence, Machine Learning, and Deep Learning course on Coursera.

Thoughts: It was not a difficult course. However, it gave me a solid understanding of the TensorFlow 2.0 API and Convolutional Neural Networks (ConvNets). Building some simple image classifiers was fun.
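To make the ConvNet part concrete: a convolutional layer slides a small kernel over the image and sums the element-wise products at each position. Below is a plain-Python sketch of a "valid" convolution (no padding, stride 1) with a hand-picked vertical-edge kernel. Like deep learning libraries, it does not flip the kernel, so strictly speaking it computes cross-correlation. In a real `Conv2D` layer the kernel values are learned, and a bias and activation are applied on top; the function name here is hypothetical.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution (no padding, stride 1), computed as cross-correlation."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Sum of element-wise products between the kernel and the window at (i, j).
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# A 4x5 image: bright on the left, dark on the right.
image = [[10, 10, 10, 0, 0] for _ in range(4)]
# Responds where brightness drops from left to right (a vertical edge).
vertical_edge = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
result = conv2d(image, vertical_edge)  # [[0, 30, 30], [0, 30, 30]]
```

The output is zero over the flat bright region and large where the window straddles the bright-to-dark boundary.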

Useful Links:

👉 Introduction to TensorFlow for Artificial Intelligence, Machine Learning, and Deep Learning https://www.coursera.org/learn/introduction-tensorflow

👉 Fashion MNIST with Keras and TPUs https://research.google.com/seedbank/seed/fashion_mnist_with_keras_and_tpus

👉 Understanding Convolutions https://colah.github.io/posts/2014-07-Understanding-Convolutions/

Day 3: September 5, 2019

Today's Progress: Today I started the TensorFlow in Practice Specialization from deeplearning.ai. I am in week 4 of the Introduction to TensorFlow for Artificial Intelligence course.

Thoughts: I like the way the AI Advocate (instructor) introduced Convolutional Neural Networks by building a simple classifier with the Fashion MNIST dataset and TensorFlow. The TensorFlow in Practice Specialization is hands-on. I am looking forward to learning more about TensorFlow.
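ConvNet classifiers for datasets like Fashion MNIST commonly follow each convolution with a pooling step that shrinks the feature map. Here is a plain-Python sketch of non-overlapping 2x2 max pooling, the operation a Keras `MaxPooling2D(2, 2)` layer performs with its default "valid" padding; the function name is hypothetical.

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size block."""
    pooled = []
    # Step by `size`; a trailing row or column that cannot fill a block is dropped.
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            block = [feature_map[i + di][j + dj] for di in range(size) for dj in range(size)]
            row.append(max(block))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 3, 2],
    [2, 6, 1, 1],
]
pooled = max_pool2d(feature_map)  # [[4, 5], [6, 3]]
```

On a 28x28 Fashion MNIST image this halves each spatial dimension to 14x14, keeping the strongest activation in each region while cutting the computation for later layers.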

Useful Links:

👉 TensorFlow in Practice Specialization https://www.coursera.org/specializations/tensorflow-in-practice

👉 Different Convolution Filters https://lodev.org/cgtutor/filtering.html

👉 Machine Learning Fairness https://developers.google.com/machine-learning/fairness-overview/

👉 Collection of Interactive Machine Learning Examples https://research.google.com/seedbank/

👉 Step-by-step Guide to Install TensorFlow 2 https://medium.com/@cran2367/install-and-setup-tensorflow-2-0-2c4914b9a265

Day 2: September 4, 2019

Today's Progress: I wrote a blog post on LinkedIn where I explained Apache Airflow core concepts.

Thoughts: There are so many interesting concepts in Airflow. It is an excellent tool for workflow orchestration. I want to spend more time on building custom Operators, Hooks, and data pipelines.
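A core idea behind Airflow's DAGs is that a task runs only after all of its upstream tasks have finished, with dependencies declared in code (for example `extract >> transform`). The toy model below is not how Airflow's scheduler is implemented; it uses Kahn's algorithm to compute one valid execution order for a hypothetical extract/validate/transform/load pipeline, and to reject dependency graphs that contain a cycle.

```python
from collections import deque


def topological_order(dependencies):
    """Kahn's algorithm: order tasks so every upstream task comes before its downstream tasks.

    `dependencies` maps each task to the list of its upstream tasks.
    Raises ValueError if the graph has a cycle, i.e. it is not a DAG.
    """
    tasks = set(dependencies)
    for upstream in dependencies.values():
        tasks.update(upstream)
    indegree = {t: 0 for t in tasks}
    downstream = {t: [] for t in tasks}
    for task, upstream in dependencies.items():
        for up in upstream:
            indegree[task] += 1
            downstream[up].append(task)
    # Tasks with no unfinished upstream work are ready; sorting keeps the output deterministic.
    ready = deque(sorted(t for t in tasks if indegree[t] == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in sorted(downstream[task]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(tasks):
        raise ValueError("cycle detected: dependencies do not form a DAG")
    return order


pipeline = {
    "extract": [],
    "validate": ["extract"],
    "transform": ["extract"],
    "load": ["validate", "transform"],
}
order = topological_order(pipeline)  # ['extract', 'transform', 'validate', 'load']
```

Here `validate` and `transform` only depend on `extract`, so a real scheduler would be free to run them in parallel; `load` waits for both.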

Link to work: Apache Airflow Core Concepts

Here are some useful links:

👉 A Definitive Compilation of Apache Airflow Resources - Aakash Pydi https://towardsdatascience.com/a-definitive-compilation-of-apache-airflow-resources-82bc4980c154

👉 DAG Writing Best Practices in Apache Airflow https://www.astronomer.io/guides/dag-best-practices/

👉 Automate AWS Tasks Thanks to Airflow Hooks - Arnaud https://blog.sicara.com/automate-aws-tasks-boto3-airflow-hooks-593c3120e8fc

👉 Getting started with Apache Airflow - Adnan Siddiqi https://towardsdatascience.com/getting-started-with-apache-airflow-df1aa77d7b1b

👉 Orchestration and DAG Design in Apache Airflow — Two Approaches https://medium.com/hashmapinc/orchestration-and-dag-design-in-apache-airflow-two-approaches-35edd3eaf7c0

👉 Apache Airflow Core Concepts - Zahidul Islam https://www.linkedin.com/pulse/apache-airflow-core-concepts-zahidul-islam/?trackingId=X3YNEn0IQHehblxk9G0Z7Q%3D%3D

Day 1: September 3, 2019

Today's Progress: Spent time learning about Apache Airflow. Airflow is a platform to programmatically author, schedule and monitor workflows. Link: https://airflow.apache.org/index.html
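The "schedule" part of that definition has one convention that often surprises beginners: Airflow triggers the run for a schedule interval only after that interval has ended, so a daily DAG's run for September 3 starts shortly after midnight on September 4 (older releases call the interval start the `execution_date`). The sketch below models that rule in plain Python; `schedule_runs` is a hypothetical helper, not part of Airflow, and it ignores cron expressions, time zones, and the `catchup` setting.

```python
from datetime import datetime, timedelta


def schedule_runs(start_date, interval, now):
    """Return (interval_start, interval_end) pairs whose interval has fully elapsed by `now`.

    Models the convention that the run for an interval is triggered only once it ends.
    """
    runs = []
    interval_start = start_date
    while interval_start + interval <= now:
        runs.append((interval_start, interval_start + interval))
        interval_start += interval
    return runs


runs = schedule_runs(datetime(2019, 9, 3), timedelta(days=1), now=datetime(2019, 9, 5, 12))
# Two completed daily intervals: Sep 3 -> Sep 4 and Sep 4 -> Sep 5.
# The Sep 5 -> Sep 6 interval has not ended yet, so it has no run.
```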

Thoughts: Very happy with my progress, and excited to start building a DynamoDB-to-BigQuery ETL pipeline using Airflow tomorrow.
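The pipeline itself is not shown here, so the following is only a sketch of one plausible transform step, under stated assumptions: items arrive in DynamoDB's typed attribute-value format (for example `{"N": "42"}`), as returned by the low-level API, and are loaded into BigQuery as newline-delimited JSON. The helper names are hypothetical, and binary types are left out. boto3 ships a `TypeDeserializer` for the type conversion, but it returns `Decimal` and `set` values that `json.dumps` cannot serialize, which is why this version converts numbers and sets itself.

```python
import json
from decimal import Decimal


def _to_number(text):
    """DynamoDB sends numbers as strings; keep integers exact, use float otherwise (may lose precision)."""
    number = Decimal(text)
    return int(number) if number == number.to_integral_value() else float(number)


def from_dynamodb(value):
    """Convert one typed attribute value, e.g. {"N": "42"}, into a plain JSON-friendly value."""
    if len(value) != 1:
        raise ValueError(f"expected exactly one type tag, got {value!r}")
    type_tag, payload = next(iter(value.items()))
    if type_tag in ("S", "BOOL"):
        return payload
    if type_tag == "NULL":
        return None
    if type_tag == "N":
        return _to_number(payload)
    if type_tag == "SS":
        return list(payload)  # sets become lists so json.dumps can serialize them
    if type_tag == "NS":
        return [_to_number(n) for n in payload]
    if type_tag == "L":
        return [from_dynamodb(v) for v in payload]
    if type_tag == "M":
        return {k: from_dynamodb(v) for k, v in payload.items()}
    raise ValueError(f"unsupported DynamoDB type tag in this sketch: {type_tag}")


def to_ndjson(items):
    """Serialize items as newline-delimited JSON, one flattened row per line."""
    rows = ({name: from_dynamodb(v) for name, v in item.items()} for item in items)
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


item = {
    "user_id": {"S": "u1"},
    "age": {"N": "31"},
    "score": {"N": "4.5"},
    "active": {"BOOL": True},
    "tags": {"SS": ["a", "b"]},
    "address": {"M": {"city": {"S": "NYC"}}},
    "nickname": {"NULL": True},
}
line = to_ndjson([item])
```

Converting non-integer numbers to float is a simplification; for money-like fields, a BigQuery `NUMERIC` column fed from string values may be the safer target.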

There are so many excellent blogs on Airflow. Today I want to share some beginner-friendly resources:

👉 Airflow official documentation https://airflow.apache.org/index.html

👉 Apache Airflow for the confused - Jonathan Pichot https://medium.com/nyc-planning-digital/apache-airflow-for-the-confused-b588935669df

👉 Apache Airflow: Tutorial and Beginners Guide https://www.polidea.com/blog/apache-airflow-tutorial-and-beginners-guide/

👉 Apache Airflow on Docker for Complete Beginners https://medium.com/@itunpredictable/apache-airflow-on-docker-for-complete-beginners-cf76cf7b2c9a

👉 Understanding Apache Airflow’s key concepts https://medium.com/@dustinstansbury/understanding-apache-airflows-key-concepts-a96efed52b1a

👉 How to start automating your data pipelines with Airflow - Sriram Baskaran https://blog.insightdatascience.com/airflow-101-start-automating-your-batch-workflows-with-ease-8e7d35387f94
