The Twitter Archive

Here is a compilation of the most relevant content I've posted on Twitter.

Mathematics

  • Math is hard, but you shouldn't be scared — I told everyone that I didn't care. "Screw math! I've never been great with it, so I'm not starting with machine learning to fail at the end." That was many years ago. Math is still hard, but I don't think you should be scared at all. Here is why.

  • Math resources for machine learning — If you are looking to get a background in math before starting with machine learning, here is all the material you need covering the following topics: Probabilities & Statistics, Linear Algebra, and Multivariate Calculus. More than enough to get started.

  • How much math do you really need? — How much math do you need to know to be a machine learning engineer? Let's talk about how Andrew Ng answers this question.

Probabilities and Statistics

  • An introduction to the basic principles of probabilities — If you want to become a better gambler, you need to learn probabilities. Let's talk about the basic principles of probabilities that you need to understand.

  • Probabilities in a continuous context — Imagine I tell you this: "The probability of a particular event happening is zero." Contrary to what you may think, this doesn't mean that this event is impossible. In other words, events with 0 probability could still happen! This seems contradictory. What's going on here?

  • De Méré's Paradox — Antoine was born in France back in 1607. Despite not being a nobleman, he called himself "Chevalier De Méré" and spent his days like any other writer and philosopher of the time. But the Chevalier liked gambling, and was obsessed with the probabilities surrounding the games he played.

  • An introduction to Bayes' Theorem — The doctor tested me, and I came back positive for a disease that infects 1 of every 1,000 people. The test comes back positive 99% of the time if the person has the disease. About 2% of uninfected patients also come back positive. Do I have the disease?
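
The numbers in that last item are enough to work the answer out with Bayes' Theorem. Here is a quick sanity check in Python, with every probability taken straight from the teaser above:

```python
# Bayes' Theorem applied to the numbers above:
# P(disease | positive) = P(positive | disease) * P(disease) / P(positive)

p_disease = 1 / 1000        # prevalence: 1 in every 1,000 people
p_pos_given_disease = 0.99  # the test is positive 99% of the time if infected
p_pos_given_healthy = 0.02  # about 2% of uninfected people also test positive

# Total probability of testing positive (infected or not).
p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1 - p_disease))

p_disease_given_pos = p_pos_given_disease * p_disease / p_positive
print(f"{p_disease_given_pos:.1%}")  # ~4.7%, far lower than intuition suggests
```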

Algorithms

  • Greedy algorithms — One of the most useful things you can learn: Greedy algorithms, how they work, and how to solve problems using them. Here is why they are fundamental.
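
The thread covers the general technique; as one classic illustration (not necessarily the example the thread uses), here is the greedy solution to interval scheduling: always keep the activity that finishes earliest.

```python
def max_non_overlapping(intervals):
    """Greedy interval scheduling: sort by finish time and keep
    every interval that starts after the last one we selected."""
    selected = []
    last_end = float("-inf")
    for start, end in sorted(intervals, key=lambda i: i[1]):
        if start >= last_end:
            selected.append((start, end))
            last_end = end
    return selected

print(max_non_overlapping([(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (6, 10)]))
# [(1, 4), (5, 7)] -> for this problem, the greedy choice is provably optimal
```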

Feature engineering

  • An introduction to Label and One-Hot encoding — Encoding features from a dataset is a very common transformation that data scientists have to do before running machine learning algorithms. This is an explanation of Label and One-Hot encoding, how they work, and how to use them (there's a short sketch of both after this list).

  • Do you really need feature engineering? — I've heard multiple times that you don't need to do any feature engineering or selection whenever you are using neural networks. This is not true. Yes, neural networks can extract patterns and ignore unnecessary features from the dataset, but this is usually not enough.
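
As promised above, here is a minimal sketch of what Label and One-Hot encoding look like in scikit-learn, using a toy color column as a stand-in for a real feature:

```python
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

colors = [["red"], ["green"], ["blue"], ["green"]]

# Label encoding: each category becomes an arbitrary integer.
labels = LabelEncoder().fit_transform([c[0] for c in colors])
print(labels)  # [2 1 0 1]

# One-hot encoding: each category becomes its own binary column,
# which avoids implying an order between categories.
one_hot = OneHotEncoder().fit_transform(colors).toarray()
print(one_hot)
# [[0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]
#  [0. 1. 0.]]
```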

Fundamentals

  • What's machine learning and why should you care? — What is machine learning, and why should you care about it? Let me try to convince you.

  • Splitting your datasets — I'm sure you've heard that we split our datasets into different subsets right before running our machine learning algorithms. This is an explanation of why we do that, the goal of each subset, and how to use them.

  • Random splits aren't always a good solution — When we start with machine learning, we learn to split our datasets into testing and training sets by taking a percentage of the data. Unfortunately, this practice could lead to overestimating the performance of your model.

  • Don't look at your test set — When building a machine learning model, I want my test set to represent real world data as closely as possible. The best strategy I've found is to split the test set aside before I even look at the data. Here is why this helps.

  • Everything you need to know about the batch size — When using Gradient Descent, the batch size is one of the most consequential hyperparameters at our disposal. This is an explanation about the influence of the batch size.

  • Are you overfitting to the validation set? — You aren't doing yourself any favors if you aren't throwing away your validation data regularly. It's painful, I know, but you are looking for trouble if you don't do it. Let's talk about what happens with your data and your model.

  • Is your model overfitting or underfitting? — A quick mental model to help identify if your model is suffering from overfitting or underfitting.

  • Always split your dataset before transforming the data — You should always split your dataset before transforming the data. A valid concern is how to know the true range of a column without looking at all of your data. Let's explore this (a short sketch of the right order of operations follows this list).

  • The importance of learning curves — Learning curves are a popular way to understand your data and your model. An underrated technique is to display the error of the model as we progressively increase the dataset size. This will allow us to determine whether we are overfitting or underfitting.

  • Metrics and imbalanced classification problems — I built a model to predict whether you'll be involved in a crash next time you get in a car. And it's 99% accurate! Allow me to show you (a tiny reproduction of the trap follows this list).

  • Cheating with a data leakage — Can you identify the problem with this 3-step approach? 1. Prepare a dataset, 2. Split it (train, validation, and test sets), 3. Build a model. The issue is subtle, and unfortunately, many people build machine learning models this way. Let's talk about this.

  • Generalization and neural networks — It takes a single picture of an animal for my son to start recognizing it everywhere. Neural networks aren't as good as we are, but they are good enough to be competitive. This is an explanation of how neural networks generalize.

  • An intuitive look into convolutions — How the heck can a computer recognize what's in an image? This is an introduction about convolutions, a key part of how Convolutional Neural Networks work.

  • Using PCA to remove a picture's background — A really useful machine learning algorithm is PCA. You can do cool things with it, like completely removing the background of a picture (robust PCA).

  • An example of dimensionality reduction — This is a great example of dimensionality reduction using singular value decomposition on a dataset of images. We can go from 64 dimensions down to 5 dimensions and still recognize the images! (A sketch follows this list.)

  • Introduction to Dropout — A good way to understand how things work is by breaking them down step by step. We are going to do this here with Dropout and get to the bottom of what happens when we use it (a step-by-step sketch follows this list).

  • Why do we use ReLU in deep learning? — A quick summary of the reasons we prefer to use ReLU as the activation function when using deep learning instead of Sigmoid or TanH.

  • 11 key Supervised Learning concepts — 11 key concepts of Machine Learning. Supervised Learning Edition.

  • The Curse of Dimensionality — The amount of data needed to extract any relevant information increases exponentially with the number of features in your dataset. This is the Curse of Dimensionality. In English: "More features is not necessarily a good thing." But of course, it's not that simple.

  • Sensitivity and Specificity — A short introduction to Sensitivity and Specificity.
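
For the "Always split your dataset before transforming the data" item above, here is a minimal sketch of the order of operations with scikit-learn, using a random toy dataset: fit the transformation on the training set only, then apply it everywhere.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.rand(100, 3)
y = np.random.randint(0, 2, size=100)

# 1. Split first, so the test set never influences the transformation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# 2. Fit the scaler on the training data only.
scaler = StandardScaler().fit(X_train)

# 3. Apply the same (train-derived) mean and std to both sets.
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)  # no .fit() here: that would leak test data
```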
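
And here is the trap from the "Metrics and imbalanced classification problems" item, reproduced with hypothetical labels: a model that always predicts "no crash" scores 99% accuracy while catching nothing.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels: only 1% of rides end in a crash.
y_true = np.array([1] * 10 + [0] * 990)

# A "model" that blindly predicts "no crash" every time.
y_pred = np.zeros_like(y_true)

print(accuracy_score(y_true, y_pred))  # 0.99 -> looks great
print(recall_score(y_true, y_pred))    # 0.0  -> catches zero crashes
```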
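
For the dimensionality reduction example (64 dimensions down to 5), here is what that could look like with truncated SVD. The thread's dataset may differ; scikit-learn's 8x8 digits dataset is used here because it happens to have exactly 64 features.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import TruncatedSVD

digits = load_digits()               # 1,797 images of 8x8 pixels = 64 features
print(digits.data.shape)             # (1797, 64)

svd = TruncatedSVD(n_components=5)   # keep only 5 dimensions
reduced = svd.fit_transform(digits.data)
print(reduced.shape)                 # (1797, 5)

# How much of the original variance those 5 dimensions retain.
print(svd.explained_variance_ratio_.sum())
```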
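
Finally, for the Dropout introduction: a step-by-step sketch of what a single dropout pass does to a layer's activations during training. This is the "inverted" variant that modern frameworks typically use.

```python
import numpy as np

rng = np.random.default_rng(0)
activations = rng.random((1, 8))  # one layer's outputs for one example
keep_prob = 0.5                   # drop each unit with probability 0.5

# 1. Sample a binary mask: which units survive this training step.
mask = rng.random(activations.shape) < keep_prob

# 2. Zero out the dropped units and scale the survivors by 1/keep_prob,
#    so the expected activation matches test time (when nothing is dropped).
dropped = activations * mask / keep_prob

print(activations)
print(dropped)
```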

Machine learning techniques

  • An overview of Active Learning — Imagine you have a ton of data, but most of it isn't labeled. Even worse: labeling is very expensive. How can we get past this problem? Let's talk about a different—and pretty cool—way to train a machine learning model.

  • An introduction to Siamese Networks — You want to build a face recognition system for your office, but getting many pictures from your coworkers is not an option. Also, having to retrain the model for every new employee seems like a burden. How do we solve this?

  • An introduction to Transfer learning — The ability to reuse the knowledge of one model and adapt it to solve a different problem is one of the most consequential breakthroughs in machine learning. This is a thread explaining Transfer learning and how we can use it in practice.

  • How to implement Transfer Learning — With Transfer Learning you can reuse the knowledge from a different model to kick-start your new model. Practically, this is how you can do transfer learning (a Keras sketch follows this list).

  • An introduction to Autoencoders — A lot in machine learning is pretty dry and boring, but understanding how autoencoders work feels different. This is a thread about autoencoders, things they can do, and a pretty cool example.

  • Test-time augmentation — Here is a simple trick that improves the results of your models. Best part: You'll surprise your team. Guaranteed. This is an introduction to Test-Time Augmentation, and how you can start using it today (a minimal sketch follows this list).

  • How do you deal with an imbalanced dataset? — 6 different techniques to solve one of the most common Machine Learning problems (and a question that always comes up during interviews).
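
For the "How to implement Transfer Learning" item above, here is one common recipe sketched in Keras (the thread may use a different base model or framework; MobileNetV2 is just an assumption here): load a network pre-trained on ImageNet, freeze it, and train a small head on top.

```python
import tensorflow as tf

# Reuse a base model pre-trained on ImageNet, without its classifier head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet"
)
base.trainable = False  # freeze the transferred knowledge

# Add a small trainable head for our own task (say, 3 classes).
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation="softmax"),
])

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # then optionally fine-tune
```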
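
And for Test-Time Augmentation, the core idea fits in a few lines: predict on several augmented copies of each image and average the results. A minimal sketch, assuming a Keras-style model and batches shaped (batch, height, width, channels); the horizontal flip stands in for whatever augmentations you choose.

```python
import numpy as np

def predict_with_tta(model, images):
    """Average the model's predictions over the original images
    and a horizontally flipped copy of each one."""
    predictions = [
        model.predict(images),                   # original batch
        model.predict(np.flip(images, axis=2)),  # flipped along the width axis
    ]
    return np.mean(predictions, axis=0)
```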

Building a machine learning career

Exercises and projects

  • 10 Computer Vision project ideas — Do you wanna start getting your hands dirty with machine learning and Computer Vision? Here you have 10 projects to start practicing and improving your portfolio.

  • 20 fundamental questions you need to ace — Here are 20 fundamental questions that you need to ace before getting a machine learning job. Almost every company will ask these to weed out unprepared candidates. You don't want to show up unless you are comfortable having a discussion about all of these.

  • 25 True/False machine learning questions — 25 True/False machine learning questions that are horrible for interviews but pretty fun to answer. Most importantly: they will make you think and will keep your knowledge sharp. These are mostly beginner-friendly.

  • How do you build a portfolio when looking for a job? — One issue I see with people applying for a job: They struggle to highlight their experience effectively. If you are trying to get a job as a Data Scientist or Machine Learning Engineer, here is something you can do.

Machine learning in the real world

  • Questions when building a machine learning project — There are 4 stages in the machine learning project lifecycle. Here are 29 questions that you should ask at each step of the process.

  • A classification problem that turns ugly quickly — Let's imagine a system where you submit a bunch of pictures of an object, and it recommends a price range in which the object could be sold. To identify the object, we could build a classification model. It turns out, however, that things can get complex really quickly.

  • A day in the life of a Machine Learning Engineer — Do you want to know what a day in the life of a Machine Learning Engineer looks like? I'm not gonna make this boring and talk about biking, coffee, or hipster stuff. Instead, I'll list some of the things that you may find yourself doing.

  • A process to improve your data — It turns out that good data is hard to come by. Even datasets reviewed and used for years are riddled with mistakes that conspire against your work. Here are some tips to improve your data.

  • How much data do you need? — Software developers suck at estimating time. Machine learning engineers suck at estimating how much data they need. People build an image classifier using 500 images and then think that every problem needs the same. It doesn't work like that.

  • Using pre-trained models to your advantage — A big part of my work is to build computer vision models to recognize things. It's usually ordinary stuff: An antenna, a fire extinguisher, a bag, a ladder. Here is a trick I use to solve some of these problems.

  • Baseline models — Before you start building a machine learning model, you need a baseline. I find it helpful to think about 3 different levels and tackle them in order. Here is how I do this.

  • Training with 100% of the data — Last week I trained a machine learning model using 100% of the data. Then I used the model to predict the labels on the same dataset I used to train it. I'm not kidding. Hear me out.

  • When the validation loss is lower than the training loss — I built a machine learning model, and my validation loss is lower than my training loss. People asked me why. We're used to seeing the opposite, so this is definitely suspicious. Is this really a problem?

  • The backbone of a machine learning system — There are a lot of moving pieces in a machine learning system. This covers the backbone of the process, from data engineering all the way to a retraining pipeline.

  • Machine learning pipelines — What's a machine learning pipeline? Well, it turns out that many different things classify as "machine learning pipelines." Here are five of the different "pipelines" you should be aware of.

  • On how decoding images messes up predictions — Here is the story of one of those hidden issues with machine learning models that books don't tell you about. This happened in real life.

  • The Stretch Pants approach — When designing a machine learning model, remember the "stretch pants" approach: Don't waste time looking for pants that perfectly match your size. Instead, use large stretch pants that will shrink down to the right size. What does this mean for your model?

Examples

  • Solving MNIST using a CNN — Explaining a solution line by line is always fun. This thread goes into an excruciating amount of detail through a Convolutional Neural Network that solves the MNIST problem. An explanation for every single line of code (a minimal sketch follows this list).

  • A few problems you can solve using KNN — The perfect way to get into machine learning is to find an algorithm that improves your work right away without much drama. For software developers, KNN (K-Nearest Neighbors) is a perfect introduction. Here are five different problems where KNN could help (a quick example follows this list).

  • 5 cool machine learning projects I've been involved with — As a Machine Learning Engineer, I get to be involved in a lot of crazy cool projects. Here are 5 of them.
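
The MNIST thread above explains every line of its network; the code below is a minimal Keras sketch in the same spirit, not the thread's exact model.

```python
import tensorflow as tf

# Load MNIST and scale pixels to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0  # add a channels dimension
x_test = x_test[..., None] / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, validation_split=0.1)
print(model.evaluate(x_test, y_test))  # [loss, accuracy] on held-out data
```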
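
And for the KNN item: a sketch of how little code a first KNN classifier takes with scikit-learn. The iris dataset here is just a stand-in for any of the five problems the thread lists.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Classify each point by a majority vote of its 5 nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(knn.score(X_test, y_test))  # accuracy on held-out data
```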
