luisfredgs/Awesome-machine-learning

Awesome Machine Learning

This repository is a compilation of various sources of knowledge related to Data Science and Machine Learning. The list includes curated videos, blog posts, textbooks, GitHub repositories, and more. If you find this repository helpful, kindly consider giving it a star! Your support would aid me in improving and maintaining this project. Thank you for your time! 🌟🌟🌟🌟🌟

If you want to contribute to this list (please do), send me a pull request or contact me.

High-quality FREE courses on Youtube

NLP and Large Language Models

Visual Data Science & ML

Useful Datasets for NLP Research (e.g., text classification and sentiment analysis)

Tutorials on Computer Vision & NLP

Personal Blogs

  • ✅Distill.pub - A modern medium for presenting research that showcases AI/ML concepts in clear, dynamic and vivid form
  • ✅Christopher Olah's Blog - A machine learning researcher who likes to understand things clearly, and explain them well
  • ✅Jay Alammar - Visualizing machine learning one concept at a time

Blog Posts

Top Machine Learning Books 📚

A growing curated list of machine learning books.

Github Repositories

Neural Net Drawing Libraries

Transformers and NLP

  • ✅State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
  • 📌BERTopic - A topic modeling technique that uses transformers and c-TF-IDF to create dense clusters, allowing for easily interpretable topics while keeping important words in the topic descriptions
  • ✅LSA-Text-Summarization - Code that summarizes text documents using Latent Semantic Analysis
  • ✅extractive-text-summarization - Extractive text summarization based on word frequencies and spaCy
  • ✅SimCSE - Simple Contrastive Learning of Sentence Embeddings
  • ✅Koan - A word2vec negative sampling implementation with correct CBOW update
  • ✅Apache OpenNLP - A machine learning based toolkit for processing natural language text
  • ✅sense2vec - Contextually-keyed word vectors
  • ✅Mega - Moving Average Equipped Gated Attention. Mega is a simple, theoretically grounded, single-head gated attention mechanism equipped with an (exponential) moving average to incorporate the inductive bias of position-aware local dependencies into the position-agnostic attention mechanism.
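
To make the "word frequencies" idea behind frequency-based extractive summarization concrete, here is a minimal sketch in plain Python. It is not the code of any repository listed above: the stopword list, the regex tokenizer, and the function name `summarize` are all assumptions chosen for illustration (the listed project uses spaCy for tokenization instead).

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real projects use a fuller one).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
             "that", "on", "for", "with", "as", "are", "this"}


def _tokens(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences, kept in their original order."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    freq = Counter(_tokens(text))

    def score(sentence):
        # Average document-level frequency of the sentence's words.
        toks = _tokens(sentence)
        return sum(freq[w] for w in toks) / len(toks) if toks else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    doc = "Graphs model relationships. Graphs and graph learning use graphs. Cats sleep."
    print(summarize(doc, 1))
```

Averaging by sentence length keeps long sentences from winning just because they contain more words; other scoring choices (sums, TF-IDF weights) are equally plausible.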

Graph AI (Hot Topic 🔥)

Graph AI, which uses machine learning methods to learn patterns in graph-structured data, has been a hot research topic. A graph is a data structure that models a set of objects (nodes) and their relationships (edges). The power of the graph formalism lies both in its focus on relationships between points and in its generality. Research on machine learning over graphs has recently received growing attention because of the great expressive power of graphs. As a powerful non-Euclidean data structure for machine learning, graphs support analyses such as node classification, link prediction, and clustering.

  • ❤️ Pytorch Geometric - A library built upon PyTorch to easily write and train Graph Neural Networks (GNNs)
  • ❤️ DGL library - An easy-to-use, high performance and scalable Python package for deep learning on graphs
  • ❤️ PyGOD - A Python library for graph outlier detection (anomaly detection)
  • ❤️ Graph-MLPMixer - A Generalization of ViT/MLP-Mixer to Graphs
  • ❤️ StellarGraph - A Python library for machine learning on graphs and networks which offers state-of-the-art algorithms for graph machine learning, making it easy to discover patterns and answer questions about graph-structured data
  • 👁️ cuGraph - A collection of packages focused on GPU-accelerated graph analytics
  • ✅ igraph - A fast and open-source C library to manipulate and analyze graphs with interfaces in Python, R and C++
  • ✅Karate Club - An unsupervised machine learning extension library for NetworkX. Karate Club consists of state-of-the-art methods to do unsupervised learning on graph-structured data. According to the authors, it is a Swiss Army knife for small-scale graph mining research.
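
As a rough illustration of the nodes-and-edges structure described in this section, the sketch below stores an undirected graph as an adjacency map and scores a candidate link by counting common neighbors, a classic non-learned baseline for link prediction. It uses none of the libraries listed above; the helper names `build_graph` and `common_neighbors` and the toy edge list are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def common_neighbors(adj, u, v):
    """Link-prediction baseline: number of neighbors shared by u and v."""
    return len(adj.get(u, set()) & adj.get(v, set()))


if __name__ == "__main__":
    # Toy graph (made up for illustration).
    edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
    graph = build_graph(edges)
    # Candidate links that are not yet edges, ranked by shared neighbors.
    for pair in [("a", "d"), ("a", "e")]:
        print(pair, common_neighbors(graph, *pair))
```

Graph neural network libraries replace this hand-written score with learned node embeddings, but the input is the same kind of node and edge data.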

Machine Learning in Rust 🦀

The Rust ML landscape is still young and best described as experimental. Nevertheless, Rust's performance, flexibility, and distinctive approach to abstractions make it a promising language for building machine learning backends, a space currently dominated by C/C++.

  • ✅huggingface/tokenizers - The core of tokenizers, written in Rust with a focus on performance and versatility
  • ✅DimaKudosh/word2vec - Rust interface to word2vec
  • 📌Linfa - Provides a comprehensive toolkit for building machine learning applications in Rust, in the spirit of Python's scikit-learn
  • 🔥Burn - This library aims to be a comprehensive deep-learning framework in Rust that offers exceptional flexibility for both researchers and practitioners

Machine Learning in C++ 💪

Fast run time is essential in machine learning, which is why C++ is well suited to machine learning and large-scale AI applications. Nowadays, C++ powers most machine learning engines.

  • ❤️ mlpack - An intuitive, fast, and flexible header-only C++ machine learning library with bindings to other languages
  • ✅ ensmallen - A high-quality C++ library for non-linear numerical optimization, which provides many types of optimizers that can be used for virtually any numerical optimization task
  • ✅Armadillo - C++ library for linear algebra & scientific computing. Useful for algorithm development directly in C++ or quick conversion of research code into production environments
  • 📌NumCpp - A Templatized Header Only C++ Implementation of the Python NumPy Library
  • ✅DLib - C++ toolkit containing machine learning algorithms used in both industry and academia in a wide range of domains including robotics, embedded devices, mobile phones, and large high performance computing environments
  • ✅Caffe - A deep learning framework developed with cleanliness, readability, and speed in mind
  • ✅DyNet - A dynamic neural network library that works well with networks whose structure changes for every training instance. Written in C++ with bindings in Python

High Performance Dataframes

  • ✅cuDF - A GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data. It is built on the Apache Arrow columnar memory format and offers a pandas-like API that will be familiar to data engineers.
  • ✅Polars: Blazingly fast DataFrames in Rust - A DataFrames library implemented in Rust using the Apache Arrow Columnar Format as the memory model.

Statistical packages in Python

  • 📌statannotations - Python package to optionally compute statistical tests and add statistical annotations on plots generated with seaborn
  • ❤️statsmodels - Python package that complements scipy for statistical computations, letting users explore data, estimate statistical models, perform statistical tests, and compute descriptive statistics

Time series 📈

  • ✅sktime - A library for time series analysis in Python. It provides a unified interface for multiple time series learning tasks
  • 📌Darts - A Python library for user-friendly forecasting and anomaly detection on time series
  • ✅Prophet - A procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects.
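
The "additive model" mentioned for Prophet means the forecast is a sum of separate components: trend, seasonality, and holiday effects. The toy sketch below only illustrates that structure. It is not Prophet's API or its fitting procedure, and every function name and parameter value (slope, amplitude, holiday day, bump size) is a made-up assumption.

```python
import math


def trend(t, slope=0.5, intercept=10.0):
    """Linear trend component (hypothetical parameters)."""
    return intercept + slope * t


def weekly_seasonality(t, amplitude=2.0):
    """Sinusoidal component with a period of 7 time steps (days)."""
    return amplitude * math.sin(2 * math.pi * t / 7)


def holiday_effect(t, holidays=frozenset({3}), bump=5.0):
    """Fixed bump on the (hypothetical) holiday days, zero otherwise."""
    return bump if t in holidays else 0.0


def additive_forecast(t):
    """Additive model: the prediction is the sum of the three components."""
    return trend(t) + weekly_seasonality(t) + holiday_effect(t)


if __name__ == "__main__":
    for day in range(8):
        print(day, round(additive_forecast(day), 3))
```

Because the components simply add up, each one can be inspected or plotted on its own, which is a large part of the appeal of additive forecasting models.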

Causal Inference methods

  • ✅CausalML - A Python package that provides a suite of uplift modeling and causal inference methods using machine learning algorithms.
  • ✅BiomedSciAI/causallib - Enables estimating the causal effect of an intervention on some outcome from real-world non-experimental observational data

Deep learning on Tabular Data

  • ✅ SAINT - Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training.
  • ✅ ARM-Net - Adaptive Relation Modeling Network for Structured Data.
  • ✅ TabTransformer - An implementation in Keras of TabTransformer, an attention network for tabular data.

Research Papers

Graph AI
