Day 1 & Day 2: Basic NLP and semantic analysis
Intellipaat: Natural Language Processing (NLP) Tutorial | NLP Training https://www.youtube.com/watch?v=KVxIx8f_VpM
ODSC: Understanding Unstructured Data with Language Models - Alex Peattie https://www.youtube.com/watch?time_continue=37&v=4fMwu7K3HmQ
Day 3: Topic modeling
https://github.com/atulsinghphd/NLP/blob/master/TopicModelingUsingLDA.ipynb
https://github.com/moorissa/nmf_nyt
https://www.analyticsvidhya.com/blog/2018/10/stepwise-guide-topic-modeling-latent-semantic-analysis/
https://towardsdatascience.com/topic-modeling-and-latent-dirichlet-allocation-in-python-9bf156893c24 https://github.com/susanli2016/NLP-with-Python/blob/master/LDA_news_headlines.ipynb
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5333320/
Guided LDA: https://github.com/NThakur20/GuidedLDA
Bhargav Srinivasa Desikan - Topic Modelling with Gensim https://www.youtube.com/watch?v=KZkLmN1Bzok https://github.com/bhargavvader/personal/blob/master/notebooks/text_analysis_tutorial/topic_modelling_unrun.ipynb
Text Analysis https://github.com/bhargavvader/personal/blob/master/notebooks/text_analysis_tutorial/text_analysis_tutorial_unrun.ipynb
https://www.youtube.com/watch?v=ZkAFJwi-G98
https://www.youtube.com/watch?v=NYkbqzTlW3w
Evaluation
https://towardsdatascience.com/the-proper-way-to-use-machine-learning-metrics-4803247a2578
Day 4: Predict next word
https://github.com/seyedsaeidmasoumzadeh/Predict-next-word
Patrick Harrison: Modern NLP in Python | PyData DC 2016 (1:11:00) https://www.youtube.com/watch?v=6zm9NC9uRkk
https://towardsdatascience.com/building-a-next-word-predictor-in-tensorflow-e7e681d4f03f https://towardsdatascience.com/skip-gram-nlp-context-words-prediction-algorithm-5bbf34f84e0c
Day 5: Word Embeddings - Word2Vec, GloVe, N-gram
Minsuk Heo : Word2Vec (introduce and tensorflow implementation) https://www.youtube.com/watch?v=64qSgA66P-8 https://github.com/minsuk-heo/python_tutorial/blob/master/data_science/nlp/word2vec_tensorflow.ipynb
https://skymind.ai/wiki/word2vec
https://datascience.stackexchange.com/questions/9785/predicting-a-word-using-word2vec-model
https://www.guru99.com/word-embedding-word2vec.html
https://github.com/tensorflow/docs/blob/master/site/en/tutorials/representation/word2vec.md
Unsupervised sentence representation with deep learning https://blog.myyellowroad.com/unsupervised-sentence-representation-with-deep-learning-104b90079a93
An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation https://github.com/jhlau/doc2vec#pre-trained-doc2vec-models https://arxiv.org/abs/1607.05368
GloVe is an unsupervised learning algorithm from Stanford for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.
https://nlp.stanford.edu/projects/glove/
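The "word-word co-occurrence statistics" that GloVe trains on can be illustrated with a minimal sketch. This covers only the counting step, not the GloVe training objective. The function name `cooccurrence_counts`, the window size, and the toy corpus are assumptions made up for illustration; they do not come from the GloVe project.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count ordered (word, context) pairs that occur within `window` tokens of each other."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    counts[(word, tokens[j])] += 1
    return counts


# Toy corpus, invented for this example.
corpus = ["ice is cold and solid", "steam is hot and gas"]
counts = cooccurrence_counts(corpus, window=2)

print(counts[("ice", "cold")])   # pair seen inside the window
print(counts[("ice", "solid")])  # pair too far apart to be counted
```

A real pipeline would run this kind of count over a large corpus and then fit word vectors to the resulting table; the sketch uses plain unweighted counts to keep the idea visible.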
FastText is an open-source, free, lightweight library that allows users to learn text representations and text classifiers. It works on standard, generic hardware. Models can later be reduced in size to even fit on mobile devices.
Day 6: LSTM, Attention, Transformers
LSTM
https://github.com/jaungiers/LSTM-Neural-Network-for-Time-Series-Prediction
How to make a digital version of You! https://medium.com/datadriveninvestor/how-to-make-digital-version-of-you-2a29cf823e85
Chatbots are cool! A framework using Python https://towardsdatascience.com/chatbots-are-cool-a-framework-using-python-part-1-overview-7c69af7a7439
Day 7: Computer vision - CNN
Interpreting Deep Learning Models for Computer Vision
Day 8: Pretrained models - BERT, ELMo, ULMFiT
BERT is a method of pre-training language representations, meaning that we train a general-purpose "language understanding" model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised, deeply bidirectional system for pre-training NLP.
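To make "deeply bidirectional" concrete, here is a small sketch that contrasts a masked-token training example, where context on both sides of the hidden word is visible, with a left-to-right example, where only the preceding words are visible. The helper names `masked_example` and `left_to_right_example` and the sample sentence are hypothetical. Actual BERT pre-training picks masked positions at random and works on subword tokens, which this sketch does not attempt.

```python
def masked_example(tokens, position, mask_token="[MASK]"):
    """Hide one token; the model sees the context on both sides of it."""
    model_input = list(tokens)
    target = model_input[position]
    model_input[position] = mask_token
    return model_input, target


def left_to_right_example(tokens, position):
    """Expose only the tokens before `position`, as a left-to-right language model would."""
    return list(tokens[:position]), tokens[position]


# Sample sentence, invented for this example.
tokens = "the man went to the store".split()

bidirectional_input, target = masked_example(tokens, 2)
unidirectional_input, same_target = left_to_right_example(tokens, 2)

print(bidirectional_input, "->", target)
print(unidirectional_input, "->", same_target)
```

Both examples ask the model to predict the same word, but only the masked version lets it use the words that follow.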
BERT Explained: State of the art language model for NLP https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270 https://github.com/google-research/bert
https://mccormickml.com/2019/05/14/BERT-word-embeddings-tutorial/
Breaking BERT Down https://towardsdatascience.com/breaking-bert-down-430461f60efb
Some examples of applying BERT in specific domain https://towardsdatascience.com/how-to-apply-bert-in-scientific-domain-2d9db0480bd9
Embeddings from Language Models (ELMo for short)
This model is pre-trained on a self-supervised task, bidirectional language modeling. Its authors show that the resulting representations improve on the state of the art across many tasks, including question answering, natural language inference, semantic role labeling, coreference resolution, named-entity recognition, and sentiment analysis.
A Step-by-Step NLP Guide to Learn ELMo for Extracting Features from Text https://www.analyticsvidhya.com/blog/2019/03/learn-to-use-elmo-to-extract-features-from-text/
https://github.com/IreneZihuiLi/deeplearning/blob/master/ELMo_test.py
https://gluon-nlp.mxnet.io/examples/sentence_embedding/elmo_sentence_representation.html
Tutorial on Text Classification (NLP) using ULMFiT and fastai Library in Python https://www.analyticsvidhya.com/blog/2018/11/tutorial-text-classification-ulmfit-fastai-library/ https://github.com/navneetkrc/Colab_fastai/blob/master/ULMFiT_fastai_Text_Classification.ipynb
https://github.com/jannenev/ulmfit-language-model
https://humboldt-wi.github.io/blog/research/information_systems_1819/group4_ulmfit/
The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning) http://jalammar.github.io/illustrated-bert/
Day 9: Transfer learning
A neural network is trained on a dataset. The knowledge it gains from this data is stored in the network's “weights”. These weights can be extracted and transferred to another neural network with a compatible architecture. Instead of training that network from scratch, we “transfer” the learned features.
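The weight-transfer idea can be sketched without any deep learning library. In this toy version a network is just a list of weight matrices, and `init_network` and `transfer_weights` are hypothetical helpers written for illustration. The "pretrained" network is randomly initialized here and only stands in for one that was actually trained on a large dataset.

```python
import copy
import random


def init_network(layer_sizes, seed):
    """Build a toy network as a list of weight matrices (lists of lists)."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)]
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]


def transfer_weights(source, target, n_layers):
    """Return a copy of `target` whose first `n_layers` matrices come from `source`."""
    result = copy.deepcopy(target)
    for i in range(n_layers):
        # Copied layers must have identical shapes in both networks.
        if len(source[i]) != len(result[i]) or len(source[i][0]) != len(result[i][0]):
            raise ValueError(f"layer {i} shape mismatch")
        result[i] = copy.deepcopy(source[i])
    return result


# Stand-in for a network trained on a large dataset (10 output classes).
pretrained = init_network([4, 8, 8, 10], seed=0)
# New task with the same hidden layers but only 3 output classes.
new_model = init_network([4, 8, 8, 3], seed=1)

# Reuse the two feature layers; the output layer keeps its fresh weights.
new_model = transfer_weights(pretrained, new_model, n_layers=2)
```

After the transfer, only the final layer of `new_model` would need to be trained from scratch on the new task, which is the usual motivation for the approach.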
A Comprehensive Hands-on Guide to Transfer Learning with Real-World Applications in Deep Learning
Transfer Learning using ELMO Embeddings https://towardsdatascience.com/transfer-learning-using-elmo-embedding-c4a7e415103c
Transfer Learning using Feature Extraction from Trained model: Food Images Classification https://appliedmachinelearning.blog/2019/07/29/transfer-learning-using-feature-extraction-from-trained-models-food-images-classification/
Day 10: Research paper
TODO
ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing https://www.researchgate.net/publication/331246329_ScispaCy_Fast_and_Robust_Models_for_Biomedical_Natural_Language_Processing
Effective Use of Bidirectional Language Modeling for Transfer Learning in Biomedical Named Entity Recognition https://arxiv.org/abs/1711.07908v3
VoxCeleb2: Deep Speaker Recognition https://www.robots.ox.ac.uk/~vgg/publications/2018/Chung18a/chung18a.pdf
Additional references
A Friendly Introduction to Machine Learning https://www.youtube.com/watch?v=IpGxLWOIZy4
But what is a Neural Network? https://www.youtube.com/watch?v=aircAruvnKk&t=5s
Gradient descent, how neural networks learn https://www.youtube.com/watch?v=IHZwWFHWa-w&t=59s
What is backpropagation really doing? https://www.youtube.com/watch?v=Ilg3gGewQ5U
Activation Functions in Neural Networks (Sigmoid, ReLU, tanh, softmax) https://www.youtube.com/watch?v=9vB5nzrL4hY
Core Concepts of Deep Learning & Neural Networks https://www.analyticsvidhya.com/blog/2016/08/evolution-core-concepts-deep-learning-neural-networks/
Cost functions http://neuralnetworksanddeeplearning.com/chap3.html
Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) https://www.youtube.com/watch?v=WCUNPb-5EYI&t=176s
How convolutional neural networks work, in depth https://www.youtube.com/watch?v=JB8T_zN7ZC0&t=83s
TF-IDF Document Similarity using Cosine Similarity https://www.youtube.com/watch?v=hc3DCn8viWs
Text Similarities : Estimate the degree of similarity between two texts https://medium.com/@adriensieg/text-similarities-da019229c894
Embed, encode, attend, predict: The new deep learning formula for state-of-the-art NLP models https://explosion.ai/blog/deep-learning-formula-nlp
ImageNet: VGGNet, ResNet, Inception, and Xception with Keras
ImageNet is formally a project aimed at (manually) labeling and categorizing images into almost 22,000 separate object categories for the purpose of computer vision research.
https://www.pyimagesearch.com/2017/03/20/imagenet-vggnet-resnet-inception-xception-keras/ https://github.com/fchollet/deep-learning-models
Softmax Regression https://github.com/nikhilroxtomar/Introduction-to-TensorFlow/blob/master/06%20-%20%20Multinomial%20Logistic%20Regression%20%7C%20Softmax%20Regression.ipynb
https://www.youtube.com/watch?v=1e_EiTjypHU
Performing OCR by running parallel instances of Tesseract 4.0 : Python https://appliedmachinelearning.blog/2018/06/30/performing-ocr-by-running-parallel-instances-of-tesseract-4-0-python/
Training Deep Learning based Named Entity Recognition from Scratch : Disease Extraction Hackathon https://appliedmachinelearning.blog/2019/04/01/training-deep-learning-based-named-entity-recognition-from-scratch-disease-extraction-hackathon/
Art of Effective Visualization of Multi-dimensional Data by Dipanjan Sarkar https://www.youtube.com/watch?v=2yRl-DEu0g0&t=2651s https://github.com/dipanjanS/art_of_data_visualization
Doc Product: Medical Q&A with Deep Language Models https://github.com/re-search/DocProduct
Choosing one of many Python visualization tools https://blog.magrathealabs.com/choosing-one-of-many-python-visualization-tools-7eb36fa5855f
Building a Speaker Identification System from Scratch with Deep Learning
https://towardsdatascience.com/automatic-speaker-recognition-using-transfer-learning-6fab63e34e74
https://github.com/CorentinJ/Real-Time-Voice-Cloning
KGCNs: Machine Learning over Knowledge Graphs with TensorFlow
https://blog.grakn.ai/kgcns-machine-learning-over-knowledge-graphs-with-tensorflow-a1d3328b8f02
https://medium.com/octavian-ai/deep-learning-with-knowledge-graphs-3df0b469a61a
https://medium.com/tensorflow/introducing-neural-structured-learning-in-tensorflow-5a802efd7afd
https://github.com/Accenture/AmpliGraph
Reinforcement Learning
https://github.com/LuEE-C/Generative_NLP_RL_GAN
https://towardsdatascience.com/applications-of-reinforcement-learning-in-real-world-1a94955bcd12
https://www.youtube.com/watch?v=Jx_Twc75ka0&feature=youtu.be&t=368
GLOSSARY OF TERMS AND DEFINITIONS https://www.analyticsinsight.net/understanding-artificial-intelligence-a-comprehensive-glossary-of-terms-and-definitions/
CellStrat: Machine Learning Classification Algorithms
https://www.youtube.com/watch?v=pEEwqBQHD68
CellStrat: Introduction to Regression Techniques (Online Webinar)
https://www.youtube.com/watch?v=7Tb63nc3aoM
CellStrat: Types of Machine Learning 100918
https://www.youtube.com/watch?v=pZ36Gyh0EbY
Recursive Neural Tensor Networks (Online Webinar 230819)