sreejukomath/NLPProjects

10 Days NLP Challenge

Day 1 & Day 2: Basic NLP and semantic analysis

Intellipaat: Natural Language Processing (NLP) Tutorial | NLP Training https://www.youtube.com/watch?v=KVxIx8f_VpM

ODSC: Understanding Unstructured Data with Language Models - Alex Peattie https://www.youtube.com/watch?time_continue=37&v=4fMwu7K3HmQ

Day 3: Topic modeling

https://github.com/atulsinghphd/NLP/blob/master/TopicModelingUsingLDA.ipynb

https://github.com/moorissa/nmf_nyt

https://www.analyticsvidhya.com/blog/2018/10/stepwise-guide-topic-modeling-latent-semantic-analysis/

https://towardsdatascience.com/topic-modeling-and-latent-dirichlet-allocation-in-python-9bf156893c24 https://github.com/susanli2016/NLP-with-Python/blob/master/LDA_news_headlines.ipynb

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5333320/

Guided LDA: https://github.com/NThakur20/GuidedLDA

Bhargav Srinivasa Desikan - Topic Modelling with Gensim https://www.youtube.com/watch?v=KZkLmN1Bzok https://github.com/bhargavvader/personal/blob/master/notebooks/text_analysis_tutorial/topic_modelling_unrun.ipynb

Text Analysis https://github.com/bhargavvader/personal/blob/master/notebooks/text_analysis_tutorial/text_analysis_tutorial_unrun.ipynb

https://www.youtube.com/watch?v=ZkAFJwi-G98

https://www.youtube.com/watch?v=NYkbqzTlW3w

Evaluation

https://towardsdatascience.com/metrics-for-evaluating-machine-learning-classification-models-python-example-59b905e079a5

https://towardsdatascience.com/the-proper-way-to-use-machine-learning-metrics-4803247a2578

Day 4: Predict next word

https://github.com/seyedsaeidmasoumzadeh/Predict-next-word

https://chunml.github.io/ChunML.github.io/project/Creating-Text-Generator-Using-Recurrent-Neural-Network/

Patrick Harrison: Modern NLP in Python | PyData DC 2016 (1:11:00) https://www.youtube.com/watch?v=6zm9NC9uRkk

https://towardsdatascience.com/building-a-next-word-predictor-in-tensorflow-e7e681d4f03f https://towardsdatascience.com/skip-gram-nlp-context-words-prediction-algorithm-5bbf34f84e0c

Day 5: Word Embeddings - Word2Vec, GloVe, N-gram

Minsuk Heo : Word2Vec (introduce and tensorflow implementation) https://www.youtube.com/watch?v=64qSgA66P-8 https://github.com/minsuk-heo/python_tutorial/blob/master/data_science/nlp/word2vec_tensorflow.ipynb

https://skymind.ai/wiki/word2vec

https://datascience.stackexchange.com/questions/9785/predicting-a-word-using-word2vec-model

https://www.guru99.com/word-embedding-word2vec.html

https://github.com/tensorflow/docs/blob/master/site/en/tutorials/representation/word2vec.md

Unsupervised sentence representation with deep learning https://blog.myyellowroad.com/unsupervised-sentence-representation-with-deep-learning-104b90079a93

An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation https://github.com/jhlau/doc2vec#pre-trained-doc2vec-models https://arxiv.org/abs/1607.05368

GloVe is an unsupervised learning algorithm from Stanford for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.

https://nlp.stanford.edu/projects/glove/
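
As a rough illustration of the "aggregated global word-word co-occurrence statistics" mentioned above, here is a minimal sketch that counts how often two words fall within a small window of each other. The function name `cooccurrence_counts`, the window size, and the toy corpus are assumptions for illustration. The actual GloVe implementation also weights pairs by distance and then fits word vectors to these counts, which is not shown here.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count how often each pair of words appears within `window` tokens."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            # Look at neighbours up to `window` positions to the right and
            # record the pair in both orders so the counts are symmetric.
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                counts[(word, tokens[j])] += 1
                counts[(tokens[j], word)] += 1
    return counts


# Toy corpus, made up for this example.
corpus = [
    "ice is cold and solid",
    "steam is hot and gas",
]
counts = cooccurrence_counts(corpus, window=2)
print(counts[("ice", "cold")])    # pair seen inside the window
print(counts[("steam", "cold")])  # pair never seen together
```

On a real corpus these counts form a large sparse matrix, and it is this matrix, not the raw text, that GloVe trains on.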

FastText is an open-source, free, lightweight library that allows users to learn text representations and text classifiers. It works on standard, generic hardware. Models can later be reduced in size to even fit on mobile devices.

https://fasttext.cc/

Day 6: LSTM, Attention, Transformers

LSTM

https://github.com/jaungiers/LSTM-Neural-Network-for-Time-Series-Prediction

How to make a digital version of You! https://medium.com/datadriveninvestor/how-to-make-digital-version-of-you-2a29cf823e85

Chatbots are cool! A framework using Python https://towardsdatascience.com/chatbots-are-cool-a-framework-using-python-part-1-overview-7c69af7a7439

Day 7: Computer vision - CNN

https://ahmedbesbes.com/understanding-deep-convolutional-neural-networks-with-a-practical-use-case-in-tensorflow-and-keras.html

https://ahmedbesbes.com/automate-the-diagnosis-of-knee-injuries-with-deep-learning-part-1-an-overview-of-the-mrnet-dataset.html

https://ahmedbesbes.com/automate-the-diagnosis-of-knee-injuries-with-deep-learning-part-2-building-an-acl-tear-classifier.html

https://ahmedbesbes.com/automate-the-diagnosis-of-knee-injuries-with-deep-learning-part-3-interpret-models-predictions.html

Interpreting Deep Learning Models for Computer Vision

https://medium.com/google-developer-experts/interpreting-deep-learning-models-for-computer-vision-f95683e23c1d

Day 8: Pretrained models - BERT, ELMo, ULMFiT

BERT is a method of pre-training language representations, meaning that we train a general-purpose "language understanding" model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised, deeply bidirectional system for pre-training NLP.
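
The paragraph above does not say how the bidirectional pre-training is set up. In the BERT paper it is done with a masked language model objective: some input tokens are hidden and the model must predict them from both left and right context. The sketch below only prepares such a masked input. The helper name `mask_tokens`, the seed, and the example sentence are assumptions, and the real procedure works on WordPiece tokens and sometimes substitutes random or unchanged tokens instead of `[MASK]`, which is left out here.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Hide a fraction of tokens and return (masked_tokens, targets).

    `targets` maps each masked position to the original token, which is
    what a masked language model would be trained to predict.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        masked[pos] = MASK
    return masked, targets


tokens = "the man went to the store to buy milk".split()
masked, targets = mask_tokens(tokens)
print(masked)   # same sentence with one position replaced by [MASK]
print(targets)  # {position: original token}
```

Because the hidden token can sit anywhere in the sentence, the model has to use words on both sides of it, which is what "deeply bidirectional" refers to.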

BERT Explained: State of the art language model for NLP https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270 https://github.com/google-research/bert

https://mccormickml.com/2019/05/14/BERT-word-embeddings-tutorial/

Breaking BERT Down https://towardsdatascience.com/breaking-bert-down-430461f60efb

An example of applying BERT in a specific domain https://towardsdatascience.com/how-to-apply-bert-in-scientific-domain-2d9db0480bd9

Embeddings from Language Models (ELMo for short)

ELMo is pre-trained on a self-supervised task, bidirectional language modeling. Its authors show that the resulting representations are powerful and improve state-of-the-art performance on many tasks, such as question answering, natural language inference, semantic role labeling, coreference resolution, named-entity recognition, and sentiment analysis.

A Step-by-Step NLP Guide to Learn ELMo for Extracting Features from Text https://www.analyticsvidhya.com/blog/2019/03/learn-to-use-elmo-to-extract-features-from-text/

https://github.com/IreneZihuiLi/deeplearning/blob/master/ELMo_test.py

https://github.com/keitakurita/Practical_NLP_in_PyTorch/blob/master/allennlp/elmo_text_classification.ipynb

https://gluon-nlp.mxnet.io/examples/sentence_embedding/elmo_sentence_representation.html

Tutorial on Text Classification (NLP) using ULMFiT and fastai Library in Python https://www.analyticsvidhya.com/blog/2018/11/tutorial-text-classification-ulmfit-fastai-library/ https://github.com/navneetkrc/Colab_fastai/blob/master/ULMFiT_fastai_Text_Classification.ipynb

https://github.com/jannenev/ulmfit-language-model

https://humboldt-wi.github.io/blog/research/information_systems_1819/group4_ulmfit/

The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning) http://jalammar.github.io/illustrated-bert/

Day 9: Transfer learning

A neural network is trained on data. The knowledge it gains from this data is stored as the “weights” of the network. These weights can be extracted and transferred to another neural network with matching layer shapes. Instead of training the other network from scratch, we “transfer” the learned features.
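
The idea can be sketched without any deep learning library. In this toy example a "network" is just a dictionary of weight matrices; the helper names (`make_network`, `transfer_weights`, `sgd_step`), the layer sizes, and the all-ones gradients are assumptions for illustration, not a real training setup.

```python
import copy
import random


def make_network(layer_sizes, seed):
    """Build a toy network as {layer_name: weight matrix (list of rows)}."""
    rng = random.Random(seed)
    return {
        f"layer{i}": [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)]
        for i, (n_in, n_out) in enumerate(zip(layer_sizes, layer_sizes[1:]))
    }


def transfer_weights(source, target, layer_names):
    """Copy the named layers from source into target; return the frozen set."""
    for name in layer_names:
        target[name] = copy.deepcopy(source[name])
    return set(layer_names)


def sgd_step(network, grads, frozen, lr=0.1):
    """Apply one gradient step, skipping layers that are frozen."""
    for name, weights in network.items():
        if name in frozen:
            continue
        for r, row in enumerate(weights):
            for c in range(len(row)):
                row[c] -= lr * grads[name][r][c]


# "Pretrained" network: 4 inputs -> 3 features -> 2 classes.
pretrained = make_network([4, 3, 2], seed=1)
# New task with 5 classes: same feature layer shape, different output layer.
target = make_network([4, 3, 5], seed=2)

frozen = transfer_weights(pretrained, target, ["layer0"])
head_before = copy.deepcopy(target["layer1"])

# Placeholder gradients (all ones) standing in for real backpropagation.
grads = {name: [[1.0] * len(row) for row in w] for name, w in target.items()}
sgd_step(target, grads, frozen)

print(target["layer0"] == pretrained["layer0"])  # transferred layer is unchanged
print(target["layer1"] != head_before)           # new output layer was updated
```

Freezing the copied layer keeps the learned features intact while only the new output layer adapts to the new task. Unfreezing it later with a small learning rate is the usual fine-tuning variant.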

A Comprehensive Hands-on Guide to Transfer Learning with Real-World Applications in Deep Learning

https://towardsdatascience.com/a-comprehensive-hands-on-guide-to-transfer-learning-with-real-world-applications-in-deep-learning-212bf3b2f27a

Transfer Learning using ELMO Embeddings https://towardsdatascience.com/transfer-learning-using-elmo-embedding-c4a7e415103c

Transfer Learning using Feature Extraction from Trained model: Food Images Classification https://appliedmachinelearning.blog/2019/07/29/transfer-learning-using-feature-extraction-from-trained-models-food-images-classification/

Day 10: Research paper

TODO

ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing https://www.researchgate.net/publication/331246329_ScispaCy_Fast_and_Robust_Models_for_Biomedical_Natural_Language_Processing

Effective Use of Bidirectional Language Modeling for Transfer Learning in Biomedical Named Entity Recognition https://arxiv.org/abs/1711.07908v3

VoxCeleb2: Deep Speaker Recognition https://www.robots.ox.ac.uk/~vgg/publications/2018/Chung18a/chung18a.pdf

Additional references

A Friendly Introduction to Machine Learning https://www.youtube.com/watch?v=IpGxLWOIZy4

But what is a Neural Network? https://www.youtube.com/watch?v=aircAruvnKk&t=5s

Gradient descent, how neural networks learn https://www.youtube.com/watch?v=IHZwWFHWa-w&t=59s

What is backpropagation really doing? https://www.youtube.com/watch?v=Ilg3gGewQ5U

Activation Functions in Neural Networks (Sigmoid, ReLU, tanh, softmax) https://www.youtube.com/watch?v=9vB5nzrL4hY

Core Concepts of Deep Learning & Neural Networks https://www.analyticsvidhya.com/blog/2016/08/evolution-core-concepts-deep-learning-neural-networks/

Cost functions http://neuralnetworksanddeeplearning.com/chap3.html

Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) https://www.youtube.com/watch?v=WCUNPb-5EYI&t=176s

How convolutional neural networks work, in depth https://www.youtube.com/watch?v=JB8T_zN7ZC0&t=83s

TF-IDF Document Similarity using Cosine Similarity https://www.youtube.com/watch?v=hc3DCn8viWs

Text Similarities : Estimate the degree of similarity between two texts https://medium.com/@adriensieg/text-similarities-da019229c894

Embed, encode, attend, predict: The new deep learning formula for state-of-the-art NLP models https://explosion.ai/blog/deep-learning-formula-nlp

ImageNet: VGGNet, ResNet, Inception, and Xception with Keras

ImageNet is formally a project aimed at (manually) labeling and categorizing images into almost 22,000 separate object categories for the purpose of computer vision research.

https://www.pyimagesearch.com/2017/03/20/imagenet-vggnet-resnet-inception-xception-keras/ https://github.com/fchollet/deep-learning-models

Softmax Regression https://github.com/nikhilroxtomar/Introduction-to-TensorFlow/blob/master/06%20-%20%20Multinomial%20Logistic%20Regression%20%7C%20Softmax%20Regression.ipynb

https://www.youtube.com/watch?v=1e_EiTjypHU

Performing OCR by running parallel instances of Tesseract 4.0 : Python https://appliedmachinelearning.blog/2018/06/30/performing-ocr-by-running-parallel-instances-of-tesseract-4-0-python/

Training Deep Learning based Named Entity Recognition from Scratch : Disease Extraction Hackathon https://appliedmachinelearning.blog/2019/04/01/training-deep-learning-based-named-entity-recognition-from-scratch-disease-extraction-hackathon/

Art of Effective Visualization of Multi-dimensional Data by Dipanjan Sarkar https://www.youtube.com/watch?v=2yRl-DEu0g0&t=2651s https://github.com/dipanjanS/art_of_data_visualization

Doc Product: Medical Q&A with Deep Language Models https://github.com/re-search/DocProduct

Choosing one of many Python visualization tools https://blog.magrathealabs.com/choosing-one-of-many-python-visualization-tools-7eb36fa5855f

Building a Speaker Identification System from Scratch with Deep Learning

https://medium.com/analytics-vidhya/building-a-speaker-identification-system-from-scratch-with-deep-learning-f4c4aa558a56

https://towardsdatascience.com/automatic-speaker-recognition-using-transfer-learning-6fab63e34e74

https://github.com/CorentinJ/Real-Time-Voice-Cloning

KGCNs: Machine Learning over Knowledge Graphs with TensorFlow

https://blog.grakn.ai/kgcns-machine-learning-over-knowledge-graphs-with-tensorflow-a1d3328b8f02

https://medium.com/octavian-ai/deep-learning-with-knowledge-graphs-3df0b469a61a

https://medium.com/tensorflow/introducing-neural-structured-learning-in-tensorflow-5a802efd7afd

https://github.com/Accenture/AmpliGraph

Reinforcement Learning

https://github.com/LuEE-C/Generative_NLP_RL_GAN

https://towardsdatascience.com/applications-of-reinforcement-learning-in-real-world-1a94955bcd12

https://www.youtube.com/watch?v=Jx_Twc75ka0&feature=youtu.be&t=368

GLOSSARY OF TERMS AND DEFINITIONS https://www.analyticsinsight.net/understanding-artificial-intelligence-a-comprehensive-glossary-of-terms-and-definitions/

CellStrat: Machine Learning Classification Algorithms

https://www.youtube.com/watch?v=pEEwqBQHD68

CellStrat: Introduction to Regression Techniques (Online Webinar)

https://www.youtube.com/watch?v=7Tb63nc3aoM

CellStrat: Types of Machine Learning 100918

https://www.youtube.com/watch?v=pZ36Gyh0EbY

Recursive Neural Tensor Networks (Online Webinar 230819)

https://www.youtube.com/watch?v=k-L2Q6F1CVI
