iAmKankan/Natural-Language-Processing-NLP-Tutorial
Index

  • NLP (Natural Language Processing)
  • NLP and Probability

Natural Language Processing (NLP)


Natural language processing is a subset of artificial intelligence that helps computers understand, interpret, and make use of human language.

  • NLP allows computers to communicate with people using human languages.
  • NLP also gives computers the ability to read text, hear speech, and interpret it.
  • NLP draws on several disciplines, including computational linguistics and computer science, as it attempts to bridge the gap between human and computer communication.
  • NLP breaks language down into shorter, more basic pieces called tokens (words, punctuation, etc.) and attempts to understand the relationships between tokens.
  • This approach often uses higher-level NLP features, such as:
    • Sentiment analysis: Identifies the general mood or subjective opinions expressed in large amounts of text; useful for opinion mining.
    • Contextual extraction: Extracts structured data from text-based sources.
    • Text-to-speech and speech-to-text: Transforms voice into text and vice versa.
    • Document summarization: Automatically creates a synopsis, condensing large amounts of text.
    • Machine translation: Translates text or speech from one language into another.
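Tokenization, the first step described above, can be sketched with a simple regular expression. This is a toy scheme for illustration; production tokenizers also handle contractions, URLs, emoji, and more:

```python
import re

def tokenize(text):
    # Split text into word tokens (\w+) and single punctuation
    # tokens ([^\w\s]), after lowercasing.
    return re.findall(r"\w+|[^\w\s]", text.lower())

tokens = tokenize("NLP breaks language into tokens.")
print(tokens)  # ['nlp', 'breaks', 'language', 'into', 'tokens', '.']
```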

Typical NLP tasks


| ${\color{Purple}\textrm{Task}}$ | ${\color{Purple}\textrm{Example}}$ |
| --- | --- |
| ${\color{Purple}\textrm{Information Retrieval}}$ | ${\color{Purple}\textrm{Find documents based on keywords}}$ |
| ${\color{Purple}\textrm{Information Extraction}}$ | ${\color{Purple}\textrm{Identify and extract personal names, dates, company names, cities, ...}}$ |
| ${\color{Purple}\textrm{Language Generation}}$ | ${\color{Purple}\textrm{Generate a description or title for a photograph}}$ |
| ${\color{Purple}\textrm{Text Classification}}$ | ${\color{Purple}\textrm{Assign predefined categories to documents, e.g. move spam emails to a Spam folder}}$ |
| ${\color{Purple}\textrm{Machine Translation}}$ | ${\color{Purple}\textrm{Translate text from one language into another}}$ |
| ${\color{Purple}\textrm{Grammar Checking}}$ | ${\color{Purple}\textrm{Check the grammar of text in a given language}}$ |

Why learn NLP?


Some examples of applications built with NLP:

  1. Spell Correction(MS Word/any other editor)
  2. Search engines(Google, Bing, Yahoo)
  3. Speech engines(like Siri, Google assistant)
  4. Spam classifiers(All e-mails services)
  5. News feeds(Google, Yahoo!, and so on)
  6. Machine Translation(Google translation)
  7. IBM Watson

Some NLP tools

Many of these tools are written in Java and have similar functionality: GATE, Mallet, OpenNLP, UIMA, and the Stanford toolkit. Gensim and NLTK (the Natural Language Toolkit) are Python libraries.

Why it is Hard?


  • There are multiple ways of expressing the same scenario
  • Understanding requires common sense and contextual knowledge
  • Information is expressed at varying levels of complexity (simple to hard vocabulary)
  • Visual cues are mixed into communication
  • Language is ambiguous by nature
  • Idioms, metaphors, sarcasm ("Yeah, right!"), double negatives, etc. make automatic processing difficult
  • Human language interpretation depends on real-world, common-sense, and contextual knowledge

NLP Roadmap


Main Approaches of NLP


Natural Language Processing and Deep Learning


  • Developments in the field of deep learning have led to massive increases in performance on NLP tasks.
  • Before deep learning, the main techniques used in NLP were the bag-of-words model and techniques like TF-IDF, Naive Bayes, and Support Vector Machines (SVMs).
  • These remain quick, robust, and simple systems even by today's standards.
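The pre-deep-learning pipeline mentioned above (a bag of words weighted by TF-IDF) can be sketched in plain Python; the three-document corpus here is assumed purely for illustration:

```python
import math
from collections import Counter

docs = [
    "the dog chased the ball",
    "the cat sat on the mat",
    "the dog ate the cat food",
]

def tf_idf(docs):
    # Term frequency per document times inverse document frequency
    # over the corpus: weight(w, d) = (count(w, d) / |d|) * log(N / df(w)).
    tokenized = [d.split() for d in docs]
    df = Counter(w for doc in tokenized for w in set(doc))
    n = len(docs)
    weights = []
    for doc in tokenized:
        tf = Counter(doc)
        weights.append({w: (tf[w] / len(doc)) * math.log(n / df[w]) for w in tf})
    return weights

w = tf_idf(docs)
# "the" appears in every document, so its IDF (and hence its weight) is 0;
# rarer words like "dog" get positive weight.
print(w[0]["the"], w[0]["dog"] > 0)
```

Note the design choice: words that appear everywhere carry no discriminative information, and the IDF term zeroes them out automatically.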

  • Classic sequence models such as Hidden Markov Models are also used for tasks like
    • speech recognition and
    • part-of-speech tagging.

Problem with Bag of Words

  • Consider the phrases "dog toy" and "toy dog".
  • These are different things, but in a bag-of-words model word order does not matter, so they would be treated the same.
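This order-insensitivity is easy to demonstrate with a minimal bag-of-words sketch:

```python
from collections import Counter

def bag_of_words(text):
    # A bag of words keeps only token counts and discards word order.
    return Counter(text.split())

# "dog toy" and "toy dog" name different things,
# but their bag-of-words representations are identical:
print(bag_of_words("dog toy") == bag_of_words("toy dog"))  # True
```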

Solution

  • Neural Networks - Modeling sentences as sequences and as hierarchies (e.g., with LSTMs) has led to state-of-the-art improvements over previous go-to techniques.
  • Word Embeddings - These give words a dense vector representation so that words can be fed into a neural network just like any other feature vector.
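A minimal sketch of the word-embedding idea, using tiny hand-made vectors (an assumption for illustration; real embeddings are learned, e.g. by word2vec or GloVe, and have hundreds of dimensions):

```python
import math

# Toy, hand-made embedding table: each word maps to a dense vector.
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    # Cosine similarity: dot(u, v) / (|u| * |v|).
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Related words get similar vectors, so they look alike to a downstream network:
print(cosine(embeddings["king"], embeddings["queen"])
      > cosine(embeddings["king"], embeddings["apple"]))  # True
```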

Sequence Learning

Sequence learning is the study of machine learning algorithms designed for applications that require sequential or temporal data.

RECURRENT NEURAL NETWORK

  • Sequential data prediction is considered a key problem in machine learning and artificial intelligence.
  • Unlike images, where we look at the entire image at once, we read text documents sequentially to understand their content.
  • The likelihood of any sentence can be estimated from everyday use of language.
  • The earlier sequence of words (in time) is important for predicting the next word, sentence, paragraph, or chapter.
  • If a word occurs twice in a sentence but cannot be accommodated within the sliding window, a fixed-window model must learn it twice.
  • An RNN is an architecture that does not impose a fixed-length limit on the prior context.
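A single recurrent step can be sketched in plain Python. The weights here are tiny toy values assumed for illustration; in practice they are learned by backpropagation through time:

```python
import math

def rnn_step(x, h, W_xh, W_hh, b):
    # One recurrent step: h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b).
    # The hidden state h carries context from the entire prior sequence,
    # with no fixed window length.
    n = len(h)
    return [
        math.tanh(sum(W_xh[i][j] * x[j] for j in range(len(x)))
                  + sum(W_hh[i][j] * h[j] for j in range(n))
                  + b[i])
        for i in range(n)
    ]

# Toy 2x2 weight matrices and zero bias (assumed, not learned).
W_xh = [[0.5, -0.2], [0.1, 0.3]]
W_hh = [[0.4, 0.0], [0.0, 0.4]]
b = [0.0, 0.0]

h = [0.0, 0.0]                        # initial hidden state
for x in [[1.0, 0.0], [0.0, 1.0]]:    # a sequence of two input vectors
    h = rnn_step(x, h, W_xh, W_hh, b)
print(h)  # final hidden state summarizing the whole sequence
```

Because the same weights are reused at every step, the network can process sequences of any length, which is exactly the property the bullets above call for.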

RNN → Language Model → Encoding a sentence into a fixed-size vector → Exploding and vanishing gradients → LSTM → GRU
