iAmKankan/Natural-Language-Processing-NLP-Tutorial
Index

  • NLP (Natural Language Processing)
  • NLP and Probability

Natural Language Processing (NLP)


Natural language processing is a subset of artificial intelligence that helps computers understand, interpret, and make use of human language.

  • NLP allows computers to communicate with people using human languages.
  • NLP also gives computers the ability to read text, hear speech, and interpret it.
  • NLP draws on several disciplines, including computational linguistics and computer science, as it attempts to bridge the gap between human and computer communication.
  • NLP breaks language down into shorter, more basic pieces called tokens (words, punctuation, etc.) and attempts to understand the relationships between tokens.
  • This approach often uses higher-level NLP features, such as:
    • Sentiment analysis: Identifies the general mood or subjective opinions expressed in large amounts of text; useful for opinion mining.
    • Contextual extraction: Extracts structured data from text-based sources.
    • Text-to-speech and speech-to-text: Transforms voice into text and vice versa.
    • Document summarization: Automatically creates a synopsis, condensing large amounts of text.
    • Machine translation: Translates text or speech from one language into another.
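Tokenization, the first step described above, can be sketched with a simple regular expression. This is a toy scheme for illustration; production tokenizers also handle contractions, URLs, emoji, and more:

```python
import re

def tokenize(text):
    # Split text into word tokens (\w+) and single punctuation
    # tokens ([^\w\s]), after lowercasing.
    return re.findall(r"\w+|[^\w\s]", text.lower())

tokens = tokenize("NLP breaks language into tokens.")
print(tokens)  # ['nlp', 'breaks', 'language', 'into', 'tokens', '.']
```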

Typical NLP tasks


| ${\color{Purple}\textrm{Task}}$ | ${\color{Purple}\textrm{Example}}$ |
| --- | --- |
| ${\color{Purple}\textrm{Information Retrieval}}$ | ${\color{Purple}\textrm{Find documents based on keywords}}$ |
| ${\color{Purple}\textrm{Information Extraction}}$ | ${\color{Purple}\textrm{Identify and extract personal names, dates, company names, cities, ...}}$ |
| ${\color{Purple}\textrm{Language Generation}}$ | ${\color{Purple}\textrm{Generate a description or title for a photograph}}$ |
| ${\color{Purple}\textrm{Text Classification}}$ | ${\color{Purple}\textrm{Assign predefined categories to documents, e.g. move spam emails to a Spam folder}}$ |
| ${\color{Purple}\textrm{Machine Translation}}$ | ${\color{Purple}\textrm{Translate text from one language into another}}$ |
| ${\color{Purple}\textrm{Grammar Checking}}$ | ${\color{Purple}\textrm{Check the grammar of text in a given language}}$ |

Why learn NLP?


Some examples of applications built with NLP:

  1. Spell Correction(MS Word/any other editor)
  2. Search engines(Google, Bing, Yahoo)
  3. Speech engines(like Siri, Google assistant)
  4. Spam classifiers(All e-mails services)
  5. News feeds(Google, Yahoo!, and so on)
  6. Machine Translation(Google translation)
  7. IBM Watson

Some NLP tools

Many of these tools are written in Java and have similar functionality: GATE, Mallet, OpenNLP, UIMA, and the Stanford toolkit. Gensim and NLTK (the Natural Language Toolkit) are Python libraries.

Why it is Hard?


  • There are multiple ways of expressing the same scenario
  • Understanding requires common sense and contextual knowledge
  • Information is expressed at varying levels of complexity (simple to hard vocabulary)
  • Visual cues are mixed into communication
  • Language is ambiguous by nature
  • Idioms, metaphors, sarcasm ("Yeah, right!"), double negatives, etc. make automatic processing difficult
  • Human language interpretation depends on real-world, common-sense, and contextual knowledge

NLP Roadmap


Main Approaches of NLP


Natural Language Processing and Deep Learning


  • Developments in the field of deep learning have led to massive increases in performance on NLP tasks.
  • Before deep learning, the main techniques used in NLP were the bag-of-words model and techniques like TF-IDF, Naive Bayes, and Support Vector Machines (SVMs).
  • These remain quick, robust, and simple systems even by today's standards.
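The pre-deep-learning pipeline mentioned above (a bag of words weighted by TF-IDF) can be sketched in plain Python; the three-document corpus here is assumed purely for illustration:

```python
import math
from collections import Counter

docs = [
    "the dog chased the ball",
    "the cat sat on the mat",
    "the dog ate the cat food",
]

def tf_idf(docs):
    # Term frequency per document times inverse document frequency
    # over the corpus: weight(w, d) = (count(w, d) / |d|) * log(N / df(w)).
    tokenized = [d.split() for d in docs]
    df = Counter(w for doc in tokenized for w in set(doc))
    n = len(docs)
    weights = []
    for doc in tokenized:
        tf = Counter(doc)
        weights.append({w: (tf[w] / len(doc)) * math.log(n / df[w]) for w in tf})
    return weights

w = tf_idf(docs)
# "the" appears in every document, so its IDF (and hence its weight) is 0;
# rarer words like "dog" get positive weight.
print(w[0]["the"], w[0]["dog"] > 0)
```

Note the design choice: words that appear everywhere carry no discriminative information, and the IDF term zeroes them out automatically.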

  • Classic sequence models such as Hidden Markov Models are also used for tasks like
    • speech recognition and
    • part-of-speech tagging.

Problem with Bag of Words

  • Consider the phrases "dog toy" and "toy dog".
  • These are different things, but in a bag-of-words model word order does not matter, so they would be treated the same.
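This order-insensitivity is easy to demonstrate with a minimal bag-of-words sketch:

```python
from collections import Counter

def bag_of_words(text):
    # A bag of words keeps only token counts and discards word order.
    return Counter(text.split())

# "dog toy" and "toy dog" name different things,
# but their bag-of-words representations are identical:
print(bag_of_words("dog toy") == bag_of_words("toy dog"))  # True
```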

Solution

  • Neural Networks - Modeling sentences as sequences and as hierarchies (e.g., with LSTMs) has led to state-of-the-art improvements over previous go-to techniques.
  • Word Embeddings - These give words a dense vector representation so that words can be fed into a neural network just like any other feature vector.
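A minimal sketch of the word-embedding idea, using tiny hand-made vectors (an assumption for illustration; real embeddings are learned, e.g. by word2vec or GloVe, and have hundreds of dimensions):

```python
import math

# Toy, hand-made embedding table: each word maps to a dense vector.
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    # Cosine similarity: dot(u, v) / (|u| * |v|).
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Related words get similar vectors, so they look alike to a downstream network:
print(cosine(embeddings["king"], embeddings["queen"])
      > cosine(embeddings["king"], embeddings["apple"]))  # True
```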

Sequence Learning

Sequence learning is the study of machine learning algorithms designed for applications that require sequential or temporal data.

RECURRENT NEURAL NETWORK

  • Sequential data prediction is considered a key problem in machine learning and artificial intelligence.
  • Unlike images, where we look at the entire image at once, we read text documents sequentially to understand their content.
  • The likelihood of any sentence can be estimated from everyday use of language.
  • The earlier sequence of words (in time) is important for predicting the next word, sentence, paragraph, or chapter.
  • If a word occurs twice in a sentence but cannot be accommodated within the sliding window, a fixed-window model must learn it twice.
  • An RNN is an architecture that does not impose a fixed-length limit on the prior context.
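A single recurrent step can be sketched in plain Python. The weights here are tiny toy values assumed for illustration; in practice they are learned by backpropagation through time:

```python
import math

def rnn_step(x, h, W_xh, W_hh, b):
    # One recurrent step: h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b).
    # The hidden state h carries context from the entire prior sequence,
    # with no fixed window length.
    n = len(h)
    return [
        math.tanh(sum(W_xh[i][j] * x[j] for j in range(len(x)))
                  + sum(W_hh[i][j] * h[j] for j in range(n))
                  + b[i])
        for i in range(n)
    ]

# Toy 2x2 weight matrices and zero bias (assumed, not learned).
W_xh = [[0.5, -0.2], [0.1, 0.3]]
W_hh = [[0.4, 0.0], [0.0, 0.4]]
b = [0.0, 0.0]

h = [0.0, 0.0]                        # initial hidden state
for x in [[1.0, 0.0], [0.0, 1.0]]:    # a sequence of two input vectors
    h = rnn_step(x, h, W_xh, W_hh, b)
print(h)  # final hidden state summarizing the whole sequence
```

Because the same weights are reused at every step, the network can process sequences of any length, which is exactly the property the bullets above call for.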

RNN → Language Model → Encoding a sentence into a fixed-size vector → Exploding and vanishing gradients → LSTM → GRU
