pemagrg1/Natural-Language-Processing-NLP-using-Spacy

Created Date: 8 March 2019

Natural Language Processing (NLP) using spaCy

spaCy is an open-source library for advanced Natural Language Processing, written in Python and Cython and published under the MIT license. This guide shows how to get started with NLP using spaCy. Before starting, make sure that Python and spaCy are installed on your system.

To install spaCy and the English model:
sudo pip install spacy
python -m spacy download en

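In spaCy, the “nlp” object is used to create documents and to access linguistic annotations and other NLP properties. Here we load the default English model (the core web model) through its “en” shortcut. The shortcut is available in spaCy 2.x; spaCy 3.x removed shortcuts, so there the model is downloaded and loaded by its full name, en_core_web_sm.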

import spacy
nlp = spacy.load("en")
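
As a minimal sketch of what the “nlp” object does, the snippet below calls it on a string and inspects the resulting document. It assumes a tokenizer-only pipeline built with spacy.blank("en") rather than the downloaded model, so that it runs without any model installed; the sample sentence is made up for illustration.

```python
import spacy

# Assumption: spacy.blank("en") builds an English pipeline with only a
# tokenizer, so no trained model has to be downloaded for this sketch.
nlp = spacy.blank("en")

# Calling nlp on a string returns a Doc, a sequence of Token objects.
doc = nlp("SpaCy makes text processing simple.")

# Each token exposes its annotations as attributes.
for token in doc:
    print(token.i, token.text, token.is_alpha, token.is_punct)
```

With a trained model loaded instead, the same Doc would also carry lemmas, POS tags and named entities.
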
  1. WORD TOKENIZE
    Split the text into individual tokens, i.e. break each sentence into words.
  2. SENTENCE TOKENIZE
    Split the text into sentences when it contains more than one, i.e. break the text into a list of sentences.
  3. STOP WORDS REMOVAL
    Remove stop words such as “is”, “the” and “a” using the NLTK stop-word list, since they carry little information on their own.
  4. Lemma
    Lemmatize the text to reduce each word to its root form, e.g. “functions” and “functioned” both become “function”.
  5. Get word frequency
    Count word occurrences using NLTK's FreqDist class. Word frequency indicates how important a word is in the document, based on how many times it is used.
  6. POS tags
    POS (part-of-speech) tags give the grammatical category of each word, such as noun or adjective.
  7. NER
    NER (Named Entity Recognition) is the process of identifying named entities in the text, such as people, organizations and locations.
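
The seven steps above can be sketched as small helper functions that each take a spaCy Doc. This is an illustrative outline, not the repository's code, and it makes several assumptions: the helper names are hypothetical, spaCy's built-in stop-word flag is used in place of the NLTK list, collections.Counter stands in for FreqDist, and the model is loaded by its spaCy 3.x name en_core_web_sm (on spaCy 2.x the “en” shortcut plays that role).

```python
from collections import Counter

import spacy


def word_tokens(doc):
    """1. Word tokenization: the text of every token."""
    return [token.text for token in doc]


def sentence_list(doc):
    """2. Sentence tokenization: requires sentence boundaries on the Doc."""
    return [sent.text for sent in doc.sents]


def without_stop_words(doc):
    """3. Stop-word removal, using spaCy's own list (not NLTK's)."""
    return [t.text for t in doc if not t.is_stop and not t.is_punct]


def lemmas(doc):
    """4. Lemmatization: the root form of every non-punctuation token."""
    return [t.lemma_ for t in doc if not t.is_punct]


def word_frequency(doc):
    """5. Word frequency; Counter is a stand-in for NLTK's FreqDist."""
    return Counter(t.lower_ for t in doc if not t.is_stop and not t.is_punct)


def pos_tags(doc):
    """6. Coarse part-of-speech tag for every token."""
    return [(t.text, t.pos_) for t in doc]


def named_entities(doc):
    """7. Named entities with their labels."""
    return [(ent.text, ent.label_) for ent in doc.ents]


if __name__ == "__main__":
    try:
        # Assumption: the small English model has been downloaded.
        nlp = spacy.load("en_core_web_sm")
    except OSError:
        print("Model not found; run: python -m spacy download en_core_web_sm")
    else:
        doc = nlp("Apple is looking at buying a U.K. startup. It functions well.")
        print(word_tokens(doc))
        print(sentence_list(doc))
        print(without_stop_words(doc))
        print(lemmas(doc))
        print(word_frequency(doc).most_common(3))
        print(pos_tags(doc))
        print(named_entities(doc))
```

Lemmas, POS tags and entities come from the trained model, so their exact values depend on the model version; tokenization and stop-word flags only need the language data.
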

BLOG: https://medium.com/@pemagrg/nlp-for-beninners-using-spacy-6161cf48a229
