Natural Language Processing (NLP) using spaCy

Created Date: 8 March 2019

spaCy is an open-source software library for advanced Natural Language Processing, written in Python and Cython. The library is published under the MIT license. Today we'll be talking about how to get started with NLP using spaCy. Before starting, make sure that you have Python and spaCy installed on your system.

To install spaCy and the English model:
sudo pip install spacy
python -m spacy download en

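In spaCy, the object "nlp" is used to create documents and to access linguistic annotations and other NLP properties. The default model is the English core web model (en_core_web_sm), which we load here through the "en" shortcut. The shortcut works in spaCy 2.x; spaCy 3.x dropped shortcut names, so there the full package name is passed to spacy.load instead.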

import spacy
nlp = spacy.load("en")

WORD TOKENIZE
Tokenize the text to get its tokens, i.e. break the sentences into individual words.
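
As a minimal sketch of word tokenization, assuming spaCy 3.x is installed: tokenization does not need a trained model, so this example uses a blank English pipeline from spacy.blank rather than the "en" model. The sample sentence is made up for illustration.

```python
import spacy

# Assumption: a blank English pipeline is enough here, because the
# tokenizer is rule-based and does not depend on a trained model.
nlp = spacy.blank("en")

# Hypothetical sample text.
doc = nlp("spaCy splits text into tokens.")

# Each Token exposes its original string through the .text attribute.
tokens = [token.text for token in doc]
print(tokens)
```

Note that punctuation is returned as a separate token, so the final period is not attached to the last word.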
SENTENCE TOKENIZE
Tokenize sentences if there is more than one sentence, i.e. break the text into a list of sentences.
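
Sentence splitting can be sketched in a similar way. This example assumes spaCy 3.x, where the rule-based "sentencizer" component is added by name with nlp.add_pipe; a trained model would instead set sentence boundaries from its parser. The sample text is made up.

```python
import spacy

# Assumption: spaCy 3.x. The rule-based sentencizer splits on
# sentence-final punctuation and needs no trained model.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Hypothetical sample text with two sentences.
doc = nlp("This is the first sentence. Here is the second one.")

# doc.sents yields one Span per detected sentence.
sentences = [sent.text for sent in doc.sents]
print(sentences)
```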
STOP WORDS REMOVAL
Remove uninformative words such as "is", "the" and "a" from the sentences using the NLTK stop word list, as they carry little information on their own.
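
The text above mentions NLTK stop words. As an alternative sketch that stays within spaCy, the example below uses spaCy's own built-in English stop word list through the token.is_stop flag. This is a swapped-in technique, not necessarily what the repository script does, and the sample sentence is made up.

```python
import spacy

# Assumption: spaCy's built-in English stop word list is used here
# instead of NLTK's; the two lists are similar but not identical.
nlp = spacy.blank("en")

# Hypothetical sample text.
doc = nlp("This is a sample sentence about the spaCy library")

# token.is_stop is True for words found in the language's stop list.
filtered = [token.text for token in doc if not token.is_stop]
print(filtered)
```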
LEMMA
Lemmatize the text to reduce each word to its root form, e.g. "functions" and "functioned" both become "function".
WORD FREQUENCY
Count word occurrences using NLTK's FreqDist class. Word frequency helps us gauge how important a word is in the document by showing how many times it is used.
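
The text above uses NLTK's FreqDist. A rough equivalent can be sketched with collections.Counter from the Python standard library, which is the swapped-in technique shown here. The sketch assumes spaCy 3.x for tokenization, and the sample text is made up.

```python
from collections import Counter

import spacy

# Assumption: a blank English pipeline is used only to tokenize.
nlp = spacy.blank("en")

# Hypothetical sample text.
doc = nlp("The cat sat on the mat. The dog sat too.")

# Lowercase the words and skip punctuation so "The" and "the" count together.
words = [token.text.lower() for token in doc if token.is_alpha]

# Counter maps each word to the number of times it occurs.
freq = Counter(words)
print(freq.most_common(2))
```

Lowercasing before counting is a design choice: without it, capitalized and lowercase forms of the same word would be counted separately.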
POS TAGS
POS (part-of-speech) tags tell us the grammatical category of each word, such as whether it is a noun, an adjective, etc.
NER
NER (Named Entity Recognition) is the process of identifying named entities in the text, such as people, organizations and locations.

BLOG: https://medium.com/@pemagrg/nlp-for-beninners-using-spacy-6161cf48a229

Files in this repository:
JapaneseNLP using Spacy
learning-notebooks
.gitignore
01_WordTokenize.py
02_SentenceTokenize.py
03_RemoveStopwords.py
04_Lemmatize.py
05_WordFrequency.py
06_PartOfSpeechTagging.py
07_NER(NamedEntityRecognition).py
08_WordVectors.py
09_WordSimilarity.py
10_SentenceSimilarity.py
11_SentimentAnalysis.py
README.md
