GitHub - BALaka-18/rake_new2: A Python library that enables smooth keyword extraction from any text using the RAKE (Rapid Automatic Keyword Extraction) algorithm.

ABOUT THIS PROJECT

rake_new2

rake_new2 is a Python library for simple, fast keyword extraction from any text. It helps beginners, and anyone unsure where to start when looking for keywords, see which keywords are more important.

HOW IS THIS DIFFERENT FROM OTHER ALGORITHMS? This library returns a weight (score) along with each keyword/key-phrase, which helps you pick out the right key-phrases: choose the ones with the higher scores.
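To make the scores concrete, here is a minimal, self-contained sketch of classic RAKE scoring. It is not this library's implementation: the tiny `STOPWORDS` set and the helper names `candidate_phrases` and `rake_scores` are assumptions for illustration, and the phrase score used is the textbook sum of degree/frequency over the words in a phrase.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption); a real setup would use a full corpus.
STOPWORDS = {"a", "an", "the", "are", "is", "in", "of", "and", "to"}


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    # Break the text into fragments at any run of punctuation.
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                # A stopword ends the phrase being built.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text):
    """Return (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text)
    freq = Counter()
    degree = defaultdict(float)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts every word the word co-occurs with, itself included.
            degree[word] += len(phrase)
    scores = {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in set(phrases)
    }
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(rake_scores("Red apples are good in taste."))
# [('red apples', 4.0), ('good', 1.0), ('taste', 1.0)]
```

Longer phrases made of words that co-occur with many others accumulate higher scores, which is why "red apples" outranks the single-word candidates here.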

New in version 1.0.5

Handles repeated keywords/key-phrases.
Handles consecutive punctuation marks.
Handles HTML tags in text: the keep_html_tags option lets the user choose whether HTML tags are kept as keywords too (default: False).
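As a rough illustration of what the punctuation and HTML handling could look like, the sketch below cleans text before keyword extraction. The `preprocess` helper and its regular expressions are hypothetical and are not taken from the library's source; only the `keep_html_tags` name mirrors the option described above.

```python
import re


def preprocess(text, keep_html_tags=False):
    """Hypothetical cleanup step run before keyword extraction."""
    if keep_html_tags:
        # Turn "<h1>" and "</h1>" into the plain word "h1" so it can become a keyword.
        text = re.sub(r"</?\s*([a-zA-Z][a-zA-Z0-9]*)[^>]*>", r" \1 ", text)
    else:
        # Drop tags entirely.
        text = re.sub(r"<[^>]+>", " ", text)
    # Collapse runs of punctuation ("...", "?!") into the first mark of the run.
    text = re.sub(r"([^\w\s])(?:\s*[^\w\s])+", r"\1", text)
    # Normalise whitespace.
    return re.sub(r"\s+", " ", text).strip()


print(preprocess("<h1> Hello world !</h1>"))                       # Hello world !
print(preprocess("<h1> Hello world !</h1>", keep_html_tags=True))  # h1 Hello world ! h1
print(preprocess("Wait... what?!"))                                # Wait. what?
```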

Installation

Use the package manager pip to install rake_new2.

pip install rake_new2

Quick Start

from rake_new2 import Rake

text = "Red apples are good in taste."
text2 = "<h1> Hello world !</h1>"
rk,rk_new1,rk_new2 = Rake(),Rake(keep_html_tags=True),Rake(keep_html_tags=False)

# Case 1 : Plain text
# Extract keywords from the raw text
rk.get_keywords_from_raw_text(text)
kw_s = rk.get_keywords_with_scores()
# Returns keywords with degree scores : {(1.0, 'taste'), (1.0, 'good'), (4.0, 'red apples')}
kw = rk.get_ranked_keywords()
# Returns keywords only : ['red apples', 'taste', 'good']
f = rk.get_word_freq()
# Returns word frequencies as a Counter object : {'red': 1, 'apples': 1, 'good': 1, 'taste': 1}
deg = rk.get_kw_degree()
# Returns word degrees as defaultdict object : {'red': 2.0, 'apples': 2.0, 'good': 1.0, 'taste': 1.0}

# Case 2 : Sample case for testing the 'keep_html_tags' parameter. Default = False
print("\nORIGINAL TEXT : {}".format(text2))
# Sub Case 1 : Keeping the HTML tags
rk_new1.get_keywords_from_raw_text(text2)
kw_s1 = rk_new1.get_keywords_with_scores()
kw1 = rk_new1.get_ranked_keywords()
print("Keeping the tags : ",kw1)

# Sub Case 2 : Eliminating the HTML tags
rk_new2.get_keywords_from_raw_text(text2)
kw_s2 = rk_new2.get_keywords_with_scores()
kw2 = rk_new2.get_ranked_keywords()
print("Eliminating the tags : ",kw2)

'''OUTPUT >>
ORIGINAL TEXT : <h1> Hello world !</h1>
Keeping the tags :  {'h1', 'hello'}
Eliminating the tags :  {'hello world'}
'''
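Once you have the scored keywords, picking the strongest ones is a matter of sorting by score. The helper below is a small sketch that works on `(score, keyword)` pairs shaped like the Case 1 output above; `top_keywords` and its parameters are hypothetical names, not part of the library.

```python
def top_keywords(scored, n=1, min_score=0.0):
    """Return up to n keywords with the highest scores.

    `scored` is any iterable of (score, keyword) pairs.
    """
    # Highest score first; ties are broken alphabetically for a stable result.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [keyword for score, keyword in ranked if score >= min_score][:n]


# Pairs copied from the Case 1 example.
scored = {(1.0, "taste"), (1.0, "good"), (4.0, "red apples")}
print(top_keywords(scored, n=2))          # ['red apples', 'good']
print(top_keywords(scored, n=5, min_score=2.0))  # ['red apples']
```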

Debugging

You might come across a stopwords error the first time you use the library.

It means that the stopwords corpus has not been downloaded from NLTK yet.

To download it, run the command below.

python -c "import nltk; nltk.download('stopwords')"
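If you would rather do this check from Python, a small guard can download the corpus only when it is missing. This is a sketch, not something the library ships: `ensure_stopwords` is a hypothetical helper, and its `find` and `download` parameters exist only so it can be exercised without NLTK or network access. By default it relies on `nltk.data.find` and `nltk.download`.

```python
def ensure_stopwords(find=None, download=None):
    """Download the NLTK stopwords corpus if it is not installed.

    Returns True if a download was triggered, False if the corpus was already present.
    """
    if find is None or download is None:
        # Imported lazily so the helper can be tested without NLTK installed.
        import nltk

        find = find or nltk.data.find
        download = download or nltk.download
    try:
        # nltk.data.find raises LookupError when the resource is missing.
        find("corpora/stopwords")
        return False
    except LookupError:
        download("stopwords")
        return True
```

Call `ensure_stopwords()` once before creating a `Rake()` object.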

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT

Contributors

Student Name	GitHub ID	Merged PR No.	Open source programme name	If DWOC, level of PR
Sabarish Rajamohan	sabarish98	#16	Hacktoberfest	--
Soham Kar	2bit-hack	#20	Hacktoberfest	--
Jawen Voon	jawsvk	#26	Hacktoberfest	--
Ananthakrishnan Nair RS	akrish4	#47	DWOC	Level-1
Tushar Nankani	tusharnankani	#43	DWOC	Level-3
