question: how could I extract a specific number of keywords instead of sentence? #180

Open
Archkik opened this issue Jul 21, 2022 · 2 comments

Archkik commented Jul 21, 2022

How can I extract a specific number of keywords, instead of sentences, with the Python API?

Owner

miso-belica commented Jul 21, 2022

You can pick anything you want from the summary by providing a custom function in place of the sentence count. The function gets a collection of SentenceInfo objects and returns the ones to keep.

# -*- coding: utf-8 -*-

from sumy.parsers.html import HtmlParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer as Summarizer
from sumy.nlp.stemmers import Stemmer
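from sumy.utils import get_stop_words
# SentenceInfo is a namedtuple with fields (sentence, order, rating).
# Note: it is imported here from a private module, which may move between versions.
from sumy.summarizers._summarizer import SentenceInfo


LANGUAGE = "english"


def pick_sentences(infos: list[SentenceInfo]) -> list[SentenceInfo]:
    # your algorithm here
    return []  # any SentenceInfo objects you want to pick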


if __name__ == "__main__":
    url = "https://en.wikipedia.org/wiki/Automatic_summarization"
    parser = HtmlParser.from_url(url, Tokenizer(LANGUAGE))
    stemmer = Stemmer(LANGUAGE)

    summarizer = Summarizer(stemmer)
    summarizer.stop_words = get_stop_words(LANGUAGE)

    for sentence in summarizer(parser.document, pick_sentences):
        print(sentence)
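
To make the custom function more concrete, here is a minimal sketch of one possible picker: it keeps at most `max_count` items whose rating is at least `min_rating`. The names `make_picker`, `max_count` and `min_rating` are hypothetical, and the local `SentenceInfo` is only a stand-in that assumes the objects expose `sentence`, `order` and `rating` fields, so the snippet runs without sumy installed.

```python
from collections import namedtuple

# Stand-in for the objects passed to the picker (assumed fields, not sumy's own class).
SentenceInfo = namedtuple("SentenceInfo", ("sentence", "order", "rating"))


def make_picker(max_count, min_rating):
    """Build a picker keeping at most max_count items rated >= min_rating."""
    def pick_sentences(infos):
        # Sort best-rated first, drop low-rated items, then cap the count.
        best = sorted(infos, key=lambda info: info.rating, reverse=True)
        return [info for info in best if info.rating >= min_rating][:max_count]
    return pick_sentences


# Made-up data for illustration only.
infos = [
    SentenceInfo("first", 0, 0.9),
    SentenceInfo("second", 1, 0.2),
    SentenceInfo("third", 2, 0.7),
    SentenceInfo("fourth", 3, 0.6),
]
picked = make_picker(max_count=2, min_rating=0.5)(infos)
print([info.sentence for info in picked])  # ['first', 'third']
```

Something like `make_picker(2, 0.5)` could then be passed to the summarizer in place of `pick_sentences`, assuming the summarizer hands the function objects with those fields.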

@miso-belica
Owner

@Archkik does this work for your use-case? Is your issue different somehow? Can you describe what you are trying to achieve then?
