lexrankr

Clustering-based multi-document selective text summarization using the LexRank algorithm.

This repository contains the source code for the paper: 설진석, 이상구. "lexrankr: LexRank 기반 한국어 다중 문서 요약" (English: "lexrankr: LexRank-based Korean multi-document summarization"). 한국정보과학회 학술발표논문집 (2016): 458-460.

  • Designed mainly for Korean, but not limited to it.
  • See the KoNLPy installation guide for how to install KoNLPy properly.
  • Check out textrankr, which is a simpler summarizer using TextRank.

Installation

pip install lexrankr

Tokenizers

No tokenizer is included. You have to implement one yourself: any callable that takes a string and returns a list of string tokens.

Example:

from typing import List

class MyTokenizer:
    def __call__(self, text: str) -> List[str]:
        tokens: List[str] = text.split()
        return tokens
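
A slightly more robust variant is sketched below. It keeps the same call signature as MyTokenizer but lowercases the text and drops punctuation, so that "Mat." and "mat" count as the same token. The class name RegexTokenizer is hypothetical and not part of lexrankr.

```python
import re
from typing import List


class RegexTokenizer:
    """Hypothetical tokenizer (not shipped with lexrankr)."""

    # \w+ matches runs of letters, digits, and underscores,
    # including non-ASCII letters such as Hangul.
    pattern = re.compile(r"\w+")

    def __call__(self, text: str) -> List[str]:
        # Lowercase first so tokens compare case-insensitively.
        tokens: List[str] = self.pattern.findall(text.lower())
        return tokens


tokenizer = RegexTokenizer()
print(tokenizer("Hello, LexRank! Hello again."))
# → ['hello', 'lexrank', 'hello', 'again']
```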

For Korean, one option is to use KoNLPy.

from typing import List
from konlpy.tag import Okt

class OktTokenizer:
    okt: Okt = Okt()

    def __call__(self, text: str) -> List[str]:
        tokens: List[str] = self.okt.pos(text, norm=True, stem=True, join=True)
        return tokens

Usage

from typing import List
from lexrankr import LexRank

# 1. init
mytokenizer: MyTokenizer = MyTokenizer()
lexrank: LexRank = LexRank(mytokenizer)

# 2. summarize (like, pre-computation)
lexrank.summarize(your_text_here)

# 3. probe (like, query-time)
summaries: List[str] = lexrank.probe()
for summary in summaries:
    print(summary)
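
To give a rough idea of what a LexRank-style ranking computes, here is a minimal sketch of the general technique: sentences become nodes of a graph, edges connect sentences whose bag-of-words cosine similarity passes a threshold, and a PageRank-style power iteration scores each node. This is an illustration only, not lexrankr's implementation. It omits the clustering step, and the function names, threshold, and damping values are assumptions.

```python
import math
from collections import Counter
from typing import Callable, List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank_scores(
    sentences: List[str],
    tokenize: Callable[[str], List[str]],
    threshold: float = 0.1,   # assumed value, for illustration
    damping: float = 0.85,    # assumed value, for illustration
    iterations: int = 50,
) -> List[float]:
    """Score sentences by centrality in a thresholded similarity graph."""
    bags = [Counter(tokenize(s)) for s in sentences]
    n = len(bags)
    if n == 0:
        return []
    # Unweighted adjacency: 1.0 where two distinct sentences are similar enough.
    adj = [
        [1.0 if i != j and cosine(bags[i], bags[j]) >= threshold else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence receives an equal share of every neighbor's score.
        # Isolated sentences (degree 0) pass nothing on, so the scores
        # need not sum to 1 in this simplified version.
        scores = [
            (1.0 - damping) / n
            + damping * sum(adj[j][i] * scores[j] / degree[j]
                            for j in range(n) if degree[j])
            for i in range(n)
        ]
    return scores


sentences = [
    "the cat sat on the mat",
    "the cat lay on the mat",
    "stock prices fell sharply today",
    "a cat sat on a mat",
]
scores = lexrank_scores(sentences, str.split)
for sentence, score in zip(sentences, scores):
    print(f"{score:.4f}  {sentence}")
```

In this toy input, the three cat sentences are all similar to one another and end up with higher scores than the unrelated stock sentence, which has no neighbors in the graph.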

Test

Use Docker.

docker build -t lexrankr -f Dockerfile .
docker run --rm -it lexrankr