hyunbool/Text-Segmentation

Text-Segmentation

A curated summary of papers on text segmentation

Text Segmentation

title summary
[TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages (1997)](https://www.aclweb.org/anthology/J97-1003.pdf)
A HIDDEN MARKOV MODEL APPROACH TO TEXT SEGMENTATION AND EVENT TRACKING(1998)
Statistical Models for Text Segmentation(1999)
Advances in Domain Independent Linear Text Segmentation(2000) - the C99 algorithm
Latent Semantic Analysis for Text Segmentation(2001) - uses LSA
A Statistical Model for Domain-Independent Text Segmentation(2001)
Minimum Cut Model for Spoken Lecture Segmentation(2006)
Bayesian Unsupervised Topic Segmentation(2008)
Hierarchical Text Segmentation from Multi-Scale Lexical Cohesion(2009)
Linear Text Segmentation using Affinity Propagation(2011)
TopicTiling: A Text Segmentation Algorithm based on LDA(2012)
Domain-Independent Unsupervised Text Segmentation for Data Management(2014)
Text Segmentation based on Semantic Word Embeddings(2015)
Unsupervised Text Segmentation Using Semantic Relatedness Graphs(2016)
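On-Line Topic Segmentation Using Convolutional Neural Networks(2016) (original title: 합성곱 신경망을 이용한 On-Line 주제 분리)
Text Segmentation as a Supervised Learning Task(2018) - builds a Wikipedia-based dataset for text segmentation
- takes a supervised approach to a task that was previously solved with unsupervised, probabilistic methods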
Attention-based Neural Text Segmentation(2018)
Scientific Literature Summarization Using Document Structure and Hierarchical Attention Model(2019)
SECTOR: A Neural Model for Coherent Topic Segmentation and Classification(2019)
LANGUAGE MODEL PRE-TRAINING FOR HIERARCHICAL DOCUMENT REPRESENTATIONS(2019) - runs its experiments on text segmentation
BeamSeg: A Joint Model for Multi-Document Segmentation and Topic Identification(2019)
BTS: Text Segmentation Using Korean BERT(2019) (original title: BTS: 한국어 BERT를 사용한 텍스트 세그멘테이션)
Context-Aware Latent Dirichlet Allocation for Topic Segmentation(2020)
Chapter Captor: Text Segmentation in Novels(2020) 1. Builds a text segmentation dataset from novels in Project Gutenberg
2. Local Method:
* Weighted Overlap Cut (WOC): unsupervised. Based on the idea that the frequently occurring words differ from chapter to chapter, it compares two sentences and places the break point where word density (the amount of overlap) is lowest
* BERT for Break Prediction (BBP): supervised. It compares two sentences and classifies them as either consecutive (same chapter) or not consecutive (break point)
3. Global Method using Optimization: making segment lengths roughly uniform yields good segmentation results
* Solved recursively with dynamic programming
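
The local and global ideas summarized above can be illustrated with a small sketch. This is not the paper's WOC or its optimization objective: it swaps in a plain Jaccard overlap between neighboring sentence windows as the break score, and a simple length-imbalance penalty for the global step. The names `overlap_break_scores` and `segment` and the parameters `window` and `length_weight` are hypothetical, chosen only for illustration.

```python
from functools import lru_cache


def overlap_break_scores(sentences, window=2):
    """Score each gap between sentences; higher means less shared vocabulary.

    Assumption: Jaccard overlap stands in for the paper's weighted overlap.
    """
    bags = [set(s.lower().split()) for s in sentences]
    scores = {}
    for gap in range(1, len(bags)):
        left = set().union(*bags[max(0, gap - window):gap])
        right = set().union(*bags[gap:gap + window])
        union = left | right
        jaccard = len(left & right) / len(union) if union else 0.0
        scores[gap] = 1.0 - jaccard
    return scores


def segment(scores, n_sentences, n_segments, length_weight=0.5):
    """Choose n_segments - 1 gaps that maximize the total break score
    minus a penalty for segments that deviate from the uniform length."""
    if not 1 <= n_segments <= n_sentences:
        raise ValueError("n_segments must be between 1 and n_sentences")
    ideal = n_sentences / n_segments

    def penalty(length):
        return length_weight * abs(length - ideal) / ideal

    @lru_cache(maxsize=None)
    def best(start, remaining):
        # Best (value, breaks) for splitting sentences[start:] into `remaining` segments.
        if remaining == 1:
            return -penalty(n_sentences - start), ()
        best_value, best_breaks = float("-inf"), ()
        # Leave at least one sentence for every later segment.
        for end in range(start + 1, n_sentences - remaining + 2):
            tail_value, tail_breaks = best(end, remaining - 1)
            value = scores[end] - penalty(end - start) + tail_value
            if value > best_value:
                best_value, best_breaks = value, (end,) + tail_breaks
        return best_value, best_breaks

    return list(best(0, n_segments)[1])
```

A break at gap `i` means a new segment starts at sentence index `i`. Raising `length_weight` pushes the result toward equal-length segments, while setting it to 0 leaves only the local scores to decide.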
Books of Hours: the First Liturgical Corpus for Text Segmentation(2020)
A Joint Model for Document Segmentation and Segment Labeling(2020)
Discourse as a Function of Event: Profiling Discourse Structure in News Articles around the Main Event(2020)
Improving BERT with Focal Loss for Paragraph Segmentation of Novels(2020)
Topical Change Detection in Documents via Embeddings of Long Sequences(2020)
Text Segmentation by Cross Segment Attention(2020)

Topic Modeling

title summary
Latent Dirichlet Allocation(2002) - the paper that first introduced LDA
A Hybrid Neural Network-Latent Topic Model(2012)
Modelling Sequential Text with an Adaptive Topic Model(2012)
Learning from LDA using Deep Neural Networks(2015)
Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec(2016)
Contextual-LDA: A Context Coherent Latent Topic Model for Mining Large Corpora(2016)
Recurrent Attentional Topic Model(2017)
Discovering Discrete Latent Topics with Neural Variational Inference(2017)
A Detailed Survey on Topic Modeling for Document and Short Text Data(2019)
A Topic Modeling Methodology Using an Emotion Deep Learning Filter(2019) (original title: 감정 딥러닝 필터를 활용한 토픽 모델링 방법론)

Applications

title summary
A Two-Stage Transformer-Based Approach for Variable-Length Abstractive Summarization(2020)
