
Successfully developed an encoder-decoder based sequence-to-sequence (Seq2Seq) model that can summarize the complete text of an Indian news article into a short paragraph with a limited number of words.

SayamAlt/Abstractive-Text-Summarization-of-News-Articles


Abstractive-Text-Summarization-of-News-Articles


Overview

Significance of Text Summarization

There is a tremendous amount of text available, and it keeps growing every day. Consider the internet, with its wide variety of web pages, news stories, status updates, blogs and more. Usually the best we can do to navigate this unstructured data is to search and skim the results. Much of this material needs to be condensed into concise summaries that highlight the key points, so that we can navigate it more effectively and decide whether the longer documents actually contain the information we need.

We need automatic text summarization methods due to the following reasons:

  1. Summaries reduce reading time.
  2. When researching documents, summaries make the selection process easier.
  3. Automatic summarization improves the effectiveness of indexing.
  4. Automatic summarization algorithms can be less subjective than human summarizers, since they apply the same criteria to every document.
  5. Personalized summaries are useful in question-answering systems because they surface the information relevant to a particular user or query.
  6. Using automatic or semi-automatic summarization systems enables commercial abstract services to increase the number of texts they are able to process.

Text Summarization Use Cases

Text summarization is used across a wide range of industries and applications.

These include:

  1. Creating chapters for YouTube videos or educational online courses via video editing platforms.
  2. Summarizing and sharing key parts of corporate meetings to reduce the need for mass attendance.
  3. Automatically identifying key parts of calls and flagging sections for follow-up via revenue intelligence platforms.
  4. Summarizing large analytical documents to ease readability and understanding.
  5. Segmenting podcasts and automatically providing a Table of Contents for listeners.

About Dataset

Context

I'm currently working on summarising chat conversations so that an agent can quickly grasp the earlier context. I'm curious to see how deep learning models perform when applied to existing datasets. News articles are written with good grammar and a rich vocabulary, which makes them a clean source of text for this kind of experiment.

Content

The dataset consists of 4515 examples with the fields Author_name, Headlines, Url of Article, Short text and Complete Article. The summarized news articles were taken from Inshorts, and the complete articles were scraped from various Indian news outlets such as The Hindu and the Indian Times, as well as The Guardian. The articles span February to August 2017.
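As a rough illustration of how such a file might be turned into (article, summary) training pairs, here is a minimal sketch. It assumes the data is a CSV whose headers match the field names listed above (`Complete Article` and `Short text`); the real headers may differ, and the cleaning rules and word limits (`max_article_words`, `max_summary_words`) are hypothetical choices, not the repository's own preprocessing.

```python
import csv
import io
import re
from typing import Iterable, List, Tuple

# Assumed column names, taken from the field list in the README;
# the actual CSV headers may be different.
ARTICLE_COL = "Complete Article"
SUMMARY_COL = "Short text"


def clean(text: str) -> str:
    """Lowercase, replace characters other than letters, digits and . , ' with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9.,'\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def load_pairs(
    rows: Iterable[dict],
    max_article_words: int = 300,  # hypothetical limit
    max_summary_words: int = 60,   # hypothetical limit
) -> List[Tuple[str, str]]:
    """Return cleaned (article, summary) pairs, skipping empty or over-long examples."""
    pairs = []
    for row in rows:
        article = clean(row.get(ARTICLE_COL) or "")
        summary = clean(row.get(SUMMARY_COL) or "")
        if not article or not summary:
            continue  # drop rows missing either side of the pair
        if len(article.split()) > max_article_words or len(summary.split()) > max_summary_words:
            continue  # drop examples longer than the chosen limits
        pairs.append((article, summary))
    return pairs


if __name__ == "__main__":
    # Tiny in-memory CSV standing in for the real file.
    sample = io.StringIO(
        "Author_name,Headlines,Short text,Complete Article\n"
        'A. Writer,Rain hits city,"Rain hit the city.","Heavy rain hit the city on Monday!"\n'
        "B. Writer,Empty row,,\n"
    )
    for article, summary in load_pairs(csv.DictReader(sample)):
        print(summary, "<-", article)
```

Filtering by word count is one simple way to keep input and output sequences within a fixed length, which a Seq2Seq model producing summaries with a limited number of words typically needs.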

Acknowledgements

I would like to thank the authors of Inshorts for their fabulous work.

Inspiration

Generating short descriptions (headlines) from text (news articles).
Summarizing large amounts of information into a compressed representation.
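Framed as a Seq2Seq problem, the article is the encoder input and the headline or short summary is the decoder target. A common way to train such a decoder is teacher forcing, where the decoder input is the target sequence shifted right by a start token. The sketch below shows that data preparation in plain Python; the special-token names, the frequency-based vocabulary ordering and the maximum lengths are assumptions for illustration, not taken from this repository.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical special tokens; the repository may use different markers.
PAD, UNK, START, END = "<pad>", "<unk>", "<start>", "<end>"


def build_vocab(texts: Sequence[str], min_count: int = 1) -> Dict[str, int]:
    """Map words seen at least `min_count` times to ids; ids 0-3 are reserved for special tokens."""
    counts = Counter(word for text in texts for word in text.split())
    vocab = {PAD: 0, UNK: 1, START: 2, END: 3}
    # Most frequent words get the smallest ids; ties are broken alphabetically.
    for word, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count >= min_count and word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def to_ids(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Convert tokens to ids (unknown words -> <unk>), truncate to max_len, right-pad with <pad>."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


def encode_article(article: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Encoder input: the article's word ids."""
    return to_ids(article.split(), vocab, max_len)


def make_decoder_io(summary: str, vocab: Dict[str, int], max_len: int) -> Tuple[List[int], List[int]]:
    """Teacher forcing: decoder input is <start> y1..yn, target is y1..yn <end>."""
    words = summary.split()[: max_len - 1]  # leave one slot for <start> / <end>
    dec_in = to_ids([START] + words, vocab, max_len)
    dec_out = to_ids(words + [END], vocab, max_len)
    return dec_in, dec_out


if __name__ == "__main__":
    articles = ["heavy rain lashed the city on monday"]
    summaries = ["rain lashes city"]
    vocab = build_vocab(articles + summaries)
    print(encode_article(articles[0], vocab, max_len=10))
    print(make_decoder_io(summaries[0], vocab, max_len=5))
```

Each position of `dec_out` holds the word that follows the corresponding position of `dec_in`, so at every step the decoder sees the previous reference word and is trained to predict the next one. At inference time there is no reference summary: the `<start>` token is fed in and the decoder's own predictions are fed back until it emits `<end>` or reaches the length limit.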

Purpose

When I was working on the summarising task, I couldn't find any open-source datasets to work with. I expect others are in the same position, and I hope this dataset will be helpful to them.
