Document-Summary-Generator

Document-Summary-Generator is a repository for the Science and Technology Council Hackathon 2023, IIT Kanpur. It contains our approach to the problem statement "Generative AI for Impact". We implemented a simple web page that accepts text from PDF, text, or Word files, from direct user input, or from web pages, and generates a summary of that text using the Longformer Encoder-Decoder (LED) model. The user can request different summary lengths. We also included a keyword search feature that lets users ask questions about the text or generate summaries based on keywords present in it. Detailed documentation and a user guide can be found in the docs folder.
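As a rough illustration of the summarization step, the sketch below maps a requested summary size to token limits and passes them to a Hugging Face summarization pipeline. The checkpoint name allenai/led-base-16384, the preset names, and the token limits are assumptions made for this example; the project's actual model and parameters are defined in src/utils.py and may differ.

```python
# Hypothetical length presets as (min_length, max_length) in tokens.
# These values are illustrative and are not taken from the project.
LENGTH_PRESETS = {
    "short": (30, 80),
    "medium": (80, 200),
    "long": (200, 400),
}


def length_limits(size):
    """Return (min_length, max_length) for a preset name, defaulting to 'medium'."""
    return LENGTH_PRESETS.get(size, LENGTH_PRESETS["medium"])


def summarize(text, size="medium", model_name="allenai/led-base-16384"):
    """Summarize text with an LED checkpoint (assumed name; weights download on first use)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    min_len, max_len = length_limits(size)
    summarizer = pipeline("summarization", model=model_name)
    result = summarizer(text, min_length=min_len, max_length=max_len, truncation=True)
    return result[0]["summary_text"]
```

Calling summarize(long_text, size="short") would then return a single summary string. A checkpoint fine-tuned for summarization would likely give better results than the base weights assumed here.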

Installation

Requirements

Python 3.6 or higher
pip

Setup Project

Fork the repository.
Clone the repository. In your terminal, type:

git clone https://github.com/your-username/Document-Summary-Generator.git

Navigate to the repository directory: cd Document-Summary-Generator
Set up a Python virtual environment: virtualenv venv
Activate the virtual environment. On Windows: venv\Scripts\activate.bat. On Linux/macOS: source venv/bin/activate.
Install the required modules: pip install -r requirements.txt
Run app.py and open http://127.0.0.1:8000 in your browser.
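If the app fails to start, a quick preflight check can confirm the two prerequisites from the steps above: a Python version of at least 3.6 and an active virtual environment. This is a minimal sketch and not part of the repository; the function names are hypothetical.

```python
import sys


def meets_python_requirement(version_info=sys.version_info, minimum=(3, 6)):
    """Return True when the interpreter is at least the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= minimum


def in_virtual_environment():
    """Return True when running inside a venv or virtualenv environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer virtualenv
    # releases make sys.prefix differ from sys.base_prefix instead.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base


if __name__ == "__main__":
    print("Python 3.6+:", meets_python_requirement())
    print("Virtual environment active:", in_virtual_environment())
```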

Project Structure

UPLOAD_FOLDER: All uploaded files (PDF, Word, or text) are stored here. It contains two sample files.
src: Contains exception.py, which handles custom exceptions, and logger.py, which creates log files that help us track progress while the project is running and record any exceptions that occur. utils.py contains two classes: the first handles summarization using the LED model, and the second generates tokens and embedding vectors of text using a Hugging Face BERT model.
src.components: The src folder has a subfolder named components, which holds two files: data_transformation.py, which extracts the text from PDF, text, or Word files as paragraphs, and web_scraping.py, which extracts the headers and paragraphs from websites.
templates: Contains the HTML code for our frontend page.
app.py: A Flask app that provides the endpoints the web app uses for its tasks: uploading documents; taking a URL, keywords, and a number of characters as inputs; displaying summaries and extracted text; and so on.
requirements.txt: Lists all the Python modules required by the project.
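To make the extraction step in src.components more concrete, the sketch below shows one way to pull headers and paragraphs out of an HTML page and to split plain text into paragraphs, using only the Python standard library. The class and function names are hypothetical, and the project's web_scraping.py and data_transformation.py may use different libraries and logic.

```python
import re
from html.parser import HTMLParser

HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeaderParagraphParser(HTMLParser):
    """Collect the text of header (h1-h6) and paragraph (p) elements."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self.paragraphs = []
        self._tag = None    # tag currently being captured, if any
        self._chunks = []   # text pieces seen inside that tag

    def handle_starttag(self, tag, attrs):
        # Start capturing only at an outermost header or paragraph tag.
        if self._tag is None and (tag in HEADER_TAGS or tag == "p"):
            self._tag = tag
            self._chunks = []

    def handle_data(self, data):
        if self._tag is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(self._chunks).split())
            if text:
                target = self.paragraphs if tag == "p" else self.headers
                target.append(text)
            self._tag = None


def extract_headers_and_paragraphs(html):
    """Return (headers, paragraphs) as two lists of strings."""
    parser = HeaderParagraphParser()
    parser.feed(html)
    parser.close()
    return parser.headers, parser.paragraphs


def split_paragraphs(text):
    """Split plain text into paragraphs on blank lines."""
    blocks = re.split(r"\n\s*\n", text)
    return [" ".join(block.split()) for block in blocks if block.strip()]
```

The resulting paragraph lists could then be passed to the summarizer one at a time or joined into a single input, depending on the length of the document.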

Snapshot of the project

It can summarize web pages and PDF files, including research articles, for you!

The team members are:

Anwesh Saha (https://github.com/Anweshbyte)
Arindom Bora (https://github.com/AriBora)
Ajay Sankar Makkena (https://github.com/mas622424)
Khush Khandelwal (https://github.com/khandelwalkhush05)
Vineet Kumar (https://github.com/Vineet-the-git)
