conf-summ-python-node-server

This is a simple Python server written in Flask that processes incoming jobs from the Lambda function. The server takes a video as a job and generates a summary of it. It also stores the generated summary in the DynamoDB table along with some metadata.

API Reference

Health Check

Returns the health of the server

  GET /health

Response:

{
  "status": "ok"
}
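A minimal client-side sketch of this check, using only the standard library. The helper name `check_health`, the injectable `opener` argument, and the `http://localhost:5000` base URL in the usage note are assumptions for illustration; the real port comes from your `PORT` setting.

```python
import json
import urllib.request


def check_health(base_url, opener=urllib.request.urlopen, timeout=5):
    """Return True if GET {base_url}/health answers with {"status": "ok"}.

    `opener` is injectable so the function can be exercised without a
    running server (hypothetical design choice, not part of the project).
    """
    url = f"{base_url.rstrip('/')}/health"
    with opener(url, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body.get("status") == "ok"
```

With the server running locally, `check_health("http://localhost:5000")` should return `True` if the endpoint behaves as documented above.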

Send Job

Send a job to the server to process the video and generate a summary

  POST /process_job

Parameter	Type	Description
`videoURL`	`string`	Required. URL of the video to process.
`videoSource`	`string`	Required. The source of the video. Possible values: `youtube`, `aws-s3`.
`title`	`string`	The title of the video. Required only if `videoSource` is `aws-s3`.

Request:

{
  "videoURL": "https://www.youtube.com/watch?v=0qLXf31Cawk",
  "videoSource": "youtube"
}

Response:

{
  sent to process
}
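The parameter rules above can be sketched as a small payload builder. This is an illustration only: the helper names `build_job_payload` and `build_job_request` are hypothetical, and sending the body as `application/json` is an assumption based on the request example, not something the source confirms.

```python
import json
import urllib.request

# Values documented for `videoSource`.
ALLOWED_SOURCES = {"youtube", "aws-s3"}


def build_job_payload(video_url, video_source, title=None):
    """Validate the documented parameters and return the request body as a dict."""
    if not video_url:
        raise ValueError("videoURL is required")
    if video_source not in ALLOWED_SOURCES:
        raise ValueError(f"videoSource must be one of {sorted(ALLOWED_SOURCES)}")
    if video_source == "aws-s3" and not title:
        raise ValueError("title is required when videoSource is aws-s3")
    payload = {"videoURL": video_url, "videoSource": video_source}
    if title:
        payload["title"] = title
    return payload


def build_job_request(base_url, payload):
    """Build (but do not send) a POST /process_job request.

    Assumption: the server reads a JSON body.
    """
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/process_job",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To submit the job, pass the result of `build_job_request(...)` to `urllib.request.urlopen`. Validating on the client side surfaces a missing `title` before the job reaches the server.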

Environment Variables

To run this project, you will need to add the following environment variables to your .env file

AWS_ACCESS_KEY_ID = 
AWS_SECRET_ACCESS_KEY = 
AWS_DYNAMODB_TABLE_NAME = 
AWS_REGION = 
AWS_S3_BUCKET_NAME = 
OPENAI_API_KEY = 
PORT = 
ENV = local
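A quick way to catch a missing setting before starting the server is to check the environment up front. The sketch below assumes every variable listed above is required; the helper name `missing_env_vars` is hypothetical and not part of the project.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DYNAMODB_TABLE_NAME",
    "AWS_REGION",
    "AWS_S3_BUCKET_NAME",
    "OPENAI_API_KEY",
    "PORT",
    "ENV",
)


def missing_env_vars(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

Calling `missing_env_vars()` after the `.env` file has been loaded returns an empty list when everything is set.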

Run Locally

Clone the project

  git clone https://github.com/siddharthakhuntia-lohum/conf-summ-python-worker.git

Go to the project directory

  cd conf-summ-python-worker

Activate the virtual environment

  source conf-summ/bin/activate

Install dependencies

  pip install -r requirements.txt

Start the server

  flask --app application run

Deployment

The server is deployed on AWS Elastic Beanstalk through an AWS CodePipeline pipeline. A deployment is triggered whenever a new commit is pushed to the main branch.

Roadmap

Add support for more video sources
Improve the summarization prompt
Optimize the summarization process to reduce the time taken to generate the summary
Increase the length of the summary
Use multimodal summarization techniques to generate a better summary

Flows

Basic Flow

sequenceDiagram
    Frontend->>Node Server: POST Request
    Node Server->>SQS: Send Job
    SQS->>Lambda Function: Triggered
    Lambda Function->>Python Server: Process Job
    Python Server->>DynamoDB: Store Summary
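As an illustration of the Lambda step in this flow, the sketch below extracts job payloads from an SQS-triggered Lambda event. The `Records[].body` layout is the standard shape of SQS events delivered to Lambda. That each message body is a JSON string shaped like the `/process_job` request is an assumption, and the function name `jobs_from_sqs_event` is hypothetical.

```python
import json


def jobs_from_sqs_event(event):
    """Decode each SQS record body into a job dict.

    Assumption: the Node server enqueues the job as a JSON string.
    """
    return [json.loads(record["body"]) for record in event.get("Records", [])]
```

Each returned dict could then be forwarded to the Python server as the body of `POST /process_job`.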

Flow in Python server

flowchart TD
    A[Start] --> B[Receive Job]
    B --> C{Is Job Valid}
    C -->|Yes| D[Process Job]
    D --> E{Check Video Source}
    E -->|Youtube| F{Check if transcript exists}
    F -->|Yes| S[Fetch Transcript]
    S --> I[Summarize Transcript]
    F -->|No| G[Download Audio]
    G --> H[Transcribe Audio using Whisper]
    H --> I[Summarize Transcript]
    I --> J[Split Summary into Chunks]
    J --> K{Check if current_tokens < max_tokens}
    K -->|Yes| L[Use Stuff Summarizer]
    K -->|No| M[Use Map Reduce Summarizer]
    L --> N[Store Summary in DynamoDB]
    M --> N[Store Summary in DynamoDB]
    N --> O[Delete Downloaded files]
    O --> P[Success Callback]
    P --> R[End]
    E -->|AWS S3| G

    C -->|No| Q[Error Callback]
    Q --> R[End]
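The chunking step and the `current_tokens < max_tokens` decision in the flowchart can be sketched as follows. This is a simplified illustration: counting whitespace-separated words stands in for a real tokenizer, all function names are hypothetical, and the source does not say which library implements the Stuff and Map Reduce summarizers.

```python
def count_tokens(text):
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def split_into_chunks(text, chunk_tokens):
    """Split text into consecutive chunks of at most `chunk_tokens` words."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_tokens])
        for i in range(0, len(words), chunk_tokens)
    ]


def choose_summarizer(current_tokens, max_tokens):
    """Mirror the flowchart: 'stuff' if the input fits, else 'map_reduce'."""
    return "stuff" if current_tokens < max_tokens else "map_reduce"
```

A short transcript that fits under the limit is summarized in a single pass ("stuff"). A longer one is summarized chunk by chunk and the partial summaries are then combined ("map reduce").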
