conf-summ-python-node-server

This is a simple Python server written in Flask that processes incoming jobs from the Lambda function. Each job is a video: the server generates a summary of the video and stores the generated summary, along with some metadata, in a DynamoDB table.

API Reference

Health Check

Returns the health of the server

  GET /health

Response:

{
  "status": "ok"
}
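
As a minimal sketch, a client could probe this endpoint as shown below. The function name `is_healthy`, the `opener` parameter, and the base URL in the usage note are illustrative assumptions, not part of this project.

```python
import json
from urllib.request import urlopen


def is_healthy(base_url, timeout=5.0, opener=urlopen):
    """Return True if GET <base_url>/health answers with {"status": "ok"}.

    `opener` is injectable (an assumption for testability); it defaults to
    urllib's urlopen.
    """
    try:
        with opener(f"{base_url}/health", timeout=timeout) as resp:
            body = json.load(resp)
    except (OSError, ValueError):
        # Network failures (URLError is an OSError) or a non-JSON body
        # both count as unhealthy.
        return False
    return isinstance(body, dict) and body.get("status") == "ok"
```

For a server started locally, the call might look like `is_healthy("http://127.0.0.1:5000")`, assuming Flask's default port.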

Send Job

Send a job to the server to process the video and generate a summary

  POST /process_job

| Parameter   | Type   | Description                                                      |
| ----------- | ------ | ---------------------------------------------------------------- |
| videoURL    | string | Required. URL of the video to process.                           |
| videoSource | string | Required. Source of the video. Possible values: youtube, aws-s3. |
| title       | string | Title of the video. Required only if videoSource is aws-s3.      |

Request:

{
  "videoURL": "https://www.youtube.com/watch?v=0qLXf31Cawk",
  "videoSource": "youtube"
}

Response:

{
  sent to process
}
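
The parameter rules above can be sketched as a small validation helper. This is only an illustration of the documented rules: the name `validate_job` and the error messages are hypothetical and may differ from what the server actually does.

```python
ALLOWED_SOURCES = {"youtube", "aws-s3"}


def validate_job(payload):
    """Return a list of problems; an empty list means the payload looks valid."""
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]

    errors = []

    url = payload.get("videoURL")
    if not isinstance(url, str) or not url.strip():
        errors.append("videoURL is required")

    source = payload.get("videoSource")
    if not isinstance(source, str) or source not in ALLOWED_SOURCES:
        errors.append("videoSource must be one of: aws-s3, youtube")

    # title is only mandatory for videos that come from S3.
    if source == "aws-s3":
        title = payload.get("title")
        if not isinstance(title, str) or not title.strip():
            errors.append("title is required when videoSource is aws-s3")

    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue with a request at once.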

Environment Variables

To run this project, add the following environment variables to your .env file:

AWS_ACCESS_KEY_ID = 
AWS_SECRET_ACCESS_KEY = 
AWS_DYNAMODB_TABLE_NAME = 
AWS_REGION = 
AWS_S3_BUCKET_NAME = 
OPENAI_API_KEY = 
PORT = 
ENV = local
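
A quick way to catch a missing variable before starting the server is a check like the sketch below. It assumes every variable listed above is required, which the project may not enforce, and the helper name `missing_env_vars` is hypothetical.

```python
import os

# Names taken from the list above; treating all of them as required
# is an assumption.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DYNAMODB_TABLE_NAME",
    "AWS_REGION",
    "AWS_S3_BUCKET_NAME",
    "OPENAI_API_KEY",
    "PORT",
    "ENV",
)


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not str(env.get(name, "")).strip()]
```

Calling `missing_env_vars()` with no arguments inspects the real process environment; passing a dict makes the check easy to exercise in isolation.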

Run Locally

Clone the project

  git clone https://github.com/siddharthakhuntia-lohum/conf-summ-python-worker.git

Go to the project directory

  cd conf-summ-python-worker

Activate the virtual environment

  source conf-summ/bin/activate

Install dependencies

  pip install -r requirements.txt

Start the server

  flask --app application run

Deployment

The server is deployed on AWS Elastic Beanstalk through an AWS CodePipeline pipeline. A deployment is triggered whenever a new commit is pushed to the main branch.

Roadmap

  • Add support for more video sources
  • Improve the summarization prompt
  • Optimize the summarization process to reduce the time taken to generate the summary
  • Increase the length of the summary
  • Use multimodal summarization techniques to generate a better summary

Flows

Basic Flow

sequenceDiagram
    Frontend->>Node Server: POST Request
    Node Server->>SQS: Send Job
    SQS->>Lambda Function: Triggered
    Lambda Function->>Python Server: Process Job
    Python Server->>DynamoDB: Store Summary
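
The "Lambda Function->>Python Server" step could look roughly like the sketch below. It is not the project's actual Lambda code: the `WORKER_URL` environment variable, the `post_job` helper, and the assumption that each SQS message body is the JSON job payload are all hypothetical.

```python
import json
import os
from urllib.request import Request, urlopen


def post_job(url, job, timeout=10.0):
    """POST one job as JSON and return the HTTP status code."""
    data = json.dumps(job).encode("utf-8")
    req = Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req, timeout=timeout) as resp:
        return resp.status


def handler(event, context, send=post_job):
    """Forward every SQS record in the event to the worker's /process_job."""
    # WORKER_URL is an assumed setting pointing at the Python server.
    base_url = os.environ.get("WORKER_URL", "http://localhost:5000")
    statuses = []
    for record in event.get("Records", []):
        # Assumes the message body is the JSON job payload.
        job = json.loads(record["body"])
        statuses.append(send(f"{base_url}/process_job", job))
    return {"forwarded": len(statuses), "statuses": statuses}
```

The `send` parameter exists only so the handler can be exercised without a network call.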

Flow in the Python server

flowchart TD
    A[Start] --> B[Receive Job]
    B --> C{Is Job Valid}
    C -->|Yes| D[Process Job]
    D --> E{Check Video Source}
    E -->|YouTube| F{Check if transcript exists}
    F -->|Yes| S[Fetch Transcript]
    F -->|No| G[Download Audio]
    G --> H[Transcribe Audio using Whisper]
    S --> I[Summarize Transcript]
    H --> I
    I --> J[Split Summary into Chunks]
    J --> K{Check if current_tokens < max_tokens}
    K -->|Yes| L[Use Stuff Summarizer]
    K -->|No| M[Use Map Reduce Summarizer]
    L --> N[Store Summary in DynamoDB]
    M --> N[Store Summary in DynamoDB]
    N --> O[Delete Downloaded files]
    O --> P[Success Callback]
    P --> R[End]
    E -->|AWS S3| G

    C -->|No| Q[Error Callback]
    Q --> R[End]
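
The chunking step and the `current_tokens < max_tokens` decision in the flowchart can be illustrated with the sketch below. It counts whitespace-separated words as a stand-in for real tokens, which is an assumption. The server may use a proper tokenizer, and the function names here are hypothetical.

```python
def count_tokens(text):
    """Approximate the token count by counting whitespace-separated words."""
    return len(text.split())


def split_into_chunks(text, chunk_tokens):
    """Split text into consecutive chunks of at most chunk_tokens words."""
    if chunk_tokens <= 0:
        raise ValueError("chunk_tokens must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + chunk_tokens])
        for i in range(0, len(words), chunk_tokens)
    ]


def choose_summarizer(text, max_tokens):
    """Pick "stuff" when the text is under max_tokens, else "map_reduce"."""
    return "stuff" if count_tokens(text) < max_tokens else "map_reduce"
```

The "stuff" strategy passes the whole text to the model in one prompt, while "map reduce" summarizes each chunk separately and then combines the partial summaries, so it is the fallback once the text no longer fits under the limit.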