This is a simple Python server, written in Flask, that processes incoming jobs from the Lambda function. Each job contains a video; the server generates a summary of that video and stores the generated summary, along with some metadata, in a DynamoDB table.
Returns the health of the server
GET /health
Response:
```json
{
  "status": "ok"
}
```
Send a job to the server to process the video and generate a summary
POST /process_job
| Parameter | Type | Description |
|---|---|---|
| videoURL | string | Required. URL of the video to process. |
| videoSource | string | Required. Source of the video. Possible values: youtube, aws-s3. |
| title | string | Title of the video. Required only if videoSource is aws-s3. |
Request:
```json
{
  "videoURL": "https://www.youtube.com/watch?v=0qLXf31Cawk",
  "videoSource": "youtube"
}
```
Response:
```
{
sent to process
}
```
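The parameter rules in the table can be checked before a job is accepted (the "Is Job Valid" step in the flowchart). The function below, `validate_job`, is a hypothetical validator written for illustration. The server's actual validation code is not shown in this README.

```python
from typing import Any

# Allowed values copied from the parameter table.
VALID_SOURCES = {"youtube", "aws-s3"}


def validate_job(payload: dict[str, Any]) -> list[str]:
    """Return validation errors for a /process_job body (empty list if valid).

    Hypothetical helper: mirrors the documented rules, not the server's code.
    """
    errors: list[str] = []
    url = payload.get("videoURL")
    source = payload.get("videoSource")

    if not isinstance(url, str) or not url:
        errors.append("videoURL is required and must be a non-empty string")
    if source not in VALID_SOURCES:
        errors.append(f"videoSource must be one of {sorted(VALID_SOURCES)}")
    # title is documented as required only for aws-s3 jobs.
    if source == "aws-s3" and not payload.get("title"):
        errors.append("title is required when videoSource is aws-s3")
    return errors
```

With this sketch, the YouTube request shown above passes with no errors. An aws-s3 job that omits `title` comes back with a single error.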
To run this project, add the following environment variables to your .env file:
```
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_DYNAMODB_TABLE_NAME=
AWS_REGION=
AWS_S3_BUCKET_NAME=
OPENAI_API_KEY=
PORT=
ENV=local
```
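The sketch below shows one way the server might read these variables at startup and fail fast when one is missing. It assumes the .env file has already been loaded into the process environment, for example with python-dotenv's `load_dotenv()`. The README does not say which loader is used. `Settings`, `load_settings`, and the defaults for PORT and ENV are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variable names copied from the .env listing; PORT and ENV are treated as optional here.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DYNAMODB_TABLE_NAME",
    "AWS_REGION",
    "AWS_S3_BUCKET_NAME",
    "OPENAI_API_KEY",
)


@dataclass(frozen=True)
class Settings:
    dynamodb_table: str
    region: str
    s3_bucket: str
    port: int
    env: str


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from the environment, raising if a required variable is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(
        dynamodb_table=environ["AWS_DYNAMODB_TABLE_NAME"],
        region=environ["AWS_REGION"],
        s3_bucket=environ["AWS_S3_BUCKET_NAME"],
        # Assumed defaults (not from the source): port 5000, env "local".
        port=int(environ.get("PORT") or 5000),
        env=environ.get("ENV") or "local",
    )
```

The AWS credentials are only checked for presence. boto3 reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the environment on its own.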
Clone the project

```bash
git clone https://github.com/siddharthakhuntia-lohum/conf-summ-python-worker.git
```

Go to the project directory

```bash
cd conf-summ-python-worker
```

Create the virtual environment (if it does not already exist) and activate it

```bash
python3 -m venv conf-summ
source conf-summ/bin/activate
```

Install dependencies

```bash
pip install -r requirements.txt
```

Start the server

```bash
flask --app application run
```
The server is deployed on AWS Elastic Beanstalk. Deployment is automated with AWS CodePipeline: a pipeline run is triggered whenever a new commit is pushed to the main branch.
- Add support for more video sources
- Improve the summarization prompt
- Optimize the summarization process to reduce the time taken to generate the summary
- Increase the length of the summary
- Use multimodal summarization techniques to generate a better summary
```mermaid
sequenceDiagram
    Frontend->>Node Server: POST Request
    Node Server->>SQS: Send Job
    SQS->>Lambda Function: Triggered
    Lambda Function->>Python Server: Process Job
    Python Server->>DynamoDB: Store Summary
```
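In the sequence diagram, the Lambda function forwards each queued job to the Python server. The Lambda code is not part of this repository and may not be written in Python. The handler below is a hypothetical Python sketch of that hop. `WORKER_URL` is an assumed environment variable, and the `send` parameter exists only so the HTTP call can be replaced in tests.

```python
import json
import os
import urllib.request
from typing import Any, Callable

# Hypothetical: the worker's base URL is supplied to the Lambda via an env var.
WORKER_URL = os.environ.get("WORKER_URL", "http://localhost:5000")


def post_json(url: str, payload: dict[str, Any]) -> int:
    """POST a JSON body and return the HTTP status (urlopen raises on 4xx/5xx)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


def lambda_handler(event: dict[str, Any], context: Any,
                   send: Callable[[str, dict[str, Any]], int] = post_json) -> dict[str, int]:
    """Forward each SQS message body to the worker's /process_job endpoint."""
    forwarded = 0
    for record in event.get("Records", []):
        # SQS delivers each message body to Lambda as a string.
        job = json.loads(record["body"])
        send(f"{WORKER_URL}/process_job", job)
        forwarded += 1
    return {"forwarded": forwarded}
```

The handler does not catch errors. A failed POST therefore raises, the invocation fails, and the messages become visible on the queue again for a retry after the visibility timeout.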
```mermaid
flowchart TD
    A[Start] --> B[Receive Job]
    B --> C{Is Job Valid}
    C -->|Yes| D[Process Job]
    D --> E{Check Video Source}
    E -->|Youtube| F{Check if transcript exists}
    F -->|Yes| S[Fetch Transcript]
    F -->|No| G[Download Audio]
    G --> H[Transcribe Audio using Whisper]
    S --> I[Summarize Transcript]
    H --> I
    I --> J[Split Summary into Chunks]
    J --> K{Check if current_tokens < max_tokens}
    K -->|Yes| L[Use Stuff Summarizer]
    K -->|No| M[Use Map Reduce Summarizer]
    L --> N[Store Summary in DynamoDB]
    M --> N
    N --> O[Delete Downloaded Files]
    O --> P[Success Callback]
    P --> R[End]
    E -->|AWS S3| G
    C -->|No| Q[Error Callback]
    Q --> R
```
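The flowchart picks between a "stuff" summarizer and a "map-reduce" summarizer based on token count. These names match two summarization strategies common in LLM libraries, but the README does not say which library, if any, the server uses. The sketch below illustrates the general technique under several assumptions. The `llm` callable is hypothetical. Whitespace-separated words stand in for model tokens. The reduce step is a single call, so the joined partial summaries are assumed to fit in one prompt. The sketch also checks the size first and splits only when needed, whereas the flowchart splits before the check.

```python
from typing import Callable

# Hypothetical: any function that sends text to a model and returns a summary.
Summarizer = Callable[[str], str]


def count_tokens(text: str) -> int:
    # Assumption: words approximate tokens; a real version would use the model's tokenizer.
    return len(text.split())


def split_into_chunks(text: str, chunk_tokens: int) -> list[str]:
    """Split text into consecutive chunks of at most chunk_tokens words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_tokens]) for i in range(0, len(words), chunk_tokens)]


def summarize(text: str, llm: Summarizer, max_tokens: int) -> str:
    """Use 'stuff' when the text fits in one prompt, otherwise 'map-reduce'."""
    if count_tokens(text) < max_tokens:
        # Stuff: the whole text goes into a single summarization call.
        return llm(text)
    # Map: summarize each chunk independently.
    partials = [llm(chunk) for chunk in split_into_chunks(text, max_tokens)]
    # Reduce: one final call combines the partial summaries.
    return llm("\n".join(partials))
```

With a five-word input and `max_tokens=2`, the sketch makes three map calls (chunks "a b", "c d", "e") and one reduce call. With `max_tokens=10`, it makes a single stuff call.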