Automated Translate and Transcode Video Pipeline

This application is a serverless pipeline that does the following:

Takes a video as input
Transcribes the speech and translates the text into a different language
Adds subtitles in the new language
Generates an AI-based voice to dub the video
Creates a new version of the video with the subtitles and the spoken audio in the new language

The application uses the following AWS services:

Amazon S3
AWS Lambda
Amazon Transcribe
Amazon Translate
Amazon Polly
AWS Elemental MediaConvert
AWS Step Functions

The architecture is as follows:

1. Upload the original file

To trigger the pipeline, copy a video file into a bucket.

The input file name encodes the original language and the desired translation language. For example, for an original in en-US that you want in Spanish (es):

myvideo__en-US__es.mp4
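
The naming convention above can be decoded with a few lines of Python. This is a minimal sketch, assuming the three parts are always separated by a double underscore; the `VideoKey` type and `parse_video_key` function are hypothetical names, not taken from the repository.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class VideoKey(NamedTuple):
    name: str          # e.g. "myvideo"
    source_lang: str   # e.g. "en-US"
    target_lang: str   # e.g. "es"
    extension: str     # e.g. "mp4"


def parse_video_key(key: str) -> VideoKey:
    """Split '<name>__<source>__<target>.<ext>' into its parts."""
    path = PurePosixPath(key)
    parts = path.stem.split("__")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <name>__<source>__<target>.<ext>, got {key!r}")
    name, source_lang, target_lang = parts
    return VideoKey(name, source_lang, target_lang, path.suffix.lstrip("."))
```

Calling `parse_video_key("myvideo__en-US__es.mp4")` yields the name `myvideo`, the source language `en-US` and the target language `es`. A name that does not follow the convention raises `ValueError`, so a badly named upload can be rejected before any job is started.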

2. Transcribe the Audio

This upload triggers the first Lambda, LambdaTranscribe, which transcribes the video and places a JSON output file into a bucket called transcribe.json.conygre.com with the filename:

<timestamp>myvideo__en-US__es.json
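
A handler for this step might look roughly like the sketch below. It is not the repository's LambdaTranscribe code: the function names and the job-name scheme are assumptions, and only the output bucket name comes from the text above. The request is built by a pure function so it can be checked without AWS access; `start_transcription_job` is the boto3 Transcribe call that accepts these parameters.

```python
import time
from pathlib import PurePosixPath

# Bucket name taken from the text above; everything else here is hypothetical.
OUTPUT_BUCKET = "transcribe.json.conygre.com"


def build_transcribe_request(bucket: str, key: str, timestamp: int) -> dict:
    """Build the keyword arguments for Transcribe's start_transcription_job."""
    path = PurePosixPath(key)
    _name, source_lang, _target_lang = path.stem.split("__")
    job_name = f"{timestamp}{path.stem}"
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": source_lang,
        "MediaFormat": path.suffix.lstrip(".").lower(),
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "OutputBucketName": OUTPUT_BUCKET,
        "OutputKey": f"{job_name}.json",
    }


def lambda_handler(event, context):
    """Start a transcription job for the object named in an S3 event."""
    import boto3  # imported lazily so the builder above can be tested offline

    record = event["Records"][0]["s3"]
    request = build_transcribe_request(
        record["bucket"]["name"], record["object"]["key"], int(time.time())
    )
    boto3.client("transcribe").start_transcription_job(**request)
    return request["TranscriptionJobName"]
```

The sketch assumes the object key in the event needs no URL-decoding, which holds for names like the example above but not for keys containing spaces or other special characters.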

3. Create Subtitles Files In Both Languages

This JSON, which contains the transcribed text, then needs to be converted into a subtitles file and translated.

This is done by the second Lambda, ConvertTranscribeToSubtitle. It creates two files:

<timestamp>myvideo__en-US__es_original.srt

<timestamp>myvideo__en-US__es_translated.srt

These files are placed into the next bucket, transcribe.srt.conygre.com.
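
To illustrate the conversion, the sketch below turns the word-level items in a Transcribe result into SRT cues. It is an assumption about how such a conversion could work, not the code in ConvertTranscribeToSubtitle: the grouping rule (a new cue at the end of a sentence or after `max_words` words) and the function names are made up for the example, and the translation step is left out.

```python
SENTENCE_END = {".", "?", "!"}


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{millis:03}"


def transcript_to_srt(transcript: dict, max_words: int = 10) -> str:
    """Group Transcribe items into cues and render them as SRT text."""
    cues = []
    words, start, end = [], None, None

    def flush():
        nonlocal words, start, end
        if words:
            cues.append((start, end, " ".join(words)))
        words, start, end = [], None, None

    for item in transcript["results"]["items"]:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation items carry no timing; attach them to the previous word.
            if words:
                words[-1] += content
                if content in SENTENCE_END:
                    flush()
            continue
        if len(words) >= max_words:
            flush()
        if start is None:
            start = float(item["start_time"])
        end = float(item["end_time"])
        words.append(content)
    flush()

    blocks = [
        f"{index}\n{srt_timestamp(s)} --> {srt_timestamp(e)}\n{text}\n"
        for index, (s, e, text) in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)
```

The translated file could be produced by passing each cue's text through Amazon Translate before rendering, which keeps the original timings.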

4. Create the Speech Markup Language Files

The SRT files need to be converted into SSML files. This is done by the third Lambda, ConvertSubtitleToSSML. This creates a file with the name:

<timestamp>myvideo__en-US__es_translated.ssml

Note that it is written to ignore files with the word 'original' in their names, since they do not require an audio file.

The file is placed into the bucket transcribe.ssml.conygre.com.
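
One way to carry the subtitle timings into SSML is to insert a break for each silent gap between cues. The sketch below does that; it is an illustration rather than the ConvertSubtitleToSSML implementation. It assumes each cue is spoken in exactly its subtitle duration, which real synthesized speech will not match precisely, and it caps each break at 10 seconds, which is believed to be Polly's limit for a single break. All function names are hypothetical.

```python
import re
from xml.sax.saxutils import escape

TIME_RE = re.compile(r"(\d+):(\d+):(\d+),(\d+)\s*-->\s*(\d+):(\d+):(\d+),(\d+)")


def _to_ms(h, m, s, ms):
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(srt: str):
    """Yield (start_ms, end_ms, text) for each cue in an SRT document."""
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIME_RE.search(line)
            if match:
                g = match.groups()
                yield _to_ms(*g[:4]), _to_ms(*g[4:]), " ".join(lines[i + 1:])
                break


def srt_to_ssml(srt: str, max_break_ms: int = 10_000) -> str:
    """Render SRT cues as SSML, with a break for each gap between cues."""
    parts, cursor = [], 0
    for start, end, text in parse_srt(srt):
        gap = start - cursor
        while gap > 0:
            pause = min(gap, max_break_ms)
            parts.append(f'<break time="{pause}ms"/>')
            gap -= pause
        parts.append(escape(text))  # escape &, < and > so the SSML stays valid XML
        cursor = end
    return "<speak>" + "".join(parts) + "</speak>"


def should_convert(key: str) -> bool:
    """Skip the '_original' subtitles, which need no audio track."""
    return key.endswith(".srt") and "original" not in key
```

The `should_convert` check mirrors the rule described above: only the translated subtitles are turned into SSML.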

5. Create the Audio File for the New Language

An audio file is then created from the translated SSML file.

This is done by the fourth Lambda, SSMLToAudio. It takes the SSML and runs Amazon Polly to create an audio file. The audio file has the name:

<timestamp>myvideo__en-US__es_translated<pollyJobId>
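
A request for this step could be assembled as in the sketch below. It is not the SSMLToAudio source: the voice table, the function names and the output bucket parameter are assumptions (the text above does not name the audio bucket). `start_speech_synthesis_task` is the boto3 Polly call for asynchronous synthesis to S3; Polly is expected to append the task ID to the key prefix, which would account for the `<pollyJobId>` suffix in the file name.

```python
from pathlib import PurePosixPath

# Hypothetical voice choices; any Polly voice for the target language would do.
VOICES = {"es": "Lucia", "de": "Vicki"}


def build_polly_request(ssml_key: str, ssml: str, output_bucket: str) -> dict:
    """Build the keyword arguments for Polly's start_speech_synthesis_task."""
    stem = PurePosixPath(ssml_key).stem          # "<timestamp>myvideo__en-US__es_translated"
    target = stem.split("__")[2].split("_")[0]   # "es"
    if target not in VOICES:
        raise ValueError(f"no voice configured for language {target!r}")
    return {
        "OutputFormat": "mp3",
        "OutputS3BucketName": output_bucket,
        "OutputS3KeyPrefix": stem,
        "Text": ssml,
        "TextType": "ssml",
        "VoiceId": VOICES[target],
    }


def synthesize(ssml_key: str, ssml: str, output_bucket: str) -> str:
    """Start the synthesis task and return its task ID."""
    import boto3  # imported lazily so the builder above can be tested offline

    request = build_polly_request(ssml_key, ssml, output_bucket)
    response = boto3.client("polly").start_speech_synthesis_task(**request)
    return response["SynthesisTask"]["TaskId"]
```

Because the task is asynchronous, the audio file is not available when `synthesize` returns; a later step has to wait for the task to complete.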

6. Convert the output to be a new Playable Resource

The fifth and final Lambda then runs a MediaConvert job using all the files created so far.

7. Step Functions

The entire flow is coordinated using Step Functions. The flow is shown in stepfunctions.png in the repository.
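
As a rough illustration of this kind of coordination, the sketch below builds an Amazon States Language definition that runs the five Lambdas in sequence. It is not the definition in the statemachine folder: the ARN prefix is a placeholder, and `CreateMediaConvertJob` is a made-up name for the fifth Lambda, which the text above does not name.

```python
import json

# Placeholder account and region; replace with real function ARNs.
ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"

# The first four names come from the steps above; the last one is hypothetical.
STEPS = [
    "LambdaTranscribe",
    "ConvertTranscribeToSubtitle",
    "ConvertSubtitleToSSML",
    "SSMLToAudio",
    "CreateMediaConvertJob",
]


def build_state_machine(steps, arn_prefix):
    """Chain one Task state per Lambda, in order, ending after the last one."""
    states = {}
    for index, name in enumerate(steps):
        state = {"Type": "Task", "Resource": arn_prefix + name}
        if index + 1 < len(steps):
            state["Next"] = steps[index + 1]
        else:
            state["End"] = True
        states[name] = state
    return {
        "Comment": "Sketch of the translate-and-transcode flow",
        "StartAt": steps[0],
        "States": states,
    }


definition_json = json.dumps(build_state_machine(STEPS, ARN_PREFIX), indent=2)
```

Transcribe and Polly jobs run asynchronously, so a working state machine would probably also need Wait and Choice states (or callbacks) to poll for job completion between these tasks.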

Review your Results

Finally, there is an HTML file called testmedia.html that can be used to display the finished media in a web page. You can edit it to suit your final output.

nicktodd/video-translation-stepfunctions
