Welcome to youtube-audiobook-chapter-identifier 🎧📖

This bot's goal is to identify the chapters in an audiobook hosted on YouTube 🔎🕵🏻‍♂️📋

Result Example

__________________
Result for The Animal Farm
https://www.youtube.com/watch?v=iosHzNmVYbA

Chapter 1        0:00:07         Duration: 0:00:07
Chapter 2        0:16:51         Duration: 0:16:43
Chapter 3        0:33:05         Duration: 0:16:14
Chapter 4        0:47:13         Duration: 0:14:07
Chapter 5        0:57:48         Duration: 0:10:34
Chapter 6        1:17:14         Duration: 0:19:26
Chapter 7        1:34:49         Duration: 0:17:35
Chapter 8        1:57:52         Duration: 0:23:03
Chapter 9        2:22:48         Duration: 0:24:55
Chapter 10       2:45:23         Duration: 0:22:34


by https://github.com/ThisIsDjonathan/youtube-audiobook-chapter-identifier
__________________

How I Built This

This is done in 3 steps:

YoutubeContentHelper.py downloads the YouTube content as an .mp4 file;
Then AudioToTextHelper.py uses OpenAI's Whisper to transcribe the audio to text;
Finally, Audiobook.py finds where each chapter starts, based on the transcription from the previous step.

The script will create a folder inside the ./audiobooks/ directory for each audiobook.

This is the file structure:

📦 youtube-audiobook-chapter-identifier
┣ 📂 audiobooks
┃ ┗ 📂 Audio Book 1
┃   ┣ 🎧 youtube-content.mp4
┃   ┗ 📋 audio-to-text.json
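
To make the per-audiobook folder idea concrete, here is a minimal sketch of how such a folder could be created. The helper name audiobook_dir and its root parameter are hypothetical and are not taken from the repository code.

```python
from pathlib import Path


def audiobook_dir(title, root="audiobooks"):
    """Create (if needed) and return the folder for one audiobook.

    Hypothetical helper: the name and signature are illustrative only.
    """
    folder = Path(root) / title
    # parents=True also creates ./audiobooks/ on the first run;
    # exist_ok=True makes repeated runs for the same title harmless.
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

Calling audiobook_dir('Audio Book 1') would give the folder where youtube-content.mp4 and audio-to-text.json end up.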

How to use it

First, install the Python dependencies:

pip install -r requirements.txt

Then update main.py, setting the audiobook title and its YouTube URL:

def main():
    audiobook_title = 'The Animal Farm'
    youtube_url = 'https://www.youtube.com/watch?v=iosHzNmVYbA'

And finally run the script: python main.py

How it Works

YoutubeVideoHandler 🎧📖

We use the pytube library to download the YouTube data. Only the audio is downloaded, and the file is saved as youtube-content.mp4.
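
The download step could look roughly like the sketch below. It is not the code from the repository: the function name download_audio and the youtube_cls parameter (there only so the function can be exercised without network access) are assumptions. It also assumes a pytube release in which the filename argument carries the extension.

```python
from pathlib import Path


def download_audio(youtube_url, output_dir, youtube_cls=None):
    """Download the audio-only stream and save it as youtube-content.mp4.

    Hypothetical helper; youtube_cls exists only to allow a stand-in
    class in tests and defaults to pytube.YouTube.
    """
    if youtube_cls is None:
        # Imported lazily so this module loads even without pytube installed.
        from pytube import YouTube as youtube_cls

    video = youtube_cls(youtube_url)
    # Keep only audio streams and take the first one pytube offers.
    stream = video.streams.filter(only_audio=True).first()
    if stream is None:
        raise RuntimeError("No audio-only stream found for this video")

    saved_path = stream.download(
        output_path=str(output_dir),
        filename="youtube-content.mp4",
    )
    return Path(saved_path)
```

Taking the first audio-only stream keeps the sketch short; picking a specific bitrate would need extra filtering.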

The Speech to Text 🗣️👂✍🏻

After downloading the audio file from YouTube, we use OpenAI's Whisper to transcribe the audio to text. The result of this process is a JSON file saved as audio-to-text.json.
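
As a rough illustration of this step, the sketch below transcribes a file with the openai-whisper package and dumps the result to JSON. The function name transcribe_to_json, the model parameter and the "base" model size are assumptions, not details taken from AudioToTextHelper.py.

```python
import json


def transcribe_to_json(audio_path, json_path, model=None):
    """Transcribe an audio file and save the result as JSON.

    Hypothetical helper; pass a preloaded model to avoid reloading it,
    or leave model=None to load Whisper's "base" model (an assumption).
    """
    if model is None:
        # Imported lazily so this module loads even without whisper installed.
        import whisper
        model = whisper.load_model("base")

    # The result is a dict that includes the full "text" and a list of
    # "segments", each with "start", "end" and "text" fields.
    result = model.transcribe(str(audio_path))

    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump(result, handle, ensure_ascii=False, indent=2)
    return result
```

Saving the whole result, rather than only the text, keeps the segment timestamps that the chapter finder needs.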

Chapter Finder 🕵🏻‍♂️📋

The chapter finder (Audiobook.find_chapters()) loops through each segment of the Whisper transcription and looks for the word "chapter". This should be done in a better way, since currently I'm using a simple and dumb if statement to do so 😅
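
The idea can be sketched as below. This is a simplified stand-alone version, not the actual Audiobook.find_chapters() method: it assumes each segment is a dict with "start" (seconds) and "text" keys, as in Whisper's output, and the format_timestamp helper is hypothetical.

```python
from datetime import timedelta


def find_chapters(segments):
    """Return (label, start_seconds) for each segment mentioning "chapter"."""
    chapters = []
    for segment in segments:
        # The simple check: any segment containing the word counts as a
        # chapter start, numbered in the order it is found.
        if "chapter" in segment["text"].lower():
            label = f"Chapter {len(chapters) + 1}"
            chapters.append((label, segment["start"]))
    return chapters


def format_timestamp(seconds):
    """Format seconds as H:MM:SS, dropping fractions of a second."""
    return str(timedelta(seconds=int(seconds)))


if __name__ == "__main__":
    demo = [
        {"start": 7.4, "text": " Chapter 1. Mr. Jones of the Manor Farm"},
        {"start": 12.0, "text": " had locked the hen-houses for the night."},
        {"start": 1011.0, "text": " Chapter 2."},
    ]
    for label, start in find_chapters(demo):
        print(label, format_timestamp(start))
```

A substring check like this also fires when the narrator says "chapter" in the middle of the story, which is one reason a stricter match would be an improvement.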

Contributing

Check the open issues 😁

Author

👤 Djonathan Krause

Website: djonathan.com
GitHub: @ThisIsDjonathan

Show your support

Please ⭐️ this repository if this project helped you!
