Skip to content

bdeekshith066/AI-Assistant-Text-Speech-Image-Video

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AI-Assistant-Text-Speech-Image-Video

This repository contains a Python script for an AI assistant capable of generating text, speech, image, and video responses based on user input. The assistant utilizes OpenAI's GPT-3.5 model for text generation, the Google Text-to-Speech (gTTS) library for speech synthesis, integrates with the ClipDrop API for generating images, and utilizes the Hugging Face Transformers library for video generation from text prompts.

Features:

Text Response: Users can receive textual responses generated by the GPT-3.5 model based on their input queries. Speech Response: Users have the option to receive spoken responses synthesized from the generated text using gTTS. Image Response: Users can request image responses derived from their input prompts, facilitated by the ClipDrop API. Video Response: The assistant can also provide video responses based on user prompts, utilizing the Hugging Face Transformers library for video generation. Dependencies: OpenAI API: The openai library is used to interact with the GPT-3.5 model for text generation. gTTS: Google Text-to-Speech library for converting text responses into speech. Requests: Used for making HTTP requests to the ClipDrop API for generating image responses. Hugging Face Transformers: Utilized for generating video responses based on text prompts. IPython: Required for displaying audio files within Jupyter Notebooks.

Installation:

Ensure the following dependencies are installed before running the script:

pip install openai==0.27.0 gtts
pip install requests

For speech-related functionalities, additional installations may be required:

pip install pydub SpeechRecognition
apt-get install -y python3-pyaudio
!apt -y install -qq aria2
!pip install -q torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 torchtext==0.14.1 torchdata==0.5.1 --extra-index-url https://download.pytorch.org/whl/cu116 -U
!pip install -q pandas-gbq==0.18.1 open_clip_torch pytorch_lightning
!pip install -q git+https://github.com/camenduru/modelscope

Download required models:

!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/kabachuha/modelscope-damo-text2video-pruned-weights/resolve/main/VQGAN_autoencoder.pth -d /content/models -o VQGAN_autoencoder.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/kabachuha/modelscope-damo-text2video-pruned-weights/resolve/main/open_clip_pytorch_model.bin -d /content/models -o open_clip_pytorch_model.bin
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/kabachuha/modelscope-damo-text2video-pruned-weights/resolve/main/text2video_pytorch_model.pth -d /content/models -o text2video_pytorch_model.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/kabachuha/modelscope-damo-text2video-pruned-weights/raw/main/configuration.json -d /content/models -o configuration.json

Set configuration for GPU mode:

Contribution:

Contributions to this project are welcome! If you have ideas for improving the chatbot's functionality, adding new features, or enhancing its performance, feel free to submit a pull request.

Author

License:

This project is licensed under the MIT License.

About

AI-Assistant-Text-Speech-Image: A versatile AI companion offering responses in text, speech, and image formats

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published