If you want to contribute, open an issue or submit a pull request.

The problem statement includes building a voice-powered chatbot for the BHUVAN portal.

In this project, we plan to build a chatbot capable of understanding and processing multilingual voice‐based search queries. It should accurately interpret the user’s query and deliver context-aware responses. We are expected to enhance the user experience through a well-built voice-enabled interface. SAARTHI is our voice-powered chatbot.

Prerequisites

Our chatbot can be run remotely on Google Colab without installing anything on your system.

Getting Started

Navigate to Google Colab: colab.research.google.com
Go to File -> Open Notebook -> from GitHub -> Paste this URL: https://github.com/that-coding-kid/Saarthi.git
Run all cells, startup time is approximately 10 minutes.
You can now interact with the web interface at your convenience.
The dataset consists of a set of URLs scrapped, preprocessed and saved as a text file.

Conclusion: Open the "saarthi_backend.ipynb" in colab, run all the cells, then once the server is live, click on the link generated after:

from google.colab.output import eval_js

print(eval_js("google.colab.kernel.proxyPort(8000)"))

Flow

Brief about our Approach:

1.Web Scrapping and Data Preprocessing: Leveraging Langchain, we efficiently scraped data from the portal's URLs. We manually enriched the dataset with additional descriptors to augment the bot's intent awareness. Further enhancing the conversational depth, we refined the data ensuring a more descriptive and contextually nuanced interaction.

2.Creating Embeddings: We generated embeddings for the dataset by utilizing the Hugging Face platform, specifically the 'instructor-XL' embeddings. These embeddings form the basis for similarity searches and contribute to the overall functionality of the system.

3.Retrieval Augmented Generation using FAISS and Mixtral-8x7B-Instruct-v0.1: Implemented retrieval augmented generation using FAISS as a knowledge base and semantic index similarity. The system, upon receiving a query, retrieves the context of k-nearest neighbours, enhancing precision in generated responses for a more contextually accurate interaction.

4.Voice to Text using Whisper small model: Our system seamlessly integrates the open-source ASR Model- ‘Whisper Voice API’ (Whisper-small model) for live voice-to-text transcription, ensuring accurate and efficient conversion.

5.Dynamic and context-aware responses using Mixtral-8x7B-Instruct-v0.1: Subsequently, the query, along with the context retrieved from Faiss, is input into 'Mixtral-8x7B-Instruct-v0.1', an open-source Large Language Model. Accessing its API from Hugging Face, this model demonstrates superior accuracy compared to Llama 13B and is on par with GPT-3.5 in terms of performance. The response is then generated, leveraging the capabilities of Mixtral-8x7B-Instruct-v0.1, incorporating the contextual information from the query and Faiss-retrieved context.

6.Voice-based input and output (Multilingual): We provide users with the option to choose between various languages such as Hindi,English etc. as their preferred language. The bot is also capable of translating inputs given in a different language than the selected input, for eg- Say the the language field is set as English and we decide to give a Hindi input, then the response generated would be in English along with the transcription of the said query. This can help in cases where information needs to be conveyed to different persons without having the user to know a different language. The generated response is presented in a voice-based format, utilizing translators and text-to-voice models to enhance the overall user experience.

Accuracies

Whisper

	large	medium	small	base	tiny	WERR: S → M
English	0.15	0.17	0.17	0.2	0.2	0
Italian	0.16	0.17	0.22	0.33	0.5	0.24
German	0.18	0.18	0.21	0.27	0.4	0.14
Spanish	0.19	0.19	0.2	0.28	0.4	0.07
French	0.26	0.26	0.29	0.37	0.5	0.09
Portuguese	0.25	0.28	0.28	0.39	0.5	0.02
Japanese*	0.29	0.3	0.34	0.44		0.11
Danish	0.3	0.3	0.41	0.64	0.8	0.25
Swedish	0.29	0.31	0.38	0.51	0.6	0.19
Indonesian	0.31	0.31	0.38	0.52		0.17
Greek	0.29	0.31	0.44	0.62	0.8	0.29
Chinese*	0.33	0.33	0.35	0.44		0.06
Thai*	0.34	0.34	0.52	0.59	0.7	0.34
Tagalog	0.36	0.37	0.48	0.7	0.9	0.24
Korean	0.4	0.4	0.44	0.51		0.09
Norwegian	0.42	0.42	0.46	0.75	0.9	0.09
Finnish	0.41	0.43	0.53	0.7	0.9	0.19
Arabic	0.52	0.53	0.61	0.75	0.9	0.14
Hindi	0.6	0.67	0.104	0.11		0.35

Technology and Tech Stack used:

We have used multiple open-source APIs to achieve our task. The domain-wise APIs and their GitHub links are below.

Audio Transcription: OpenAI-Whisper
Text Translation: Googletrans
Voice-Recoding: JavaScript API
Embeddings Creation: hkunlp/instructor-xl
Database Used: FAISS-Index
Web Scraping: Langchain.webloader
Pre-Processing: RegEx
QA Chain: Langchain
LLM: Mixtral - 8x7B
Web Framework: Django
Text-To-Speech: SpeechSynthesisUtterance

If you want to contribute, open an issue or submit a pull request.

Name		Name	Last commit message	Last commit date
Latest commit History 59 Commits
Databases		Databases
Saarthi		Saarthi
DataPreprocessing.ipynb		DataPreprocessing.ipynb
README.md		README.md
Saarthi-20240119T051115Z-001.zip		Saarthi-20240119T051115Z-001.zip
Saarthi_backend.ipynb		Saarthi_backend.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Databases

Databases

Saarthi

Saarthi

DataPreprocessing.ipynb

DataPreprocessing.ipynb

README.md

README.md

Saarthi-20240119T051115Z-001.zip

Saarthi-20240119T051115Z-001.zip

Saarthi_backend.ipynb

Saarthi_backend.ipynb

Repository files navigation

Prerequisites

Getting Started

Flow

Accuracies

Whisper

Technology and Tech Stack used:

If you want to contribute, open an issue or submit a pull request.

About

Releases

Packages

Contributors 4

Languages

that-coding-kid/Saarthi

Folders and files

Latest commit

History

Repository files navigation

Prerequisites

Getting Started

Flow

Accuracies

Whisper

Technology and Tech Stack used:

If you want to contribute, open an issue or submit a pull request.

About

Topics

Resources

Stars

Watchers

Forks

Languages