vdutts7/ee16b-ai-chat

EE16B AI Chatbot

EE16B AI Chatbot ~ trained on official course website

screen-recording.mp4

Table of Contents

    📝 About
    💻 How to build
    🚀 Next steps
    🔧 Tools used
    👀 Contact

📝 About

A more natural way to help students study for exams, review weekly content, and generate practice problems similar to the course material, tailored to their preferences. Trained on all Spring 2023 lectures. EE16B students, staff, and anyone else can use this repo and adjust it to their liking.

UC Berkeley 🐻🔵🟡 • EE16B: Designing Information Devices and Systems II ⚙️ • Spring 2023


💻 How to Build

Note: these instructions are for macOS; adjust accordingly for Windows / Linux.

Initial setup

Clone the repo and install dependencies.

git clone https://github.com/vdutts7/ee16b-ai-chat
cd ee16b-ai-chat
pnpm install

Create a .env file and add your API keys (refer to .env.local.example for the template):

OPENAI_API_KEY=""
NEXT_PUBLIC_SUPABASE_URL=""
NEXT_PUBLIC_SUPABASE_ANON_KEY=""
SUPABASE_SERVICE_ROLE_KEY=""

Get API keys from OpenAI and from your Supabase project.

IMPORTANT: Verify that .gitignore lists .env so your keys are never committed.
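
The .gitignore check above can be automated. The following is a minimal sketch, not part of the repo: the function name `gitignore_covers_env` and the small set of accepted patterns are assumptions made for illustration. It does not handle negation patterns such as `!.env`.

```python
from pathlib import Path


def gitignore_covers_env(gitignore_path: str = ".gitignore") -> bool:
    """Return True if some line in .gitignore plainly ignores a root-level .env file."""
    path = Path(gitignore_path)
    if not path.is_file():
        return False

    # Hypothetical, non-exhaustive set of patterns that clearly cover ".env".
    accepted = {".env", "/.env", ".env*"}

    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        if line in accepted:
            return True
    return False
```

Calling `gitignore_covers_env()` from the repo root returns `False` when .env is not listed, which is a cue to add it before committing.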

Prepare Supabase environment

I used Supabase as my vector store. Alternatives: Pinecone, Qdrant, Weaviate, Chroma, etc.

You should have already created a Supabase project to get your API keys. Inside the project's SQL editor, create a new query and run the contents of schema.sql. This should create a documents table with 4 columns.

Embed and upsert

Inside the config folder is the transcripts folder, which holds every lecture as a .txt file along with a corresponding JSON file for its metadata. (The .txt files were scraped from the lecture recordings separately ahead of time, but OpenAI's Whisper is a great package for speech-to-text transcription.) Change these according to your preferences. By default, pageContent and metadata are stored in Supabase along with an int8 'id' column.
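
The pairing of transcripts with metadata might look roughly like the sketch below. It is not the notebook's actual code: the function name `load_lectures`, the assumption that each JSON file shares its transcript's filename stem, and the empty-dict fallback for missing metadata are all illustrative choices.

```python
import json
from pathlib import Path


def load_lectures(transcripts_dir: str) -> list[dict]:
    """Pair each .txt transcript with the .json metadata file that shares its stem."""
    lectures = []
    for txt_path in sorted(Path(transcripts_dir).glob("*.txt")):
        meta_path = txt_path.with_suffix(".json")
        # Assumption: a transcript without a metadata file gets empty metadata.
        if meta_path.is_file():
            metadata = json.loads(meta_path.read_text(encoding="utf-8"))
        else:
            metadata = {}
        lectures.append(
            {
                "pageContent": txt_path.read_text(encoding="utf-8"),
                "metadata": metadata,
            }
        )
    return lectures
```

For example, `load_lectures("config/transcripts")` would return one dict per lecture, using the same pageContent and metadata field names that end up in Supabase.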

Manually run the embed-script.ipynb notebook in the scripts folder OR run the package script from terminal:

npm run embed

This is a one-time process and, depending on the size of the data you wish to upsert, it can take a few minutes. Check your Supabase database to see the new rows in your table.

Technical explanation

The embed script performs the following:

  • Installs the supabase Python library using pip. This allows interaction with a Supabase database.

  • Loads various libraries:

    supabase - For interacting with Supabase

    langchain - For text processing and vectorization

    json - For loading JSON metadata files

  • Loads the Supabase URL and API key from .env. This is used to create a supabase_client to connect to the Supabase database.

  • Loads text data from .txt lecture transcripts and JSON metadata files.

  • Uses a RecursiveCharacterTextSplitter to split the lecture text into chunks. This breaks the text into manageable pieces for processing. Chunk size and chunk overlap can be changed according to preference; together they control how specific each chunk is. A larger chunk size and smaller overlap will result in fewer, broader chunks, while a smaller chunk size and larger overlap will produce more, narrower chunks.

  • Creates embeddings with OpenAI's text-embedding-ada-002 model. Each chunk becomes a 1536-dimensional vector suited to cosine similarity search. These vectors are combined with the metadata from the JSON files and other lecture-specific info, then upserted to the database as rows via a SupabaseVectorStore.
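
To make the chunk size / chunk overlap trade-off concrete, here is a simplified sketch. It is a plain fixed-width sliding window, not LangChain's RecursiveCharacterTextSplitter (which also tries to split on separators such as paragraphs and sentences); the function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-width character windows that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each new chunk starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "x" * 2000
    # Larger chunks with no overlap give fewer chunks than small, heavily overlapping ones.
    print(len(chunk_text(sample, chunk_size=1000, chunk_overlap=0)))
    print(len(chunk_text(sample, chunk_size=200, chunk_overlap=100)))
```

Running the module prints the two chunk counts, so the effect of the parameters can be compared on the same input.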

visualized-flow-chart
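
The cosine similarity search mentioned above can be illustrated with a small pure-Python sketch. In the actual app the search would presumably run inside the database rather than in application code; the function names, the `embedding` key, and the toy 3-dimensional vectors (standing in for 1536-dimensional ones) are assumptions for illustration only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], rows: list[dict], k: int = 3) -> list[dict]:
    """Return the k rows whose 'embedding' is most similar to the query vector."""
    scored = [(cosine_similarity(query, row["embedding"]), row) for row in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored[:k]]


if __name__ == "__main__":
    # Toy rows; real embeddings would have 1536 dimensions.
    rows = [
        {"pageContent": "chunk A", "embedding": [1.0, 0.0, 0.0]},
        {"pageContent": "chunk B", "embedding": [0.0, 1.0, 0.0]},
        {"pageContent": "chunk C", "embedding": [1.0, 1.0, 0.0]},
    ]
    for row in top_k([1.0, 0.0, 0.0], rows, k=2):
        print(row["pageContent"])
```

The chunks returned this way are the ones that would be passed to the chat model as context for answering a question.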

Run app

Run app and verify everything went smoothly:

npm run dev

Go to http://localhost:3000. You should be able to type and ask questions now. Done ✅

🚀 Next steps

Deploy

I used Vercel as this was a small project.

Alternatives: Heroku, Firebase, AWS Elastic Beanstalk, DigitalOcean, etc.

Customizations

UI/UX: change to your liking.

Bot behavior: edit the prompt template in /utils/makechain.ts to refine and gain greater control over the bot's outputs.

Data: modify .txt files in /config/transcripts and main script in /scripts/embed-script.ipynb


🔧 Tools used

Next.js, TypeScript, LangChain, OpenAI, Supabase, Tailwind CSS, Vercel


👀 Contact

me@vdutts7.com

🔗 Project Link: https://github.com/vdutts7/ee16b-ai-chat
