GitHub - sanjuashok/rag: playing around with retrieval augmented generation

Here's the rough outline of what we're doing:

The end result: query a bunch of text, pull out the paragraphs that are related to the query, and build a prompt that hands those paragraphs to an LLM as context.

This requires the following:

Get a corpus to index.
Parse the text into chunks.
Pass each chunk into OpenAI ada embeddings and get a vector.
Write the embedding into a db with a mapping from embedding -> sentence chunk
Get a query string
Transform the query string into an embedding.
Run kNN (k-nearest neighbors) using the query embedding against the indexed data.
Return top 5 sentence chunks.
Voila -> you have context for the query against the big boi LLM.
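The steps above can be sketched in a few lines of NumPy. This is a minimal sketch, not the code in this repo: `chunk_text`, `toy_embed`, and `top_k` are hypothetical names, and `toy_embed` is a hashed bag-of-words stand-in so the example runs without an API key. In the real pipeline you would swap it for a call to the OpenAI ada embeddings model. The "db" here is assumed to be just a matrix whose row `i` corresponds to `chunks[i]`.

```python
import re
import zlib

import numpy as np


def chunk_text(text, sep="\n\n"):
    """Split a corpus into non-empty chunks on a separator (assumed: blank lines)."""
    return [c.strip() for c in text.split(sep) if c.strip()]


def toy_embed(texts, dim=256):
    """Hypothetical stand-in for an embedding model: hashed bag-of-words.

    Returns an array of shape (len(texts), dim) with L2-normalised rows,
    so a dot product between rows is a cosine similarity.
    """
    out = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        for word in re.findall(r"[a-z']+", text.lower()):
            out[i, zlib.crc32(word.encode()) % dim] += 1.0
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.maximum(norms, 1e-12)


def top_k(query_vec, index, k=5):
    """Brute-force kNN: indices and scores of the k most similar rows."""
    scores = index @ query_vec           # cosine similarity per chunk
    k = min(k, len(scores))
    order = np.argsort(-scores)[:k]      # highest score first
    return order, scores[order]


corpus = "Tim went to the park.\n\nSue baked a cake.\n\nThe dog slept."
chunks = chunk_text(corpus)              # row i of the index <-> chunks[i]
index = toy_embed(chunks)
query_vec = toy_embed(["who went to the park?"])[0]
order, scores = top_k(query_vec, index, k=2)
for i in order:
    print(chunks[i])
```

Brute-force search like this is fine for a small corpus; a vector database or approximate nearest-neighbor index only starts to matter once the matrix gets large.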

Then you query the LLM with the following: Answer the following query "" given the following context { }
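Filling in that template is just string formatting. A minimal sketch, assuming the retrieved chunks arrive as a list of strings; `build_prompt` is a hypothetical helper name, and joining chunks with blank lines is an assumption rather than something the original specifies.

```python
def build_prompt(query, chunks):
    """Fill the prompt template with the query and the retrieved chunks."""
    context = "\n\n".join(chunks)  # assumed separator between chunks
    return (
        f'Answer the following query "{query}" '
        f"given the following context {{ {context} }}"
    )


prompt = build_prompt(
    "who went to the park?",
    ["Tim went to the park with his mom and dad.", "Sue and Mike had a picnic."],
)
print(prompt)
```

The resulting string is what gets sent to the LLM as the user message.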

Implementation and Data

The implementation follows the instructions from this comment on HN.

The data set is roneneldan's TinyStories dataset from Hugging Face.

Outcome

And it sorta works!

For the query "who went to the park?", you get the following output from the vector similarity search.

Once upon a time, there was a little boy named Tim. Tim loved going to the park to play. One day, Tim went to the park with his mom and dad. He was very happy.
At the park, Tim saw a big tree. He wanted to give the tree a hug. So, he hugged the tree and felt good. Tim liked the tree a lot. He played with his ball, ran around, and had a lot of fun.
Tim had a successful day at the park. He played and laughed a lot. When it was time to go home, Tim felt tired but happy. He couldn't wait to come back to the park again.



Sue and Mike were planning a picnic in the park. They wanted to have the best time ever! Sue and Mike packed snacks for their picnic. They brought lots of different snacks, but their favorite was the beans.
When they arrived in the park, they spread out their blanket and started to get ready for the picnic. Suddenly, Sue noticed something strange – there was a deaf rabbit in the middle of the park.
Sue was excited. She called out to Mike, "Let's give him some beans!" Mike nodded with a smile and together they took out an extra portion of beans and placed it near the rabbit.
The rabbit hopped around the beans and started to eat. Sue and Mike were delighted to see the rabbit enjoying their snack. They had a wonderful picnic and plan to visit the park again soon.



Once there was a boy. His name was Tim. He was three years old and was very adventurous. One day he decided to go for a walk with his parents.
As they walked, they noticed something wonderful. There was a lake, with bright blue waters, and beautiful green trees surrounding it. 
The family couldn't help but add it to their list of favorite places. 
Tim was so excited, he said, "Can we go explore? I would love to see what's there!"
His parents replied, "Yes Tim! What an adventurous idea!"
They all went to explore. Tim discovered so many amazing things. He found colorful rocks, and new plants he'd never seen before. He even saw some animals stepping out of the water.
Tim added these sights to his memory, and felt a sense of wonder as he explored. He had a great day with his family and enjoyed their adventurous outing.

The output from ChatGPT using the above as context and the original query is:

Tim, Sue, and Mike went to the park.

Pretty cool.

