
Resources for Evaluation of LLMs / Generative AI

This repository includes the slides and some of the notebooks that are used in my Evaluation workshops.

Some of the notebooks do require an OpenAI API key.
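
A minimal way to supply that key inside a Colab/Jupyter session (the `OPENAI_API_KEY` environment variable is what the OpenAI client libraries read; prompting with `getpass` is just one option):

```python
import os
from getpass import getpass

# Prompt for the key instead of hard-coding it in the notebook.
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Paste your OpenAI API key: ")
```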

These notebooks are meant to illustrate key points from the talks; please don't take them to production. If you want to dig deeper or run into issues, go to the source for each of these projects.

About the workshop

(workshop overview image)

Notebook links

Prompting a Chatbot: Colab notebook

Testing Properties of a System: Guidance AI

Thumb: Prompt Testing Library for LLMs: GitHub

LangTest tutorials from John Snow Labs: Colab Notebooks

LLM Evaluation Harness from EleutherAI: GitHub or Colab notebook (see the usage sketch after this list)

Ragas, showing a model as an evaluator: GitHub or Colab notebook (see the sketch after this list)

Evaluate LLMs and RAG, a practical example using LangChain and Hugging Face: GitHub

MLflow Automated Evaluation: Blog (see the sketch after this list)

LLM Grader on AWS: Video and Notebook

Argilla for Annotation: Spaces (login: admin, password: 12345678)

LLM AutoEval for RunPod by Maxime Labonne: Colab

Evaluating LLM responses with Marvin: GitHub
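
For the EleutherAI LLM Evaluation Harness linked above, here is a minimal Python sketch, assuming a recent `lm-eval` release that exposes `simple_evaluate`; the model and task names are only examples:

```python
# pip install lm-eval
import lm_eval

# Run a single benchmark task against a small Hugging Face model.
results = lm_eval.simple_evaluate(
    model="hf",                                     # Hugging Face backend
    model_args="pretrained=EleutherAI/pythia-160m", # example model
    tasks=["hellaswag"],
    num_fewshot=0,
    limit=50,                                       # subsample for a quick smoke test
)
print(results["results"]["hellaswag"])
```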
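
For the Ragas model-as-evaluator entry, a rough sketch; the metric imports and required column names vary across Ragas versions, and an OpenAI API key is assumed for the judge model:

```python
# pip install ragas datasets
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# Tiny hand-made RAG trace: question, retrieved contexts, and the generated answer.
data = {
    "question": ["What does the LLM Evaluation Harness do?"],
    "contexts": [["The harness runs standardized benchmarks against language models."]],
    "answer": ["It runs standardized benchmark tasks against language models."],
}

# The judge LLM scores faithfulness and answer relevancy for each row.
result = evaluate(Dataset.from_dict(data), metrics=[faithfulness, answer_relevancy])
print(result)
```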
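
For the MLflow automated-evaluation blog, a hedged sketch of `mlflow.evaluate` on a toy question-answering setup; the placeholder `qa_model` function stands in for a real LLM call, and the default metrics depend on your MLflow version:

```python
# pip install mlflow evaluate
import mlflow
import pandas as pd

# Evaluation data: model inputs plus reference answers.
eval_data = pd.DataFrame({
    "inputs": ["What is MLflow?"],
    "ground_truth": ["MLflow is an open-source platform for the ML lifecycle."],
})

def qa_model(inputs):
    # Placeholder: call your real LLM here; return one prediction per input row.
    return ["MLflow is an open source MLOps platform."] * len(inputs)

with mlflow.start_run():
    results = mlflow.evaluate(
        model=qa_model,
        data=eval_data,
        targets="ground_truth",
        model_type="question-answering",
    )
    print(results.metrics)
```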

Conference Presentations

Generative AI Summit, Austin (Oct 2023) - Slides

ODSC West, San Francisco (Nov 2023) - Slides

Arize Holiday Conference (Dec 2023) - Slides

Data Innovation Conference (Apr 2024) - Slides

Videos

Evaluation for Large Language Models and Generative AI - A Deep Dive - YouTube

Constructing an Evaluation Approach for Generative AI Models - YouTube

Large Language Models (LLMs) Can Explain Their Predictions - YouTube & Slides

Additional Resources

Josh Tobin's Evaluation talk - YouTube

Mahesh Deshwal's LLM Evaluation - Google Doc
