A framework for automatically manipulating and evaluating the political ideology of LLMs with two ideology tests: Wahl-O-Mat and Political Compass Test.
The TypeScript SDK for Prompt Foundry, a prompt engineering, prompt management, and prompt testing tool.
A compilation of referenced benchmark metrics to evaluate different aspects of knowledge for Large Language Models.
Evaluate LLMs and RAG pipelines on datasets using custom reasoning functions and LangChain.
For familiarization and learning purposes. Uses the LangChain framework, LangSmith for tracing, OpenAI LLMs, and a Pinecone serverless vector DB, built with Jupyter Notebook and Python.
Calibration Game is a game for getting better at identifying hallucinations in LLMs.
Visualize LLM Evaluations for OpenAI Assistants
FactScoreLite is an implementation of the FactScore metric, designed for detailed accuracy assessment in text generation. This package builds upon the framework provided by the original FactScore repository, which is no longer maintained and contains outdated functions.
EnsembleX uses the knapsack algorithm to optimize Large Language Model (LLM) ensembles for quality-cost trade-offs, offering domain-tailored suggestions through a Streamlit dashboard.
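The knapsack framing behind such quality-cost optimization can be sketched as a 0/1 knapsack: pick the subset of models maximizing total quality under a cost budget. A minimal illustration (the model names, quality scores, and costs below are invented, not from EnsembleX):

```python
# Hypothetical sketch: LLM ensemble selection as a 0/1 knapsack problem.
# Maximize summed quality score subject to an integer cost budget.

def select_models(models, budget):
    """Dynamic-programming 0/1 knapsack over (name, quality, cost) tuples."""
    # dp[c] = (best quality achievable with total cost <= c, chosen model set)
    dp = [(0.0, frozenset()) for _ in range(budget + 1)]
    for name, quality, cost in models:
        # Iterate costs downward so each model is used at most once.
        for c in range(budget, cost - 1, -1):
            candidate = dp[c - cost][0] + quality
            if candidate > dp[c][0]:
                dp[c] = (candidate, dp[c - cost][1] | {name})
    return dp[budget]

# Invented example data: (name, quality score, cost units).
models = [("model-a", 0.81, 3), ("model-b", 0.74, 2), ("model-c", 0.69, 1)]
best_quality, chosen = select_models(models, budget=4)
# With budget 4, the best subset is {"model-a", "model-c"} with quality 1.50.
```

Real systems would plug in per-domain quality estimates and API pricing in place of the toy numbers.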
LLMs Evaluation
Exploring the depths of LLMs 🚀
[Personalize@EACL 2024] LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models.
Upload, score, and visually compare multiple LLM-graded summaries simultaneously!
Template for an AI application that extracts job information from a job description using OpenAI functions and LangChain.
Code for the paper "Prediction-Powered Ranking of Large Language Models," arXiv 2024.
A prompt collection for testing and evaluation of LLMs.
This repository contains the lab work for Coursera course on "Generative AI with Large Language Models".