A framework for automatically manipulating and evaluating the political ideology of LLMs with two ideology tests: Wahl-O-Mat and Political Compass Test.
The TypeScript SDK for Prompt Foundry, a prompt engineering, prompt management, and prompt testing tool.
A compilation of referenced benchmark metrics to evaluate different aspects of knowledge for Large Language Models.
Evaluate LLMs and RAG pipelines on datasets using custom reasoning functions and LangChain.
For familiarization and learning purposes. Uses the LangChain framework, LangSmith for tracing, OpenAI LLMs, and a Pinecone serverless vector DB, built with Jupyter Notebook and Python.
Calibration Game is a game for getting better at identifying hallucinations in LLMs.
Visualize LLM Evaluations for OpenAI Assistants
FactScoreLite is an implementation of the FactScore metric, designed for detailed accuracy assessment in text generation. This package builds upon the framework provided by the original FactScore repository, which is no longer maintained and contains outdated functions.
EnsembleX uses the knapsack algorithm to optimize Large Language Model (LLM) ensembles for quality-cost trade-offs, offering domain-tailored suggestions through a Streamlit dashboard.
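The knapsack framing behind such quality-cost optimization can be sketched as a 0/1 knapsack: pick the subset of models maximizing total quality under a cost budget. A minimal illustration (the model names, quality scores, and costs below are invented, not from EnsembleX):

```python
# Hypothetical sketch: LLM ensemble selection as a 0/1 knapsack problem.
# Maximize summed quality score subject to an integer cost budget.

def select_models(models, budget):
    """Dynamic-programming 0/1 knapsack over (name, quality, cost) tuples."""
    # dp[c] = (best quality achievable with total cost <= c, chosen model set)
    dp = [(0.0, frozenset()) for _ in range(budget + 1)]
    for name, quality, cost in models:
        # Iterate costs downward so each model is used at most once.
        for c in range(budget, cost - 1, -1):
            candidate = dp[c - cost][0] + quality
            if candidate > dp[c][0]:
                dp[c] = (candidate, dp[c - cost][1] | {name})
    return dp[budget]

# Invented example data: (name, quality score, cost units).
models = [("model-a", 0.81, 3), ("model-b", 0.74, 2), ("model-c", 0.69, 1)]
best_quality, chosen = select_models(models, budget=4)
# With budget 4, the best subset is {"model-a", "model-c"} with quality 1.50.
```

Real systems would plug in per-domain quality estimates and API pricing in place of the toy numbers.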
LLMs Evaluation
Exploring the depths of LLMs 🚀
[Personalize@EACL 2024] LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models.
Upload, score, and visually compare multiple LLM-graded summaries simultaneously!
Template for an AI application that extracts job information from a job description using OpenAI functions and LangChain.
Code for the paper "Prediction-Powered Ranking of Large Language Models," arXiv 2024.
A prompt collection for testing and evaluation of LLMs.
This repository contains the lab work for Coursera course on "Generative AI with Large Language Models".