
LLM Workbench

https://www.llmwb.com/

A supercharged workbench for LLMs. Test prompt templates across different models and providers, using datasets of prompt arguments to fill in the placeholders.


Problem

I've built three different AI chatbots now. In the process, I've had to shoddily build subsets of the features that this application supports. I've also wanted a no-code platform to test a prompt against various arguments and see how it behaves.

I personally don't find the auto prompt-writing libraries that appealing - I just want a decent vibe-check across a variety of parameters.

I also don't personally use any of the abstraction libraries and don't find them that useful. However, there seems to be little tooling for people who want to raw-dog their prompt testing.

I was inspired by Anthropic's recent Workbench platform, which seemed like a good step up from OpenAI's Playground.

The application solves these specific user problems:

  • For a specific LLM prompt that takes args: Record<string, string> as its prompt parameters, I want to be able to test different model parameters (see the sketch after this list for the shape).
  • For a specific LLM prompt, I want to test out different sets of prompt parameters: the happy case, plus cases that I've seen fail in some way.
  • For a specific set of prompt parameters, I want to test out different LLM prompts to do a vibe-check on their outputs.
  • I want to test the same prompt + model across different providers to understand their behavior, since providers seem to have different default parameters, especially for open-source models.
  • I want to view the history of multiple runs for a specific set of parameters that I've provided.
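
To make the first point concrete, here is a minimal TypeScript sketch of what a template plus one set of prompt arguments looks like when compiled with Mustache.js. The template text and variable names are made up for illustration; only the args: Record<string, string> shape comes from the list above.

import Mustache from "mustache";

// A template uses {{ }} placeholders for its variables.
const template =
  "You are a support agent for {{company}}. Answer this question: {{question}}";

// One set of prompt arguments, i.e. args: Record<string, string>.
const args: Record<string, string> = {
  company: "Acme Corp",
  question: "How do I reset my password?",
};

// Compile the template against the arguments before sending it to a model.
const compiledPrompt = Mustache.render(template, args);
// "You are a support agent for Acme Corp. Answer this question: How do I reset my password?"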

Features

  1. Create "templates": messages / raw prompts that use {{ }} for denoting variables using Mustache.js.
  2. Create "datasets": Create a list of variables that you want to test the templates on.
  3. Support all parameters: Be able to input all the parameters available for different APIs as well as on the UI. I've found that a lot of semi-professional playground tools do not support parameters like tools or logit_bias. The project is built to be able to easily extend new parameters when they are added by providers.
  4. Support all meaningful providers in the ecosystem:
  • OpenAI
  • Anthropic
  • Gemini
  • Together
  • AWS Bedrock
  • Azure
  • Anyscale
  • Groq
  • Openrouter
  • User-defined APIs
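
As a rough sketch of how features 1-3 fit together (not the app's actual code): sweep a "dataset" of argument sets across a template and send each compiled prompt along with whatever request parameters you want to exercise. The model name, template, and dataset below are illustrative, and the example assumes the official openai SDK.

import OpenAI from "openai";
import Mustache from "mustache";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const template = "Summarize the following text in a {{tone}} tone: {{text}}";

// A "dataset": one row per set of template variables to test.
const dataset: Record<string, string>[] = [
  { tone: "formal", text: "The quarterly numbers were strong." },
  { tone: "casual", text: "The quarterly numbers were strong." },
];

for (const row of dataset) {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: Mustache.render(template, row) }],
    temperature: 0.7,
    // Less common parameters such as logit_bias or tools pass straight through.
    logit_bias: {},
  });
  console.log(completion.choices[0].message.content);
}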

Todo

  1. Manual input of prompt arguments as JSON
  2. Import CSV
  3. Version control prompts and datasets
  4. Custom providers and custom models
  5. Multi-modal input + parameters
  6. Toggle visible columns + show compiled inputs
  7. Default prompts + share data

Security

The templates, datasets, and API keys that you add on the website are stored only locally in your browser.
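
"Stored locally" here means browser storage (e.g. localStorage or IndexedDB). A minimal sketch of the pattern, with an illustrative key name rather than the app's actual schema:

// Keys and templates never leave the browser; persist them client-side only.
const apiKey = "sk-..."; // placeholder value
localStorage.setItem("llmwb:openai-api-key", apiKey);
const storedKey = localStorage.getItem("llmwb:openai-api-key");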

Running locally

LLM Workbench is currently just a Next.js app. Assuming you have yarn installed, you can run it with the following:

yarn
yarn dev
