SWE-bump-bench

Overview

SWE-bump-bench is a benchmark for evaluating large language models on real-world dependency upgrade tasks with breaking changes, collected from GitHub. Given a codebase, a package to bump, and a target version for that package, a language model is tasked with generating a patch that resolves any breaking changes introduced by the upgrade.

The dataset contains only repositories that consist primarily of TypeScript.

Data Collection

First, we collect a list of repositories where strict: true is set in the project's tsconfig.json. These repositories are added to raw/repos.csv. Next, we filter the list to repositories that have at least one package with a major version upgrade available, where that upgrade introduces breaking changes.
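
The two selection criteria above can be sketched as small predicates. This is a minimal illustration, not the collect script's actual code: the helper names hasStrictEnabled, majorOf, and isMajorBump are hypothetical, and the sketch assumes tsconfig.json is plain JSON (files with comments or trailing commas are treated as non-matching).

```typescript
// Hypothetical helpers for illustration; names are not taken from the repository.

/** Returns true when the given tsconfig.json text sets compilerOptions.strict to true. */
function hasStrictEnabled(tsconfigText: string): boolean {
  try {
    const config = JSON.parse(tsconfigText) as {
      compilerOptions?: { strict?: unknown };
    } | null;
    return config?.compilerOptions?.strict === true;
  } catch {
    // tsconfig.json may contain comments or trailing commas, which JSON.parse
    // rejects. This sketch treats such files as non-matching.
    return false;
  }
}

/** Extracts the major version from strings such as "4.17.21", "^4.17.21" or "v4.0.0". */
function majorOf(version: string): number | null {
  const match = /^[^\d]*(\d+)/.exec(version.trim());
  return match ? Number(match[1]) : null;
}

/** Returns true when the target version has a higher major number than the current one. */
function isMajorBump(current: string, target: string): boolean {
  const from = majorOf(current);
  const to = majorOf(target);
  return from !== null && to !== null && to > from;
}
```

A major bump on its own does not qualify a repository; the upgrade must also break type checking, as defined next.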

Breaking changes are defined as changes that cause tsc to fail after the package is updated in the repository.
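
As a rough sketch of this definition, the check below type-checks a repository before and after an upgrade. The function names, the applyUpgrade callback, and the baseline check (requiring tsc to pass before the upgrade) are assumptions made for illustration; the collect script may work differently. It assumes pnpm is installed and that typescript is a dependency of the repository.

```typescript
import { spawnSync } from "node:child_process";

// Returns true when type checking succeeds for the repository at repoDir.
type TypeCheck = (repoDir: string) => boolean;

// Runs the repository's own tsc without emitting files; exit code 0 means success.
const runTsc: TypeCheck = (repoDir) => {
  const result = spawnSync("pnpm", ["exec", "tsc", "--noEmit"], {
    cwd: repoDir,
    stdio: "ignore",
  });
  return result.status === 0;
};

// Hypothetical check: an upgrade is breaking if tsc passes before it and fails after it.
// applyUpgrade is a caller-supplied step that bumps the package in place.
function isBreakingUpgrade(
  repoDir: string,
  applyUpgrade: (repoDir: string) => void,
  typeCheck: TypeCheck = runTsc,
): boolean {
  // Assumption: if the baseline already fails, the failure cannot be attributed to the upgrade.
  if (!typeCheck(repoDir)) return false;
  applyUpgrade(repoDir);
  return !typeCheck(repoDir);
}
```

Passing the type checker as a parameter keeps the decision logic testable without invoking tsc.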

The collect script does this filtering for us:

pnpm run data collect -i raw/repos.csv -o tasks.json

The final output, tasks.json, is a JSON file containing the collection of valid tasks.

License

MIT

xeol-io/swe-bump-bench
