Releases · tatsu-lab/alpaca_eval

Release v0.6.2 (Latest)

19 Apr 06:28 · tag v0.6.2 · commit 46ca37b · by github-actions

What's Changed

[BUG] backward compatibility with AF by @YannDubs in #278
Add Nanbeige-Plus-Chat-v0.1 to AlpacaEval by @yuani114 in #279
Update README.md by @Dominic789654 in #280
[BUG] revert to GPT4 preview 1106 by @YannDubs in #283
Add support for analyzing evaluators with custom cross-annotations by @rdnfn in #281
[ENH] llama3 by @YannDubs in #285

New Contributors

@Dominic789654 made their first contribution in #280
@rdnfn made their first contribution in #281

Full Changelog: v0.6.1...v0.6.2

Contributors

YannDubs, yuani114, and 2 other contributors

Release v0.6.1

13 Apr 05:40 · tag v0.6.1 · commit 26b6af7 · by github-actions

What's Changed

Add Aligner-2B+Qwen1.5-72B-Chat & Aligner-2B+Claude3 Opus to AlpacaEval by @AlignInc in #259
Supplement for Aligner by @AlignInc in #261
Add Ein-70B-v0.1 to AlpacaEval by @bin-bi in #262
Add TempNet-LLaMA2-Chat to AlpacaEval by @xumao-nju in #264
Add Conifer-7B-DPO to AlpacaEval by @liulixin29 in #267
Updating link to a super fast demo! by @kyleliang919 in #268
Add Nanbeige2-8B-Chat to AlpacaEval by @yuani114 in #274
[ENH] adding drbx and gpt4 turbo by @YannDubs in #275

New Contributors

@AlignInc made their first contribution in #259
@bin-bi made their first contribution in #262
@xumao-nju made their first contribution in #264
@liulixin29 made their first contribution in #267
@yuani114 made their first contribution in #274

Full Changelog: v0.6...v0.6.1

Contributors

bin-bi, kyleliang919, and 5 other contributors

Release v0.6

20 Mar 02:50 · tag v0.6 · commit f5046ae · by github-actions

What's Changed

[DATA] Add Gemma by @YannDubs in #242
[NOTEBOOK] adding final length correction notebook. by @YannDubs in #244
add Mistral-7B-ReMax-v0.1 by @liziniu in #245
[ENH] add claude 3 by @YannDubs in #247
[ENH] add contextual by @YannDubs in #250
[ENH] add mistral large by @YannDubs in #251
Add Samba-CoE-v0.2 to AlpacaEval by @kyleliang919 in #253
Add Samba-CoE-v0.2-best-of-16 to AlpacaEval by @kyleliang919 in #256
Add Mistral-ORPO-Beta to AlpacaEval by @jiwooya1000 in #257
Yann/length correction by @YannDubs in #258

New Contributors

@liziniu made their first contribution in #245
@kyleliang919 made their first contribution in #253
@jiwooya1000 made their first contribution in #257

Full Changelog: v0.5.4...v0.6

Contributors

jiwooya1000, kyleliang919, and 2 other contributors

Release v0.5.4

24 Feb 08:56 · tag v0.5.4 · commit 3c43e9d · by github-actions

What's Changed

Add Qwen1.5-72B-Chat to AlpacaEval by @Lukeming-tsinghua in #226
Add claude-instant-1.2, deepseek-llm-67b-chat, wizardlm-70b, Qwen-14B-Chat (config + outputs without annotations) by @gblazex in #228
[DATA] Adding annotations for the arena models by @YannDubs in #229
Update README.md - Add missing "Y" to "ou" by @yoderj in #230
[DEV] Analyzing length-controlled metrics. by @YannDubs in #231
[DOC] add annotation interpretation by @YannDubs in #232
[DATA] add results from the Arena openai models by @YannDubs in #234
update ELO for llama-2-13b-chat-hf by @gblazex in #235
[NOTEBOOK] add length-corrected GLM by @YannDubs in #237
[ENH] add inverse mapper to make sure in and out types are the same by @YannDubs in #240
[ENH] update to allow AF to use AE by @YannDubs in #241

New Contributors

@Lukeming-tsinghua made their first contribution in #226
@yoderj made their first contribution in #230

Full Changelog: v0.5.3...v0.5.4

Contributors

gblazex, yoderj, and 2 other contributors

Release v0.5.3

01 Feb 08:54 · tag v0.5.3 · commit 8779373 · by github-actions

What's Changed

[ENH] add mistral-medium by @YannDubs in #205
[ENH] add internlm2-chat-20b-ppo by @C1rN09 in #207
prettify "pretty_name" of internlm2 by @C1rN09 in #208
[ENH] add outputs & configs from dolphin 2.2.1 by @YannDubs in #209
Add PairRM 0.4B + Yi-34B-Chat to AlpacaEval 2.0 by @jdf-prog in #210
dolphin 2.1.1 configs.yaml by @gblazex in #212
Update README.md (small typo) by @xwinxu in #213
[TEST]: fix ordering of df by @YannDubs in #214
Add Snorkel-Mistral-PairRM-DPO (best-of-16) to Alpaca Eval 2.0 by @viethoangtranduong in #215
update InternLM2 chat template by @C1rN09 in #216
Add Starling-LM-7B-alpha, vicuna-13b-v1.5, vicuna-7b-v1.5 to AlpacaEval (config + outputs without annotations) by @gblazex in #217
[RES] add 3 models for arena correlations by @YannDubs in #218
Add xwinlm-70b-v0.3 to AlpacaEval by @nbl97 in #221
[ENH] add referenced_models locally by @YannDubs in #224

New Contributors

@C1rN09 made their first contribution in #207
@gblazex made their first contribution in #212
@xwinxu made their first contribution in #213
@viethoangtranduong made their first contribution in #215

Full Changelog: v0.5.2...v0.5.3

Contributors

gblazex, YannDubs, and 5 other contributors

Release v0.5.2

10 Jan 23:57 · tag v0.5.2 · commit 83e91f3 · by github-actions

What's Changed

[BUG] force openai >1.5.0 by @YannDubs in #202
[WIP] precompute all leaderboard for AE2 by @YannDubs in #199
[ENH] add OpenHermes by @YannDubs in #203

Full Changelog: v0.5.1...v0.5.2

Contributors

YannDubs

Release v0.5.1

10 Jan 06:16 · tag v0.5.1 · commit 91a903f · by github-actions

What's Changed

[BUG] fix no OAI org id set by @YannDubs in #200

Full Changelog: v0.5.0...v0.5.1

Contributors

YannDubs

Release v0.5.0

10 Jan 02:32 · tag v0.5.0 · commit 0c14d6f · by github-actions

What's Changed

Fix mssg check by @Muennighoff in #174
Add MiniChat-1.5-3B to AlpacaEval and Fix MiniChat-3B by @GeneZC in #176
Add 01-ai/Yi-34B-Chat to AlpacaEval by @HyperdriveHustle in #175
feat: add way to verify results by @YannDubs in #177
show img in readme by @YannDubs in #178
Add PairRM best-of-16 to AlpacaEval by @jdf-prog in #181
Verify Yi by @YannDubs in #182
chore: add phi-2 sft by @lxuechen in #184
add cut-13b by @wwxu21 in #186
chore: add phi-2 dpo by @lxuechen in #185
Support phi2, Support SOLAR 10.7B LMCocktail by @yhyu13 in #183
Update openai.py by @Muennighoff in #188
chore: add link for phi-2-sft by @lxuechen in #190
chore: fix links by @lxuechen in #191
Add deita-7b-v1.0 model by @VPeterV in #192
[ENH] Azure OAI client & more general way of switching between client configs by @YannDubs in #193
[ENH] Weighted win rates by @YannDubs in #189
[ENH] new models: Gemini / claude2.1 / mistral / mixtral / .. by @YannDubs in #195
[ENH] alpaca_eval 2.0 by @YannDubs in #196

New Contributors

@Muennighoff made their first contribution in #174
@HyperdriveHustle made their first contribution in #175
@jdf-prog made their first contribution in #181
@lxuechen made their first contribution in #184
@wwxu21 made their first contribution in #186
@yhyu13 made their first contribution in #183
@VPeterV made their first contribution in #192

Full Changelog: v0.3.6...v0.5.0

Contributors

lxuechen, yhyu13, and 7 other contributors

Release v0.3.6

24 Nov 22:50 · tag v0.3.6 · commit 9e8e898 · by github-actions

What's Changed

feat: verify all the cohere model & use it as eval by @YannDubs in #170
Add Tulu 2 models to AlpacaEval by @hamishivi in #171

New Contributors

@hamishivi made their first contribution in #171

Full Changelog: v0.3.5...v0.3.6

Contributors

hamishivi and YannDubs

Release v0.3.5

16 Nov 23:19 · tag v0.3.5 · commit ba9e449 · by github-actions

What's Changed

[WIP] GPT4 turbo as evaluator by @YannDubs in #160
[ENH] add GPT4 turbo as evaluator in README by @YannDubs in #165
Add minichat-3b to AlpacaEval by @GeneZC in #167
fix: filter openai spam filter by @YannDubs in #169

New Contributors

@GeneZC made their first contribution in #167

Full Changelog: v0.3.3...v0.3.5

Contributors

GeneZC and YannDubs
