Releases: tatsu-lab/alpaca_eval
Release v0.6.2
What's Changed
- [BUG] backward compatibility with AF by @YannDubs in #278
- Add Nanbeige-Plus-Chat-v0.1 to AlpacaEval by @yuani114 in #279
- Update README.md by @Dominic789654 in #280
- [BUG] revert to GPT4 preview 1106 by @YannDubs in #283
- Add support for analyzing evaluators with custom cross-annotations by @rdnfn in #281
- [ENH] llama3 by @YannDubs in #285
New Contributors
- @Dominic789654 made their first contribution in #280
- @rdnfn made their first contribution in #281
Full Changelog: v0.6.1...v0.6.2
Release v0.6.1
What's Changed
- Add Aligner-2B+Qwen1.5-72B-Chat & Aligner-2B+Claude3 Opus to AlpacaEval by @AlignInc in #259
- Supplement for Aligner by @AlignInc in #261
- Add Ein-70B-v0.1 to AlpacaEval by @bin-bi in #262
- Add TempNet-LLaMA2-Chat to AlpacaEval by @xumao-nju in #264
- Add Conifer-7B-DPO to AlpacaEval by @liulixin29 in #267
- Updating link to a super fast demo! by @kyleliang919 in #268
- Add Nanbeige2-8B-Chat to AlpacaEval by @yuani114 in #274
- [ENH] adding dbrx and gpt4 turbo by @YannDubs in #275
New Contributors
- @AlignInc made their first contribution in #259
- @bin-bi made their first contribution in #262
- @xumao-nju made their first contribution in #264
- @liulixin29 made their first contribution in #267
- @yuani114 made their first contribution in #274
Full Changelog: v0.6...v0.6.1
Release v0.6
What's Changed
- [DATA] Add Gemma by @YannDubs in #242
- [NOTEBOOK] adding final length correction notebook. by @YannDubs in #244
- add Mistral-7B-ReMax-v0.1 by @liziniu in #245
- [ENH] add claude 3 by @YannDubs in #247
- [ENH] add contextual by @YannDubs in #250
- [ENH] add mistral large by @YannDubs in #251
- Add Samba-CoE-v0.2 to AlpacaEval by @kyleliang919 in #253
- Add Samba-CoE-v0.2-best-of-16 to AlpacaEval by @kyleliang919 in #256
- Add Mistral-ORPO-Beta to AlpacaEval by @jiwooya1000 in #257
- Yann/length correction by @YannDubs in #258
New Contributors
- @liziniu made their first contribution in #245
- @kyleliang919 made their first contribution in #253
- @jiwooya1000 made their first contribution in #257
Full Changelog: v0.5.4...v0.6
Release v0.5.4
What's Changed
- Add Qwen1.5-72B-Chat to AlpacaEval by @Lukeming-tsinghua in #226
- Add claude-instant-1.2, deepseek-llm-67b-chat, wizardlm-70b, Qwen-14B-Chat (config + outputs without annotations) by @gblazex in #228
- [DATA] Adding annotations for the arena models by @YannDubs in #229
- Update README.md - Add missing "Y" to "ou" by @yoderj in #230
- [DEV] Analyzing length-controlled metrics. by @YannDubs in #231
- [DOC] add annotation interpretation by @YannDubs in #232
- [DATA] add results from the Arena openai models by @YannDubs in #234
- update ELO for llama-2-13b-chat-hf by @gblazex in #235
- [NOTEBOOK] add length-corrected GLM by @YannDubs in #237
- [ENH] add inverse mapper to make sure in and out types are the same by @YannDubs in #240
- [ENH] update to allow AF to use AE by @YannDubs in #241
New Contributors
- @Lukeming-tsinghua made their first contribution in #226
- @yoderj made their first contribution in #230
Full Changelog: v0.5.3...v0.5.4
Release v0.5.3
What's Changed
- [ENH] add mistral-medium by @YannDubs in #205
- [ENH] add internlm2-chat-20b-ppo by @C1rN09 in #207
- prettify "pretty_name" of internlm2 by @C1rN09 in #208
- [ENH] add outputs & configs from dolphin 2.2.1 by @YannDubs in #209
- Add PairRM 0.4B + Yi-34B-Chat to AlpacaEval 2.0 by @jdf-prog in #210
- dolphin 2.1.1 configs.yaml by @gblazex in #212
- Update README.md (small typo) by @xwinxu in #213
- [TEST]: fix ordering of df by @YannDubs in #214
- Add Snorkel-Mistral-PairRM-DPO (best-of-16) to Alpaca Eval 2.0 by @viethoangtranduong in #215
- update InternLM2 chat template by @C1rN09 in #216
- Add Starling-LM-7B-alpha, vicuna-13b-v1.5, vicuna-7b-v1.5 to AlpacaEval (config + outputs without annotations) by @gblazex in #217
- [RES] add 3 models for arena correlations by @YannDubs in #218
- Add xwinlm-70b-v0.3 to AlpacaEval by @nbl97 in #221
- [ENH] add referenced_models locally by @YannDubs in #224
New Contributors
- @C1rN09 made their first contribution in #207
- @gblazex made their first contribution in #212
- @xwinxu made their first contribution in #213
- @viethoangtranduong made their first contribution in #215
Full Changelog: v0.5.2...v0.5.3
Release v0.5.2
Release v0.5.1
Release v0.5.0
What's Changed
- Fix mssg check by @Muennighoff in #174
- Add MiniChat-1.5-3B to AlpacaEval and Fix MiniChat-3B by @GeneZC in #176
- Add 01-ai/Yi-34B-Chat to AlpacaEval by @HyperdriveHustle in #175
- feat: add way to verify results by @YannDubs in #177
- show img in readme by @YannDubs in #178
- Add PairRM best-of-16 to AlpacaEval by @jdf-prog in #181
- Verify Yi by @YannDubs in #182
- chore: add phi-2 sft by @lxuechen in #184
- add cut-13b by @wwxu21 in #186
- chore: add phi-2 dpo by @lxuechen in #185
- Support phi2, Support SOLAR 10.7B LMCocktail by @yhyu13 in #183
- Update openai.py by @Muennighoff in #188
- chore: add link for phi-2-sft by @lxuechen in #190
- chore: fix links by @lxuechen in #191
- Add deita-7b-v1.0 model by @VPeterV in #192
- [ENH] Azure OAI client & more general way of switching between client configs by @YannDubs in #193
- [ENH] Weighted win rates by @YannDubs in #189
- [ENH] new models: Gemini / claude2.1 / mistral / mixtral / .. by @YannDubs in #195
- [ENH] alpaca_eval 2.0 by @YannDubs in #196
New Contributors
- @Muennighoff made their first contribution in #174
- @HyperdriveHustle made their first contribution in #175
- @jdf-prog made their first contribution in #181
- @lxuechen made their first contribution in #184
- @wwxu21 made their first contribution in #186
- @yhyu13 made their first contribution in #183
- @VPeterV made their first contribution in #192
Full Changelog: v0.3.6...v0.5.0
Release v0.3.6
What's Changed
- feat: verify all the cohere model & use it as eval by @YannDubs in #170
- Add Tulu 2 models to AlpacaEval by @hamishivi in #171
New Contributors
- @hamishivi made their first contribution in #171
Full Changelog: v0.3.5...v0.3.6