Pinned
- mmlu (Public, forked from hendrycks/test)
Measuring Massive Multitask Language Understanding | ICLR 2021
- lm-evaluation-harness (Public, forked from EleutherAI/lm-evaluation-harness)
A framework for few-shot evaluation of autoregressive language models.
Python
- SWE-agent (Public, forked from princeton-nlp/SWE-agent)
SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It solves 12.29% of bugs in the SWE-bench evaluation set and takes just 1.5 minutes to run.
Python