Execution Prediction #2659

Open. 22 commits to merge into base: dev
Contributor

@bmosaicml bmosaicml commented Oct 19, 2023

What does this PR do?

This PR introduces the execution prediction task, an auxiliary task compatible with any code evaluation dataset. The model must inspect a piece of code and complete assert test statements by predicting the code's output on a given input.

Tested with this run: exec-prediction-has8Hy

It is slightly more challenging than HumanEval but still provides a meaningful signal for 30B models.

| Category   | Benchmark                       | Subtask   |   Accuracy | Number few shot   | Model                    |
|:-----------|:--------------------------------|:----------|-----------:|:------------------|:-------------------------|
|            | human_eval_execution_prediction |           |   0.170163 | 3-shot            | mosaicml/mpt-7b-instruct |

Below is an example of how the execution prediction task is formatted:

"""
Below is a list of python functions each followed by a correct assert statement testing its behavior. The final assert statement is incomplete; your task is to complete the final assert statement so that it passes.
"""

####

from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    """
    for idx, elem in enumerate(numbers):
        for idx2, elem2 in enumerate(numbers):
            if idx!= idx2:
                distance = abs(elem - elem2)
                if distance < threshold:
                    return True

    return False


def test0():
        assert has_close_elements([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False

####

from typing import List


def filter_by_substring(strings: List[str], substring: str) -> List[str]:
    """ Filter an input list of strings only for ones that contain given substring
    >>> filter_by_substring([], 'a')
    []
    >>> filter_by_substring(['abc', 'bacd', 'cde', 'array'], 'a')
    ['abc', 'bacd', 'array']
    """
    return [x for x in strings if substring in x]


def test1():
        assert filter_by_substring(["grunt", "trumpet", "prune", "gruesome"], "run") == ['grunt', 'prune']

####

from typing import List


def separate_paren_groups(paren_string: str) -> List[str]:
    """ Input to this function is a string containing multiple groups of nested parentheses. Your goal is to
    separate those group into separate strings and return the list of those.
    Separate groups are balanced (each open brace is properly closed) and not nested within each other
    Ignore any spaces in the input string.
    >>> separate_paren_groups('( ) (( )) (( )( ))')
    ['()', '(())', '(()())']
    """
    result = []
    current_string = []
    current_depth = 0

    for c in paren_string:
        if c == '(':
            current_depth += 1
            current_string.append(c)
        elif c == ')':
            current_depth -= 1
            current_string.append(c)

            if current_depth == 0:
                result.append(''.join(current_string))
                current_string.clear()

    return result


def test():
        assert separate_paren_groups("(()()) ((())) () ((())()())") ==

The model is then expected to complete the final line so that the test function passes.
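For illustration, the prompt layout and pass/fail check described above could be sketched roughly as follows. This is not the code in this PR: the helper names `split_assert`, `build_prompt`, and `check_completion` are hypothetical, the sketch assumes each test is a single `assert <call> == <value>` line, and it runs the completion with a bare `exec`, which a real evaluation would need to sandbox.

```python
from typing import List, Tuple

# Instruction text copied from the example prompt in this PR description.
INSTRUCTION = (
    '"""\n'
    "Below is a list of python functions each followed by a correct assert statement "
    "testing its behavior. The final assert statement is incomplete; your task is to "
    "complete the final assert statement so that it passes.\n"
    '"""'
)
SEPARATOR = "####"


def split_assert(assert_line: str) -> Tuple[str, str]:
    """Split 'assert f(x) == y' into the visible prefix and the hidden expected value.

    Assumption: the last ' == ' in the line separates the call from the expected value.
    """
    prefix, sep, expected = assert_line.strip().rpartition(" == ")
    if not sep:
        raise ValueError(f"not an equality assert: {assert_line!r}")
    return prefix + " ==", expected.strip()


def build_prompt(fewshot: List[Tuple[str, str]], target_code: str, target_assert: str) -> str:
    """Lay out few-shot (code, assert) pairs, then the target with its assert truncated."""
    parts = [INSTRUCTION]
    for i, (code, assert_line) in enumerate(fewshot):
        parts.append(
            f"{SEPARATOR}\n\n{code.rstrip()}\n\n\ndef test{i}():\n        {assert_line.strip()}"
        )
    prefix, _ = split_assert(target_assert)
    parts.append(f"{SEPARATOR}\n\n{target_code.rstrip()}\n\n\ndef test():\n        {prefix}")
    return "\n\n".join(parts)


def check_completion(code: str, assert_prefix: str, completion: str) -> bool:
    """Return True if the first line of the completion makes the assert pass.

    WARNING: exec on model output is unsafe outside a sandbox; this is only a sketch.
    """
    lines = completion.strip().splitlines()
    first_line = lines[0] if lines else ""
    program = f"{code}\n{assert_prefix} {first_line}\n"
    try:
        exec(program, {})  # fresh namespace per check
    except Exception:  # AssertionError, SyntaxError, NameError, ...
        return False
    return True
```

For the example prompt above, a passing completion would be `['(()())', '((()))', '()', '((())()())']`. Scoring by execution rather than by exact string match means equivalent spellings of the same value (for example, different quote styles) still count as correct.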

What issue(s) does this change relate to?

Before submitting

  • Have you read the contributor guidelines?
  • Is this change a documentation change or typo fix? If so, skip the rest of this checklist.
  • Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.
  • Did you update any related docs and document your change?
  • Did you update any related tests and add any new tests related to your change? (see testing)
  • Did you run the tests locally to make sure they pass?
  • Did you run pre-commit on your change? (see the pre-commit section of prerequisites)

bmosaicml and others added 9 commits April 26, 2023 18:32
* Seed the fewshot sampling in the ICL datasets (mosaicml#2100)

* merge

* add ece for lm and mc

* fetch upstream

* fetch upstream

* Apply suggestions from code review

Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>

* incorporate comments

* incorporate comments

* delete multi gpu

---------

Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
@bmosaicml bmosaicml requested a review from a team as a code owner October 19, 2023 20:47
@maxisawesome
Contributor

Looks good to me. Once my branch is approved I'll probably redo it to inherit from other classes etc but seems good for now!

@dakinggg
Contributor

@maxisawesome @bmosaicml What do you want to do for merge order here? Should we merge Max's and then update this PR to use Max's? or the other way around?

@maxisawesome
Contributor

Let's merge this in, then I can rebase and format it into the new version in my PR. That seems easier to me.


3 participants