Execution Prediction #2659

bmosaicml · 2023-10-19T20:47:53Z

What does this PR do?

This PR introduces the execution prediction task, an auxiliary task compatible with any code evaluation dataset. The model must inspect a piece of code and complete an assert statement by predicting the code's output on a given input.
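
As a rough illustration of the idea, one way to derive such an incomplete assert from an existing test is to cut the statement right after its `==`. This is a minimal sketch, not the code in this PR: `split_assert` is a hypothetical helper, and it assumes the test is a single-line `assert <call> == <expected>`.

```python
def split_assert(assert_line: str) -> tuple[str, str]:
    """Split 'assert f(x) == y' into an incomplete stub and the expected text.

    Hypothetical helper for illustration only. It splits on the last
    ' == ' in the line, so it assumes the expected value itself does
    not contain ' == '.
    """
    stub, sep, expected = assert_line.rpartition(" == ")
    if not sep:
        raise ValueError("expected an assert of the form 'assert <call> == <value>'")
    return stub + " ==", expected.strip()


stub, expected = split_assert(
    "assert has_close_elements([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False"
)
print(stub)      # the incomplete assert shown to the model
print(expected)  # the text the model is expected to produce
```

The stub ends with `==`, so the model's continuation is exactly the predicted output.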

Tested with this run: exec-prediction-has8Hy

It is slightly more challenging than HumanEval but still provides a meaningful signal for 30B models.

| Category   | Benchmark                       | Subtask   |   Accuracy | Number few shot   | Model                    |
|:-----------|:--------------------------------|:----------|-----------:|:------------------|:-------------------------|
|            | human_eval_execution_prediction |           |   0.170163 | 3-shot            | mosaicml/mpt-7b-instruct |

Below is an example of how the execution prediction task is formatted:

"""
Below is a list of python functions each followed by a correct assert statement testing its behavior. The final assert statement is incomplete; your task is to complete the final assert statement so that it passes.
"""

####

from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    """
    for idx, elem in enumerate(numbers):
        for idx2, elem2 in enumerate(numbers):
            if idx!= idx2:
                distance = abs(elem - elem2)
                if distance < threshold:
                    return True

    return False


def test0():
        assert has_close_elements([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False

####

from typing import List


def filter_by_substring(strings: List[str], substring: str) -> List[str]:
    """ Filter an input list of strings only for ones that contain given substring
    >>> filter_by_substring([], 'a')
    []
    >>> filter_by_substring(['abc', 'bacd', 'cde', 'array'], 'a')
    ['abc', 'bacd', 'array']
    """
    return [x for x in strings if substring in x]


def test1():
        assert filter_by_substring(["grunt", "trumpet", "prune", "gruesome"], "run") == ['grunt', 'prune']

####

from typing import List


def separate_paren_groups(paren_string: str) -> List[str]:
    """ Input to this function is a string containing multiple groups of nested parentheses. Your goal is to
    separate those group into separate strings and return the list of those.
    Separate groups are balanced (each open brace is properly closed) and not nested within each other
    Ignore any spaces in the input string.
    >>> separate_paren_groups('( ) (( )) (( )( ))')
    ['()', '(())', '(()())']
    """
    result = []
    current_string = []
    current_depth = 0

    for c in paren_string:
        if c == '(':
            current_depth += 1
            current_string.append(c)
        elif c == ')':
            current_depth -= 1
            current_string.append(c)

            if current_depth == 0:
                result.append(''.join(current_string))
                current_string.clear()

    return result


def test():
        assert separate_paren_groups("(()()) ((())) () ((())()())") ==

The model would then be expected to continue the line such that the test function succeeds.
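
One way to check a continuation is to append it to the incomplete assert and run the resulting test. The sketch below assumes this execution-based check, which may differ from how this PR actually scores completions. `completion_passes` is a hypothetical name, and calling `exec` on model output should only ever happen in a sandbox.

```python
def completion_passes(context: str, incomplete_test: str, completion: str,
                      test_name: str = "test") -> bool:
    """Return True if the completed test function runs without raising."""
    # Keep only the first line: the model is asked to continue a single line.
    stripped = completion.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    program = f"{context}\n\n{incomplete_test} {first_line}\n"
    namespace: dict = {}
    try:
        exec(program, namespace)   # defines the function under test and the test
        namespace[test_name]()     # raises AssertionError on a wrong prediction
    except Exception:
        # Syntax errors, runtime errors, and failed asserts all count as a miss.
        return False
    return True


context = '''
def filter_by_substring(strings, substring):
    return [x for x in strings if substring in x]
'''
incomplete = (
    "def test():\n"
    '    assert filter_by_substring(["grunt", "trumpet", "prune"], "run") =='
)

print(completion_passes(context, incomplete, "['grunt', 'prune']"))  # True
print(completion_passes(context, incomplete, "['trumpet']"))         # False
```

Executing the test accepts any continuation that evaluates to the right value, which is more lenient than comparing the completion against a reference string.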

What issue(s) does this change relate to?

Before submitting

Have you read the contributor guidelines?
Is this change a documentation change or typo fix? If so, skip the rest of this checklist.
Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.
Did you update any related docs and document your change?
Did you update any related tests and add any new tests related to your change? (see testing)
Did you run the tests locally to make sure they pass?
Did you run pre-commit on your change? (see the pre-commit section of prerequisites)

* Seed the fewshot sampling in the ICL datasets (mosaicml#2100)
* merge
* add ece for lm and mc
* fetch upstream
* fetch upstream
* Apply suggestions from code review
* incorporate comments
* incorporate comments
* delete multi gpu

Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>

composer/datasets/in_context_learning_evaluation.py

maxisawesome · 2023-11-15T00:11:54Z

Looks good to me. Once my branch is approved I'll probably redo it to inherit from other classes etc but seems good for now!

dakinggg · 2023-11-15T01:08:40Z

@maxisawesome @bmosaicml What do you want to do for merge order here? Should we merge Max's and then update this PR to use Max's? or the other way around?

maxisawesome · 2023-11-16T23:06:47Z

Let's merge this in, then I can rebase and format it into the new version in my PR. That seems easier to me.

bmosaicml and others added 9 commits April 26, 2023 18:32

merge dev

bd68416

Merge branch 'dev' into codetracing

3380be1

Merge branch 'dev' into codetracing

f11c49a

wip

a0a81b1

add execution pred

136e7de

pre commit

591a617

fix merge

0e2a325

fix merge

1b6c45e

bmosaicml requested a review from a team as a code owner October 19, 2023 20:47

bmosaicml and others added 6 commits October 19, 2023 16:48

Merge branch 'dev' into codetracing

e6965aa

Merge branch 'dev' into codetracing

303c881

Merge branch 'mosaicml:dev' into codetracing

fa3ff71

restore data

e4b6fda

fix bug

0188d1c

Merge branch 'dev' into codetracing

e8a3ef3

bmosaicml requested review from mcarbin and dakinggg November 2, 2023 20:44

fix indexing

e215f20

dakinggg requested a review from maxisawesome November 7, 2023 18:43

Merge branch 'dev' into codetracing

90dd3c3

maxisawesome reviewed Nov 14, 2023


dakinggg and others added 5 commits November 16, 2023 15:07

Merge branch 'dev' into codetracing

79e4d28

Merge branch 'dev' into codetracing

0b91d04

fix rng

0379366

finish

0ec201e

Merge branch 'dev' into codetracing

b844ae0
