
GH-118095: Use broader specializations in tier 1, for better tier 2 support of calls. #118322

Merged
merged 19 commits into python:main on May 4, 2024

Conversation

@markshannon (Member) commented Apr 26, 2024

There is a bug in the tier 2 code, and this needs to be benchmarked.
The bug is now fixed. See below for benchmarking and stats.

@brandtbucher (Member)

Finally found the bug (took me way too long). CALL_PY_GENERAL isn't checking the function version anymore, but we need to check it in order to trace through calls correctly.

This fixes it:

diff --git a/Python/bytecodes.c b/Python/bytecodes.c
index c54763ce719..689bb9cbb57 100644
--- a/Python/bytecodes.c
+++ b/Python/bytecodes.c
@@ -3171,15 +3171,10 @@ dummy_func(
             frame->return_offset = (uint16_t)(1 + INLINE_CACHE_ENTRIES_CALL);
         }
 
-        op(_CHECK_IS_FUNCTION, (callable, unused, unused[oparg] -- callable, unused, unused[oparg])) {
-            DEOPT_IF(!PyFunction_Check(callable));
-        }
-
         macro(CALL_PY_GENERAL) =
             unused/1 + // Skip over the counter
-            unused/2 +
             _CHECK_PEP_523 +
-            _CHECK_IS_FUNCTION +
+            _CHECK_FUNCTION +
             _CALL_PY_GENERAL +
             _PUSH_FRAME;
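For background on why this matters: when projecting a tier 2 trace, the translator relies on the function version recorded for the specialized call to identify the callee behind _PUSH_FRAME, so it can keep translating into the callee's code object; a bare PyFunction_Check type test neither identifies the callee nor guards against it changing underneath the trace. Below is a minimal sketch of the lookup step, assuming the 3.13-era internal helper _PyFunction_LookupByVersion(); the function name lookup_callee_code and the surrounding control flow are hypothetical, not the real Python/optimizer.c logic.

#include "Python.h"
#include "pycore_function.h"   // _PyFunction_LookupByVersion() (internal API, assumed 3.13 signature)

/* Hypothetical helper, for illustration only: given the function version
 * recorded for a specialized call, recover the callee's code object so
 * trace projection can continue past _PUSH_FRAME. */
static PyCodeObject *
lookup_callee_code(uint32_t func_version)
{
    PyFunctionObject *func = _PyFunction_LookupByVersion(func_version);
    if (func == NULL) {
        return NULL;   // callee unknown: projection must stop at the call
    }
    return (PyCodeObject *)func->func_code;
}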
 

@markshannon (Member Author) commented May 3, 2024

I created an issue for this: #118540, but I'll do the simpler fix you suggest for 3.13.

markshannon marked this pull request as ready for review May 3, 2024 10:23
@markshannon (Member Author)

Stats look good.

~13% reduction in tier 1 instructions executed, a ~5% increase in tier 2 uops executed, an increase in uops per trace from 26.3 to 28.5, and a ~3% reduction in the number of traces executed.
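(Those figures are consistent with each other: 0.97 × 28.5 / 26.3 ≈ 1.05, i.e. about 5% more uops executed despite ~3% fewer traces.)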

@markshannon (Member Author)

Tier 1 performance is neutral (1.00x faster)

Lib/test/test_call.py (review comment resolved)
if (new_frame == NULL) {
    ERROR_NO_POP();
}
frame->return_offset = (uint16_t)(1 + INLINE_CACHE_ENTRIES_CALL);
Member

Could we use _SAVE_RETURN_OFFSET for this bit?
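For reference, a rough sketch of what that could look like, using only the uop and macro names already quoted in this PR (stack effects and cache layout are omitted, so treat it as illustrative rather than the actual bytecodes.c definition): drop the inline frame->return_offset assignment from the frame-creating uop above and let the macro end with the shared _SAVE_RETURN_OFFSET uop, as the macro quoted further down (ending in _SAVE_RETURN_OFFSET + _PUSH_FRAME) already does.

// Illustrative sketch only -- names are taken from the diffs quoted in this PR.
macro(CALL_PY_GENERAL) =
    unused/1 + // Skip over the counter
    _CHECK_PEP_523 +
    _CHECK_FUNCTION +
    _CALL_PY_GENERAL +       // would no longer set frame->return_offset itself
    _SAVE_RETURN_OFFSET +    // shared uop records the return offset instead
    _PUSH_FRAME;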

Comment on lines 3215 to 3218
op(_CHECK_IS_NOT_PY_CALLABLE, (callable, unused, unused[oparg] -- callable, unused, unused[oparg])) {
    DEOPT_IF(PyFunction_Check(callable));
    DEOPT_IF(Py_TYPE(callable) == &PyMethod_Type);
}
Member

Maybe these are better as exits so we can handle more cases? If we trace through this, it'd be a shame if we couldn't then stitch a function or method into the call site if it's polymorphic.

Suggested change
-        op(_CHECK_IS_NOT_PY_CALLABLE, (callable, unused, unused[oparg] -- callable, unused, unused[oparg])) {
-            DEOPT_IF(PyFunction_Check(callable));
-            DEOPT_IF(Py_TYPE(callable) == &PyMethod_Type);
-        }
+        op(_CHECK_IS_NOT_PY_CALLABLE, (callable, unused, unused[oparg] -- callable, unused, unused[oparg])) {
+            EXIT_IF(PyFunction_Check(callable));
+            EXIT_IF(Py_TYPE(callable) == &PyMethod_Type);
+        }

@@ -3226,40 +3326,6 @@ dummy_func(
_SAVE_RETURN_OFFSET +
_PUSH_FRAME;

inst(CALL_PY_WITH_DEFAULTS, (unused/1, func_version/2, callable, self_or_null, args[oparg] -- unused)) {
Member

Just curious, this isn't worth specializing anymore? I forget exactly why it can't be converted to tier two (I'm guessing that there's more than one reason, since the use of this_instr can be easily worked around).

Member Author

Two reasons:

  1. Because this loops over both arguments and defaults, it probably isn't much faster than CALL_PY_GENERAL for tier 1, and it is almost certainly slower in the JIT because of its size. CALL_PY_GENERAL calls a helper function, so it is more compact.
  2. We expect that, for 3.14, the tier 2 optimizer will convert them both to the same optimal sequence of operations.

Python/ceval.c (outdated)
@@ -107,7 +107,7 @@ static void
dump_stack(_PyInterpreterFrame *frame, PyObject **stack_pointer)
Member

Are the changes in this function intentional, or left over from debugging?

Member Author

Left over from debugging. I needed them because dump_stack was too slow when executing a realistic number of instructions (many millions) while hunting for bugs.

It might be worth making this change, but that's for another PR.

@@ -1789,8 +1789,7 @@ specialize_class_call(PyObject *callable, _Py_CODEUNIT *instr, int nargs)
         return -1;
     }
     if (Py_TYPE(tp) != &PyType_Type) {
-        SPECIALIZATION_FAIL(CALL, SPEC_FAIL_CALL_METACLASS);
-        return -1;
+        goto generic;
     }
     if (tp->tp_new == PyBaseObject_Type.tp_new) {
         PyFunctionObject *init = get_init_for_simple_managed_python_class(tp);
Member

GH won't let me comment outside the diff, but below this line there's a failure path if we're out of type versions. We could go generic in that case, right?

Member Author

We could, but I don't want to hide any errors.

Comment on lines +1872 to 1876
    int argcount = -1;
    if (kind == SPEC_FAIL_CODE_NOT_OPTIMIZED) {
        SPECIALIZATION_FAIL(CALL, SPEC_FAIL_CODE_NOT_OPTIMIZED);
        return -1;
    }
Member

CALL_PY_GENERAL can handle this, right?

Member Author

Yes, but I don't want to hide these cases; otherwise we will never know whether they are a problem.

We could record stats for failed specializations that have a fallback, like this case, but I think that's a bit too intrusive for 3.13.

Comment on lines 1880 to 1883
    int version = _PyFunction_GetVersionForCurrentState(func);
    if (version == 0) {
        SPECIALIZATION_FAIL(CALL, SPEC_FAIL_OUT_OF_VERSIONS);
        return -1;
Member

Once we fix the version checks in tier two, we could handle this too, right? Since we don't care about the function version?

Member Author

Yes.

@@ -1789,8 +1789,7 @@ specialize_class_call(PyObject *callable, _Py_CODEUNIT *instr, int nargs)
return -1;
Member

Can we handle this with CALL_NON_PY_GENERAL?

Member Author

Yes, but in specialize_class_call.
It keeps the logic cleaner if we don't make assumptions about how specialize_class_call could fail here.

}
#endif // Py_STATS


void
_Py_Specialize_Call(PyObject *callable, _Py_CODEUNIT *instr, int nargs)
Member

This is really cool. Am I correct that we are now (in theory) able to specialize all calls, except for those where:

  • We're out of versions somewhere.
  • The call itself is invalid.

And even those could be specialized, though it wouldn't make a whole lot of sense.

Member Author

Yes, more or less. There might be a few other cases, but they are all very rare.

@@ -718,7 +727,7 @@ dummy_func(void) {
             if (first_valid_check_stack == NULL) {
                 first_valid_check_stack = corresponding_check_stack;
             }
-            else {
+            else if (corresponding_check_stack) {
Member Author

corresponding_check_stack may be NULL for CALL_PY_GENERAL.

markshannon merged commit 1ab6356 into python:main May 4, 2024
51 of 52 checks passed
SonicField pushed a commit to SonicField/cpython that referenced this pull request May 8, 2024
GH-118095: Use broader specializations in tier 1, for better tier 2 support of calls. (pythonGH-118322)

* Add CALL_PY_GENERAL, CALL_BOUND_METHOD_GENERAL and CALL_NON_PY_GENERAL specializations.

* Remove CALL_PY_WITH_DEFAULTS specialization

* Use CALL_NON_PY_GENERAL in more cases when otherwise failing to specialize