
GH-118095: Use broader specializations in tier 1, for better tier 2 support of calls. #118322

Merged
merged 19 commits into python:main on May 4, 2024

Conversation

@markshannon (Member) commented Apr 26, 2024

There is a bug in the tier 2 code, and this needs to be benchmarked.
The bug is now fixed. See below for benchmarking and stats.

@brandtbucher (Member)

Finally found the bug (took me way too long). CALL_PY_GENERAL isn't checking the function version anymore, but we need to check it in order to trace through calls correctly.

This fixes it:

diff --git a/Python/bytecodes.c b/Python/bytecodes.c
index c54763ce719..689bb9cbb57 100644
--- a/Python/bytecodes.c
+++ b/Python/bytecodes.c
@@ -3171,15 +3171,10 @@ dummy_func(
             frame->return_offset = (uint16_t)(1 + INLINE_CACHE_ENTRIES_CALL);
         }
 
-        op(_CHECK_IS_FUNCTION, (callable, unused, unused[oparg] -- callable, unused, unused[oparg])) {
-            DEOPT_IF(!PyFunction_Check(callable));
-        }
-
         macro(CALL_PY_GENERAL) =
             unused/1 + // Skip over the counter
-            unused/2 +
             _CHECK_PEP_523 +
-            _CHECK_IS_FUNCTION +
+            _CHECK_FUNCTION +
             _CALL_PY_GENERAL +
             _PUSH_FRAME;
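For background on why this matters: when projecting a tier 2 trace, the translator relies on the function version recorded for the specialized call to identify the callee behind _PUSH_FRAME, so it can keep translating into the callee's code object; a bare PyFunction_Check type test neither identifies the callee nor guards against it changing underneath the trace. Below is a minimal sketch of the lookup step, assuming the 3.13-era internal helper _PyFunction_LookupByVersion(); the function name lookup_callee_code and the surrounding control flow are hypothetical, not the real Python/optimizer.c logic.

#include "Python.h"
#include "pycore_function.h"   // _PyFunction_LookupByVersion() (internal API, assumed 3.13 signature)

/* Hypothetical helper, for illustration only: given the function version
 * recorded for a specialized call, recover the callee's code object so
 * trace projection can continue past _PUSH_FRAME. */
static PyCodeObject *
lookup_callee_code(uint32_t func_version)
{
    PyFunctionObject *func = _PyFunction_LookupByVersion(func_version);
    if (func == NULL) {
        return NULL;   // callee unknown: projection must stop at the call
    }
    return (PyCodeObject *)func->func_code;
}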
 

@markshannon (Member Author) commented May 3, 2024

I created an issue for this: #118540, but I'll do the simpler fix you suggest for 3.13.

markshannon marked this pull request as ready for review May 3, 2024 10:23
@markshannon (Member Author)

Stats look good.

~13% reduction in tier 1 instructions executed, a ~5% increase in tier 2 uops executed, an increase in uops per trace from 26.3 to 28.5, and a ~3% reduction in the number of traces executed.
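(Those figures are consistent with each other: 0.97 × 28.5 / 26.3 ≈ 1.05, i.e. about 5% more uops executed despite ~3% fewer traces.)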

@markshannon (Member Author)

Tier 1 performance is neutral (1.00x faster)

Lib/test/test_call.py (review comment resolved)
if (new_frame == NULL) {
    ERROR_NO_POP();
}
frame->return_offset = (uint16_t)(1 + INLINE_CACHE_ENTRIES_CALL);
Member

Could we use _SAVE_RETURN_OFFSET for this bit?
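For reference, a rough sketch of what that could look like, using only the uop and macro names already quoted in this PR (stack effects and cache layout are omitted, so treat it as illustrative rather than the actual bytecodes.c definition): drop the inline frame->return_offset assignment from the frame-creating uop above and let the macro end with the shared _SAVE_RETURN_OFFSET uop, as the macro quoted further down (ending in _SAVE_RETURN_OFFSET + _PUSH_FRAME) already does.

// Illustrative sketch only -- names are taken from the diffs quoted in this PR.
macro(CALL_PY_GENERAL) =
    unused/1 + // Skip over the counter
    _CHECK_PEP_523 +
    _CHECK_FUNCTION +
    _CALL_PY_GENERAL +       // would no longer set frame->return_offset itself
    _SAVE_RETURN_OFFSET +    // shared uop records the return offset instead
    _PUSH_FRAME;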

Comment on lines 3215 to 3218
op(_CHECK_IS_NOT_PY_CALLABLE, (callable, unused, unused[oparg] -- callable, unused, unused[oparg])) {
    DEOPT_IF(PyFunction_Check(callable));
    DEOPT_IF(Py_TYPE(callable) == &PyMethod_Type);
}
Member

Maybe these are better as exits so we can handle more cases? If we trace through this, it'd be a shame if we couldn't then stitch a function or method into the call site if it's polymorphic.

Suggested change
-        op(_CHECK_IS_NOT_PY_CALLABLE, (callable, unused, unused[oparg] -- callable, unused, unused[oparg])) {
-            DEOPT_IF(PyFunction_Check(callable));
-            DEOPT_IF(Py_TYPE(callable) == &PyMethod_Type);
-        }
+        op(_CHECK_IS_NOT_PY_CALLABLE, (callable, unused, unused[oparg] -- callable, unused, unused[oparg])) {
+            EXIT_IF(PyFunction_Check(callable));
+            EXIT_IF(Py_TYPE(callable) == &PyMethod_Type);
+        }

@@ -3226,40 +3326,6 @@ dummy_func(
_SAVE_RETURN_OFFSET +
_PUSH_FRAME;

inst(CALL_PY_WITH_DEFAULTS, (unused/1, func_version/2, callable, self_or_null, args[oparg] -- unused)) {
Member

Just curious, this isn't worth specializing anymore? I forget exactly why it can't be converted to tier two (I'm guessing that there's more than one reason, since the use of this_instr can be easily worked around).

Member Author

Two reasons:

  1. Because this loops over both arguments and defaults, it probably isn't much faster than CALL_PY_GENERAL for tier 1, and it is almost certainly slower in the JIT because of its size. CALL_PY_GENERAL calls a helper function, so it is more compact.
  2. We expect that, for 3.14, the tier 2 optimizer will convert them both to the same optimal sequence of operations.

Python/ceval.c (outdated)
@@ -107,7 +107,7 @@ static void
dump_stack(_PyInterpreterFrame *frame, PyObject **stack_pointer)
Member

Are the changes in this function intentional, or left over from debugging?

Member Author

Left over from debugging. I needed them because dump_stack was too slow when executing a realistic number of instructions (many millions) while hunting for bugs.

It might be worth making this change, but that's for another PR.

@@ -1789,8 +1789,7 @@ specialize_class_call(PyObject *callable, _Py_CODEUNIT *instr, int nargs)
         return -1;
     }
     if (Py_TYPE(tp) != &PyType_Type) {
-        SPECIALIZATION_FAIL(CALL, SPEC_FAIL_CALL_METACLASS);
-        return -1;
+        goto generic;
     }
     if (tp->tp_new == PyBaseObject_Type.tp_new) {
         PyFunctionObject *init = get_init_for_simple_managed_python_class(tp);
Member

GH won't let me comment outside the diff, but below this line there's a failure path if we're out of type versions. We could go generic in that case, right?

Member Author

We could, but I don't want to hide any errors.

Comment on lines +1872 to 1876
    int argcount = -1;
    if (kind == SPEC_FAIL_CODE_NOT_OPTIMIZED) {
        SPECIALIZATION_FAIL(CALL, SPEC_FAIL_CODE_NOT_OPTIMIZED);
        return -1;
    }
Member

CALL_PY_GENERAL can handle this, right?

Member Author

Yes, but I don't want to hide these cases; otherwise we will never know whether they are a problem.

We could record stats for failed specializations that have a fallback, like this case, but I think that's a bit too intrusive for 3.13.

Comment on lines 1880 to 1883
    int version = _PyFunction_GetVersionForCurrentState(func);
    if (version == 0) {
        SPECIALIZATION_FAIL(CALL, SPEC_FAIL_OUT_OF_VERSIONS);
        return -1;
Member

Once we fix the version checks in tier two, we could handle this too, right? Since we don't care about the function version?

Member Author

Yes.

@@ -1789,8 +1789,7 @@ specialize_class_call(PyObject *callable, _Py_CODEUNIT *instr, int nargs)
return -1;
Member

Can we handle this with CALL_NON_PY_GENERAL?

Member Author

Yes, but in specialize_class_call.
It keeps the logic cleaner if we don't make assumptions about how specialize_class_call could fail here.

}
#endif // Py_STATS


void
_Py_Specialize_Call(PyObject *callable, _Py_CODEUNIT *instr, int nargs)
Member

This is really cool. Am I correct that we are now (in theory) able to specialize all calls, except for those where:

  • We're out of versions somewhere.
  • The call itself is invalid.

And even those could be specialized, though it wouldn't make a whole lot of sense.

Member Author

Yes, more or less. There might be a few other cases, but they are all very rare.

@@ -718,7 +727,7 @@ dummy_func(void) {
             if (first_valid_check_stack == NULL) {
                 first_valid_check_stack = corresponding_check_stack;
             }
-            else {
+            else if (corresponding_check_stack) {
Member Author

corresponding_check_stack may be NULL for CALL_PY_GENERAL.

markshannon merged commit 1ab6356 into python:main May 4, 2024
51 of 52 checks passed
SonicField pushed a commit to SonicField/cpython that referenced this pull request May 8, 2024
GH-118095: Use broader specializations in tier 1, for better tier 2 support of calls. (pythonGH-118322)

* Add CALL_PY_GENERAL, CALL_BOUND_METHOD_GENERAL and CALL_NON_PY_GENERAL specializations.

* Remove CALL_PY_WITH_DEFAULTS specialization

* Use CALL_NON_PY_GENERAL in more cases when otherwise failing to specialize