Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set up tagged pointers in the evaluation stack #117139

Open
2 tasks
Tracked by #108219
Fidget-Spinner opened this issue Mar 21, 2024 · 3 comments
Open
2 tasks
Tracked by #108219

Set up tagged pointers in the evaluation stack #117139

Fidget-Spinner opened this issue Mar 21, 2024 · 3 comments
Assignees
Labels
topic-free-threading type-feature A feature request or enhancement

Comments

@Fidget-Spinner
Copy link
Member

Fidget-Spinner commented Mar 21, 2024

Feature or enhancement

Proposal:

This issue mostly mirrors Mark's original one at faster-cpython/ideas#632.

As part of PEP 703, tagged pointers will help achieve deferred reference counting.

Tagged pointers will also help the default GIL builds in 3.14 and future with unboxed ints.

Here's the initial design (open for comments):

We change the evaluation stack from PyObject * to PyTaggedObject. PyTaggedObject looks like this.

typedef union PyTaggedObject {
    PyObject *obj;
    uintptr_t bits;
} PyTaggedObject;

Note: I am limiting the tagged pointers to just the evaluation stack (this is up for debate as well). The main reason is to keep the change as minimally invasive as possible for 3.13 and not leak out any implementation details for now. For example, I don't want to support tagged pointers in C API functions. This may/will change in 3.14.

Thanks to the cases generator, we can then autogenerate all the proper accessors to said object. For example, a stack effect of PyObject * will generate the code effect.obj, while a stack effect of PyTaggedObject (the new default) will generate the code (PyObject *)(effect.bits & ~TAGS) or something similar.

Unboxed ints will also help both free-threaded and non-free-threaded builds by reducing refcounting even more for simple arithmetic operations (and making integer operations native). However, I am aiming for deferred tags in 3.13, and anything else is a plus.

Further note: due to wanting to respect immediate finalizers, we are not deferring that many objects. So this may not be a win on default builds. On free-threaded builds, I see a slight (10%) speedup on fibonacci microbenchmarks. Will benchmark on the default bulid as well and see the results.

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response

Linked PRs

Tasks

@Fidget-Spinner Fidget-Spinner added the type-feature A feature request or enhancement label Mar 21, 2024
@Fidget-Spinner Fidget-Spinner self-assigned this Mar 21, 2024
@gvanrossum
Copy link
Member

So the mentions of unboxed ints are speculative and not part of the initial implementation, right?

And the tagged pointers just have their tag bit(s) removed when used in the "blob of C code" that's the instruction definition's body. For example,

        inst(STORE_FAST, (value --)) {
            SETLOCAL(oparg, value);
        }

would expand its body to

            SETLOCAL(oparg, (PyObject *)(value.bits & ~TAGS));

How do pointers receive their tag bit? Does every stack push just "or in" the tag bit(s)?

I recall trying to do this manually (before we had the generator) and it was a lot of work -- the code generator should make this much simpler. It also was quite a bit slower -- have you thought about this?

My biggest question is what does the tag bit mean? Is there a specific section of PEP 703 I should read to learn more about this?

@colesbury
Copy link
Contributor

My biggest question is what does the tag bit mean?

PEP 703 doesn't go into detail about the tag bits. I'd expect it to match the description in Mark's issue, except with unboxed ints not part of the initial implementation:

Tag Meaning Encoding
00 Normal reference (uintptr)ptr
01 Deferred reference ((uintptr)ptr) + 1

01 includes both immortal objects and those whose reference count has been deferred (but are not immortal).

I recall trying to do this manually... It also was quite a bit slower

We should only enable this in the free-threaded build to start. We do not want to introduce any performance regressions in the default build. If it turns out to be a net win for the default build, then we can consider enabling it there. Otherwise, we can revisit that in 3.14, if we pursue tagged integers.

How do pointers receive their tag bit? Does every stack push just "or in" the tag bit(s)?

Yeah, most pushes just "or in" the tag bit, which is an immortal check. So for, example GET_ITER currently has:

stack_pointer[-1] = iter;

That might be generated instead as:

#define PACK(obj) ((PyTaggedObject){.bits = (uintptr_t)obj | _Py_IsImmortal(obj)})

stack_pointer[-1] = PACK(iter);

For example, ... [STORE_FAST]

I think we want to use tagged pointers in the locals as well as the evaluation stack, but it may be easier to start with just using tagged pointers the evaluation stack. In that case STORE_FAST will need extra refcounting to convert a tagged pointer into a regular object reference. For example,

PyObject *obj = (PyObject *)(value.bits & ~TAGS);
if ((value.bits & IS_DEFERRED) {
  Py_INCREF(obj);
}
SETLOCAL(oparg, obj);

If we use tagged pointers for the locals as well, then we could skip the extra work in STORE_FAST.

Motivation

The primary motivation is to avoid reference count contention in multithreaded programs on things like functions/methods (in the free-threaded build). In particular, the benefit (from avoid reference counting contention) is going to be in LOAD_ATTR, LOAD_GLOBAL, and their variants.

For example, currently LOAD_ATTR_SLOT has code that looks like:

PyObject *attr = ...;
Py_INCREF(attr);
stack_pointer[-1] = attr;

We want the generated code to instead look something like the following, but probably refactored with helper macros/functions:

PyObject *attr = ...;
PyTaggedObject attr_tag;
if (_Py_IsImmortalOrDeferred(attr)) {
    attr_tag.bits = (uintptr_t)attr | IS_DEFERRED;
}
else {
    attr_tag.obj = attr;
}
stack_pointer[-1] = attr_tag;

@gvanrossum
Copy link
Member

Okay...

So from this definition

#define PACK(obj) ((PyTaggedObject){.bits = (uintptr_t)obj | _Py_IsImmortal(obj)})

I gather that the tag bit is (by default) only set if the object is immortal. From your Motivation here (and also from the section on deferred refcounting in PEP 703) I take it that the tag bit would also be set for selected objects like functions and methods.

Mark's original issue describes variants of INCREF/DECREF to check for the tag bit; presumably these variants should only be used in the interpreter.

@Fidget-Spinner Good luck!

Fidget-Spinner added a commit that referenced this issue Apr 30, 2024

---------

Co-authored-by: Sam Gross <655866+colesbury@users.noreply.github.com>
SonicField pushed a commit to SonicField/cpython that referenced this issue May 8, 2024

---------

Co-authored-by: Sam Gross <655866+colesbury@users.noreply.github.com>
SonicField pushed a commit to SonicField/cpython that referenced this issue May 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-free-threading type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

3 participants