Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C-API for signalling monitoring events #111997

Closed
scoder opened this issue Nov 12, 2023 · 18 comments
Closed

C-API for signalling monitoring events #111997

scoder opened this issue Nov 12, 2023 · 18 comments
Assignees
Labels
3.13 bugs and security fixes deferred-blocker topic-C-API type-feature A feature request or enhancement

Comments

@scoder
Copy link
Contributor

scoder commented Nov 12, 2023

Feature or enhancement

Proposal:

Language implementations for the CPython runtime (Cython, JIT compilers, regular expression engines, template engines, etc.) need a way to signal PEP-669 monitoring events to the registered listeners.

  1. We need a way to create events and inject them into the monitoring system. Since events have more than one signature, we might end up needing more than one C-API function for this.

  2. We need a way to map 3D source code positions (file name, line, character) to 1D integer offsets. Code objects help, but branches might cross source file boundaries, so that's more tricky. For many use cases, a mapping between (line, character) positions and an integer offset would probably suffice, although templating languages usually provide some kind of include commands, as does Cython. There should be some help in the C-API for building up such a mapping.

The reason why I think we need CPython's help for mapping indices is that both sides, listeners and even producers, need to agree on the same mapping. Sadly, the events don't include line/character positions directly but only a single integer. So event producers need a way to produce a number that event listeners like coverage analysers can map back to a source code position.

Has this already been discussed elsewhere?

I have already discussed this feature proposal on Discourse

Links to previous discussion of this feature:

https://discuss.python.org/t/pep-669-low-impact-monitoring-for-cpython/13018/61?u=scoder

Linked PRs

@scoder scoder added the type-feature A feature request or enhancement label Nov 12, 2023
@scoder
Copy link
Contributor Author

scoder commented Nov 12, 2023

@vstinner
Copy link
Member

Do you want to propose an API for that?

@drdavella
Copy link

PEP-669 seems to discuss a mechanism to modify the bytestream for the purpose of implementing monitoring. Would access to this mechanism also be provided as part of this API? I haven't seen too much mention of this feature elsewhere.

Source: https://peps.python.org/pep-0669/#rationale

The quickening mechanism provided by PEP 659 provides a way to dynamically modify executing Python bytecode. These modifications have little cost beyond the parts of the code that are modified and a relatively low cost to those parts that are modified. We can leverage this to provide an efficient mechanism for monitoring that was not possible in 3.10 or earlier.

IMO it would be very useful for an API of this nature to expose low-level interpreter state to monitoring tools. For example, if we are able to see events like INSTRUCTION, it would also be useful to have access to the operands involved.

@scoder
Copy link
Contributor Author

scoder commented Nov 21, 2023

Do you want to propose an API for that?

In order to map source/line/column information to an offset, we need a way to build a co_linetable for a code object: https://github.com/python/cpython/blob/main/Objects/locations.md
That's probably the most complex thing to do here. IIUC, CPython implements this in C here:

cpython/Python/assemble.c

Lines 190 to 444 in d67f947

/* Code location emitting code. See locations.md for a description of the format. */
#define MSB 0x80
static void
write_location_byte(struct assembler* a, int val)
{
PyBytes_AS_STRING(a->a_linetable)[a->a_location_off] = val&255;
a->a_location_off++;
}
static uint8_t *
location_pointer(struct assembler* a)
{
return (uint8_t *)PyBytes_AS_STRING(a->a_linetable) +
a->a_location_off;
}
static void
write_location_first_byte(struct assembler* a, int code, int length)
{
a->a_location_off += write_location_entry_start(
location_pointer(a), code, length);
}
static void
write_location_varint(struct assembler* a, unsigned int val)
{
uint8_t *ptr = location_pointer(a);
a->a_location_off += write_varint(ptr, val);
}
static void
write_location_signed_varint(struct assembler* a, int val)
{
uint8_t *ptr = location_pointer(a);
a->a_location_off += write_signed_varint(ptr, val);
}
static void
write_location_info_short_form(struct assembler* a, int length, int column, int end_column)
{
assert(length > 0 && length <= 8);
int column_low_bits = column & 7;
int column_group = column >> 3;
assert(column < 80);
assert(end_column >= column);
assert(end_column - column < 16);
write_location_first_byte(a, PY_CODE_LOCATION_INFO_SHORT0 + column_group, length);
write_location_byte(a, (column_low_bits << 4) | (end_column - column));
}
static void
write_location_info_oneline_form(struct assembler* a, int length, int line_delta, int column, int end_column)
{
assert(length > 0 && length <= 8);
assert(line_delta >= 0 && line_delta < 3);
assert(column < 128);
assert(end_column < 128);
write_location_first_byte(a, PY_CODE_LOCATION_INFO_ONE_LINE0 + line_delta, length);
write_location_byte(a, column);
write_location_byte(a, end_column);
}
static void
write_location_info_long_form(struct assembler* a, location loc, int length)
{
assert(length > 0 && length <= 8);
write_location_first_byte(a, PY_CODE_LOCATION_INFO_LONG, length);
write_location_signed_varint(a, loc.lineno - a->a_lineno);
assert(loc.end_lineno >= loc.lineno);
write_location_varint(a, loc.end_lineno - loc.lineno);
write_location_varint(a, loc.col_offset + 1);
write_location_varint(a, loc.end_col_offset + 1);
}
static void
write_location_info_none(struct assembler* a, int length)
{
write_location_first_byte(a, PY_CODE_LOCATION_INFO_NONE, length);
}
static void
write_location_info_no_column(struct assembler* a, int length, int line_delta)
{
write_location_first_byte(a, PY_CODE_LOCATION_INFO_NO_COLUMNS, length);
write_location_signed_varint(a, line_delta);
}
#define THEORETICAL_MAX_ENTRY_SIZE 25 /* 1 + 6 + 6 + 6 + 6 */
static int
write_location_info_entry(struct assembler* a, location loc, int isize)
{
Py_ssize_t len = PyBytes_GET_SIZE(a->a_linetable);
if (a->a_location_off + THEORETICAL_MAX_ENTRY_SIZE >= len) {
assert(len > THEORETICAL_MAX_ENTRY_SIZE);
RETURN_IF_ERROR(_PyBytes_Resize(&a->a_linetable, len*2));
}
if (loc.lineno < 0) {
write_location_info_none(a, isize);
return SUCCESS;
}
int line_delta = loc.lineno - a->a_lineno;
int column = loc.col_offset;
int end_column = loc.end_col_offset;
assert(column >= -1);
assert(end_column >= -1);
if (column < 0 || end_column < 0) {
if (loc.end_lineno == loc.lineno || loc.end_lineno == -1) {
write_location_info_no_column(a, isize, line_delta);
a->a_lineno = loc.lineno;
return SUCCESS;
}
}
else if (loc.end_lineno == loc.lineno) {
if (line_delta == 0 && column < 80 && end_column - column < 16 && end_column >= column) {
write_location_info_short_form(a, isize, column, end_column);
return SUCCESS;
}
if (line_delta >= 0 && line_delta < 3 && column < 128 && end_column < 128) {
write_location_info_oneline_form(a, isize, line_delta, column, end_column);
a->a_lineno = loc.lineno;
return SUCCESS;
}
}
write_location_info_long_form(a, loc, isize);
a->a_lineno = loc.lineno;
return SUCCESS;
}
static int
assemble_emit_location(struct assembler* a, location loc, int isize)
{
if (isize == 0) {
return SUCCESS;
}
while (isize > 8) {
RETURN_IF_ERROR(write_location_info_entry(a, loc, 8));
isize -= 8;
}
return write_location_info_entry(a, loc, isize);
}
static int
assemble_location_info(struct assembler *a, instr_sequence *instrs,
int firstlineno)
{
a->a_lineno = firstlineno;
location loc = NO_LOCATION;
int size = 0;
for (int i = 0; i < instrs->s_used; i++) {
instruction *instr = &instrs->s_instrs[i];
if (!same_location(loc, instr->i_loc)) {
RETURN_IF_ERROR(assemble_emit_location(a, loc, size));
loc = instr->i_loc;
size = 0;
}
size += instr_size(instr);
}
RETURN_IF_ERROR(assemble_emit_location(a, loc, size));
return SUCCESS;
}
static void
write_instr(_Py_CODEUNIT *codestr, instruction *instr, int ilen)
{
int opcode = instr->i_opcode;
assert(!IS_PSEUDO_INSTR(opcode));
int oparg = instr->i_oparg;
assert(OPCODE_HAS_ARG(opcode) || oparg == 0);
int caches = _PyOpcode_Caches[opcode];
switch (ilen - caches) {
case 4:
codestr->op.code = EXTENDED_ARG;
codestr->op.arg = (oparg >> 24) & 0xFF;
codestr++;
/* fall through */
case 3:
codestr->op.code = EXTENDED_ARG;
codestr->op.arg = (oparg >> 16) & 0xFF;
codestr++;
/* fall through */
case 2:
codestr->op.code = EXTENDED_ARG;
codestr->op.arg = (oparg >> 8) & 0xFF;
codestr++;
/* fall through */
case 1:
codestr->op.code = opcode;
codestr->op.arg = oparg & 0xFF;
codestr++;
break;
default:
Py_UNREACHABLE();
}
while (caches--) {
codestr->op.code = CACHE;
codestr->op.arg = 0;
codestr++;
}
}
/* assemble_emit_instr()
Extend the bytecode with a new instruction.
Update lnotab if necessary.
*/
static int
assemble_emit_instr(struct assembler *a, instruction *instr)
{
Py_ssize_t len = PyBytes_GET_SIZE(a->a_bytecode);
_Py_CODEUNIT *code;
int size = instr_size(instr);
if (a->a_offset + size >= len / (int)sizeof(_Py_CODEUNIT)) {
if (len > PY_SSIZE_T_MAX / 2) {
return ERROR;
}
RETURN_IF_ERROR(_PyBytes_Resize(&a->a_bytecode, len * 2));
}
code = (_Py_CODEUNIT *)PyBytes_AS_STRING(a->a_bytecode) + a->a_offset;
a->a_offset += size;
write_instr(code, instr, size);
return SUCCESS;
}
static int
assemble_emit(struct assembler *a, instr_sequence *instrs,
int first_lineno, PyObject *const_cache)
{
RETURN_IF_ERROR(assemble_init(a, first_lineno));
for (int i = 0; i < instrs->s_used; i++) {
instruction *instr = &instrs->s_instrs[i];
RETURN_IF_ERROR(assemble_emit_instr(a, instr));
}
RETURN_IF_ERROR(assemble_location_info(a, instrs, a->a_lineno));
RETURN_IF_ERROR(assemble_exception_table(a, instrs));
RETURN_IF_ERROR(_PyBytes_Resize(&a->a_except_table, a->a_except_table_off));
RETURN_IF_ERROR(_PyCompile_ConstCacheMergeOne(const_cache, &a->a_except_table));
RETURN_IF_ERROR(_PyBytes_Resize(&a->a_linetable, a->a_location_off));
RETURN_IF_ERROR(_PyCompile_ConstCacheMergeOne(const_cache, &a->a_linetable));
RETURN_IF_ERROR(_PyBytes_Resize(&a->a_bytecode, a->a_offset * sizeof(_Py_CODEUNIT)));
RETURN_IF_ERROR(_PyCompile_ConstCacheMergeOne(const_cache, &a->a_bytecode));
return SUCCESS;
}

That's purely internal code. I don't think we need C-API support for this, but a Python implementation would be nice. Or a static method on code objects to build the byte string from a Python list of positions. Given the complexity of the format and the risk of future changes, I think it's in CPython's responsibility to generate this format from something user friendly.

compile.h defines the location like this:

/* source location information */
typedef struct {
    int lineno;
    int end_lineno;
    int col_offset;
    int end_col_offset;
} _PyCompilerSrcLocation;

This looks like a function could help that accepts a sequence of tuples of these four values and maps it to a bytes object with the corresponding line table in it.

We'd probably still have to reimplement it in Cython in order to generate the line table string in "older" Python versions. But we could at least rely on it from CPython 3.13 on.

@scoder
Copy link
Contributor Author

scoder commented Nov 22, 2023

@markshannon, could you please comment whether you consider this the right approach? To me, a line table seems required to generate source level monitoring events.

@scoder
Copy link
Contributor Author

scoder commented Dec 14, 2023

@markshannon any comments? Given that Python 3.12 broke an entire C-API feature, I consider it a blocker for 3.13 to fix it.

@pitrou pitrou added release-blocker 3.13 bugs and security fixes labels Dec 20, 2023
@markshannon
Copy link
Member

What C-API was broken?
AFAICT there was no API to call the tstate->c_tracefunc and tstate->c_profilefunc functions.
How were you calling those functions, and why does it no longer work?

@markshannon
Copy link
Member

C extensions can already register callbacks and set events using the Python API, so I assume we are talking about adding an API for C extensions to fire events.

There will be two parts to such an API:

  1. Specifying the mapping from "code objects" and offsets to full locations.
  2. Firing events from C extension code.

Handling locations

Apart from LINE events, all event callback functions include code: CodeType, offset: int. Let's not force C extensions to build a full code objects, so we should relax code to be a "code like" object. We define exactly what "code like" means later.

We also need a specification for the table mapping offsets to locations. The existing table format is quite fiddly, and can only be searched linearly, so designing a new table format might be a good idea.

Firing events from C extension code

Any potential API has to be more complex than for sys.settrace as we need to handle disabling of event locations and multiple tools.

We will need 16 bits of data for each event location, to track active tools and which tool/location pairs are disabled.

We will need to initialize this data, and to specify which event location maps to which event.

With that in mind, and using the PY_START event as an example, an API could look something like this:

typedef struct _PyMonitoringState {
    uint8_t active;
    uint8_t opaque;
} PyMonitoringState;

typedef struct _PyCodeLikeObject PyCodeLikeObject;

/* Returns a version, which should be stored and passed to the next call of _PyMonitioringEnterScope.
   Arguments:
       previous_version: Points to a per code-like object value. The value must be set to 0 before first call per code-like object.
       state_array: The array of all PyMonitoringStates for this code-like object. 
                    Should be initialized to all zeroes before first call.
       event_types: Array of the event types describing the event type for each state in state array.
       length: The length of the state and event arrays, which must be the same.
*/
void _PyMonitoringEnterScope(uint64_t *previous_version, PyMonitoringState *state_array, uint8_t event_types, uint32_t length);

int _PyMonitoring_FirePyStartEvent(PyCodeLikeObject *codelike, uint32_t offset, PyMonitoringState *state);

If performance is going to be an issue, then the above could be implemented as inline functions.
We would lose ABI compatibility, but would still retain API compatibility across CPython versions.

Example usage.

Suppose we have a code-like object that that should fire two events when called, PY_START then PY_RETURN.

We would need to describe the events:

static const uint8_t EVENTS = { PY_START, PY_RETURN };

And the code-like object would need an array for the state and a version number.

struct MyCodeLike {
    PyObject_HEAD
    PyMonitoringState monitoring_state[2];
    uint64_t version;
    /* Other fields */
};

The C code for the function:

void init_code(MyCodeLike *code)
{
     code->monitoring_state = {0, 0};
     code->version = 0;
}

PyObject *call_code(MyCodeLike *code)
{
    _PyMonitoringEnterScope(&code->version, code->monitoring_state, EVENTS, 2);
    ...
    _PyMonitoring_FirePyStartEvent(code, 0, &code->monitoring_state[0]);
    ...
    _PyMonitoring_FirePyReturnEvent(code, 1, &code->monitoring_state[1]);
    ...
}

@encukou
Copy link
Member

encukou commented Feb 6, 2024

@scoder, would such API work for you?

@scoder
Copy link
Contributor Author

scoder commented Feb 22, 2024

Hmm, looks like I didn't receive any email notifications for this ticket although I created it, am subscribed to it, and got explicitly mentioned. No idea why. Sorry. I've re-subscribed, just in case.

@markshannon, I've done some guesswork to fill in the gaps of your proposal. Please correct me if I misunderstood something.

  • The version is essentially a global counter of listener registrations which lead to different event types being handled (or not) over time.
  • Each monitoring state refers to a specific "code position" that can produce events. The corresponding event type is given by the EVENTS array at the same index as the entry in the monitoring state array.
  • The EnterScope call will look at the version, and if it was updated since the last call with these states, re-initialises the monitoring state array, marking currently audited events as active based on their event type. Otherwise, returns quickly.
  • Each Fire call then receives the code-like and essentially an integer ID as offset for a given "code position", which refers to a specific entry in the monitoring state array.
  • The Fire function looks if the monitoring state is marked as active and if so, creates and pushes an event with that offset ID to the listeners.

Comments from me:

  • I fail to see why you need both the pointer to the array entry and its offset in the Fire calls.
  • The code-like struct would need a file path field as well, to map events to source files (if applicable).
  • Lines are missing from the API so far. They could be stored as a separate array in MyCodeLike, preferably as a pointer since they won't always be used (not every such code-like would refer to a physically existing file).
  • How will listeners know the file and line number from the notification?
  • It looks like MyCodeLike could just be zeroed out with memset() for initialisation. This could be done by a C-API function, given the number of states in the struct. Basically, CPython could provide your init_code() function as PyMonitoring_CreateCodeLike(int states, const int** line_numbers. PyObject *source_file_path). Maybe with a simpler "no file nor lines" variant that only takes the number of states.

@markshannon
Copy link
Member

The version is essentially a global counter of listener registrations which lead to different event types being handled (or not) over time.

More or less, yes. It is changed when the set of events being monitored changes. Listener registrations do not change it.

Each Fire call then receives the code-like and essentially an integer ID as offset for a given "code position", which refers to a specific entry in the monitoring state array.

The ID also serves as an index into the locations table.

The Fire function looks if the monitoring state is marked as active and if so, creates and pushes an event with that offset ID to the listeners.

Broadly, yes. There are up to 8 possible listeners for events, so if the state is non-zero it represents a bit vector of active listeners.

I fail to see why you need both the pointer to the array entry and its offset in the Fire calls.

I'm assuming that the layout of the code-like struct is opaque to the VM, it is up to Cython/pybind11/mypyc, etc. how to lay it out, so we can't deduce the offset from the pointer to the monitoring state. Thus we need both.

How will listeners know the file and line number from the notification?"

From the (as yet unspecified) location table and the offset ID.

The code-like struct would need a file path field as well, to map events to source files (if applicable).

Would one location per code-like object be enough?

It looks like MyCodeLike could just be zeroed out with memset() for initialisation.

MyCodeLike would include location information, but we can design the monitoring state struct so that it should be initialized to zero.

@scoder
Copy link
Contributor Author

scoder commented Feb 26, 2024

Would one location per code-like object be enough?

Probably not. If the API requires a "start scope", "work in scope", "end scope" kind of state keeping, then Cython's include statement would get in the way, which allows injecting source code from other files. And include features are probably common in templating languages etc.

While it's probably possible for user code to juggle with different code-like objects for a single code scope, it would be nicer if the API kept code scope and file position independent from each other.

@iritkatriel
Copy link
Member

typedef struct _PyCodeLikeObject PyCodeLikeObject;

What is _PyCodeLikeObject and where is it defined? I'm confused by this and the later discussions on MyCodeLike (which is user-defined).

@iritkatriel iritkatriel self-assigned this Mar 4, 2024
@markshannon
Copy link
Member

What is _PyCodeLikeObject and where is it defined? I'm confused by this and the later discussions on MyCodeLike (which is user-defined).

It is defined by the third-party code (generator).
It can be any object, but will need to support certain attributes and methods so that it looks like a Code object to coverage, profile, inspect, etc.

@iritkatriel
Copy link
Member

iritkatriel commented Mar 4, 2024

  • The EnterScope call will look at the version, and if it was updated since the last call with these states, re-initialises the monitoring state array, marking currently audited events as active based on their event type. Otherwise, returns quickly.

How does it determine that the version changed since last time (where does it store the previous version)?

(I got an answer offline)

@scoder
Copy link
Contributor Author

scoder commented Apr 15, 2024

I've started working on an implementation in Cython and it seems that frames are still implied to exist for the PY_START, PY_RETURN and PY_YIELD events, according to https://docs.python.org/3/library/sys.monitoring.html#events

I understand that the description makes sense from within CPython when targeting consumers, and Cython has long faked Python frames for its functions, but for external event generators, suggesting that Python execution frames exist just because there's a function start/end event or a generator-like function running seems rather demanding. The Limited C-API doesn't expose frames at all.

Should we just update the documentation here to explain that frames only exist when events originate from the Python eval loop?

I can't really say how consumers would deal with these events. There's always a frame on the stack, just not necessarily the one of the currently running code. That might become confusing.

@scoder
Copy link
Contributor Author

scoder commented Apr 15, 2024

Regarding @encukou's question here, I agree that the states and versions fit more with the idea of a frame. Code objects are static for a given piece of code, whereas the states must be independent for each (parallel) call, and the versions are tied to them.

Also, I noticed at some point that e.g. coverage uses dicts to map code objects to collected runtime state. Thus, the code object really needs to be the identical object across calls, which rules out keeping local execution state in it.
(EDIT: This could be worked around by adding hash+eq to the code-like object implementation – making it even more complex. And assuming that tools really use hash+eq and not identity matches, as would be possible for Python functions.)

@gvanrossum
Copy link
Member

@iritkatriel Is this still a (deferred) release blocker?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes deferred-blocker topic-C-API type-feature A feature request or enhancement
Projects
Development

No branches or pull requests

10 participants