New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C-API for signalling monitoring events #111997
Comments
Do you want to propose an API for that? |
PEP-669 seems to discuss a mechanism to modify the bytestream for the purpose of implementing monitoring. Would access to this mechanism also be provided as part of this API? I haven't seen too much mention of this feature elsewhere. Source: https://peps.python.org/pep-0669/#rationale
IMO it would be very useful for an API of this nature to expose low-level interpreter state to monitoring tools. For example, if we are able to see events like |
In order to map source/line/column information to an offset, we need a way to build a Lines 190 to 444 in d67f947
That's purely internal code. I don't think we need C-API support for this, but a Python implementation would be nice. Or a static method on code objects to build the byte string from a Python list of positions. Given the complexity of the format and the risk of future changes, I think it's in CPython's responsibility to generate this format from something user friendly.
/* source location information */
typedef struct {
int lineno;
int end_lineno;
int col_offset;
int end_col_offset;
} _PyCompilerSrcLocation; This looks like a function could help that accepts a sequence of tuples of these four values and maps it to a We'd probably still have to reimplement it in Cython in order to generate the line table string in "older" Python versions. But we could at least rely on it from CPython 3.13 on. |
@markshannon, could you please comment whether you consider this the right approach? To me, a line table seems required to generate source level monitoring events. |
@markshannon any comments? Given that Python 3.12 broke an entire C-API feature, I consider it a blocker for 3.13 to fix it. |
What C-API was broken? |
C extensions can already register callbacks and set events using the Python API, so I assume we are talking about adding an API for C extensions to fire events. There will be two parts to such an API:
Handling locationsApart from We also need a specification for the table mapping offsets to locations. The existing table format is quite fiddly, and can only be searched linearly, so designing a new table format might be a good idea. Firing events from C extension codeAny potential API has to be more complex than for We will need 16 bits of data for each event location, to track active tools and which tool/location pairs are disabled. We will need to initialize this data, and to specify which event location maps to which event. With that in mind, and using the typedef struct _PyMonitoringState {
uint8_t active;
uint8_t opaque;
} PyMonitoringState;
typedef struct _PyCodeLikeObject PyCodeLikeObject;
/* Returns a version, which should be stored and passed to the next call of _PyMonitioringEnterScope.
Arguments:
previous_version: Points to a per code-like object value. The value must be set to 0 before first call per code-like object.
state_array: The array of all PyMonitoringStates for this code-like object.
Should be initialized to all zeroes before first call.
event_types: Array of the event types describing the event type for each state in state array.
length: The length of the state and event arrays, which must be the same.
*/
void _PyMonitoringEnterScope(uint64_t *previous_version, PyMonitoringState *state_array, uint8_t event_types, uint32_t length);
int _PyMonitoring_FirePyStartEvent(PyCodeLikeObject *codelike, uint32_t offset, PyMonitoringState *state); If performance is going to be an issue, then the above could be implemented as inline functions. Example usage.Suppose we have a code-like object that that should fire two events when called, We would need to describe the events: static const uint8_t EVENTS = { PY_START, PY_RETURN }; And the code-like object would need an array for the state and a version number. struct MyCodeLike {
PyObject_HEAD
PyMonitoringState monitoring_state[2];
uint64_t version;
/* Other fields */
}; The C code for the function: void init_code(MyCodeLike *code)
{
code->monitoring_state = {0, 0};
code->version = 0;
}
PyObject *call_code(MyCodeLike *code)
{
_PyMonitoringEnterScope(&code->version, code->monitoring_state, EVENTS, 2);
...
_PyMonitoring_FirePyStartEvent(code, 0, &code->monitoring_state[0]);
...
_PyMonitoring_FirePyReturnEvent(code, 1, &code->monitoring_state[1]);
...
} |
@scoder, would such API work for you? |
Hmm, looks like I didn't receive any email notifications for this ticket although I created it, am subscribed to it, and got explicitly mentioned. No idea why. Sorry. I've re-subscribed, just in case. @markshannon, I've done some guesswork to fill in the gaps of your proposal. Please correct me if I misunderstood something.
Comments from me:
|
More or less, yes. It is changed when the set of events being monitored changes. Listener registrations do not change it.
The ID also serves as an index into the locations table.
Broadly, yes. There are up to 8 possible listeners for events, so if the state is non-zero it represents a bit vector of active listeners.
I'm assuming that the layout of the code-like struct is opaque to the VM, it is up to Cython/pybind11/mypyc, etc. how to lay it out, so we can't deduce the offset from the pointer to the monitoring state. Thus we need both.
From the (as yet unspecified) location table and the offset ID.
Would one location per code-like object be enough?
|
Probably not. If the API requires a "start scope", "work in scope", "end scope" kind of state keeping, then Cython's While it's probably possible for user code to juggle with different code-like objects for a single code scope, it would be nicer if the API kept code scope and file position independent from each other. |
What is |
It is defined by the third-party code (generator). |
(I got an answer offline) |
I've started working on an implementation in Cython and it seems that frames are still implied to exist for the I understand that the description makes sense from within CPython when targeting consumers, and Cython has long faked Python frames for its functions, but for external event generators, suggesting that Python execution frames exist just because there's a function start/end event or a generator-like function running seems rather demanding. The Limited C-API doesn't expose frames at all. Should we just update the documentation here to explain that frames only exist when events originate from the Python eval loop? I can't really say how consumers would deal with these events. There's always a frame on the stack, just not necessarily the one of the currently running code. That might become confusing. |
Regarding @encukou's question here, I agree that the states and versions fit more with the idea of a frame. Code objects are static for a given piece of code, whereas the states must be independent for each (parallel) call, and the versions are tied to them. Also, I noticed at some point that e.g. |
@iritkatriel Is this still a (deferred) release blocker? |
Feature or enhancement
Proposal:
Language implementations for the CPython runtime (Cython, JIT compilers, regular expression engines, template engines, etc.) need a way to signal PEP-669 monitoring events to the registered listeners.
We need a way to create events and inject them into the monitoring system. Since events have more than one signature, we might end up needing more than one C-API function for this.
We need a way to map 3D source code positions (file name, line, character) to 1D integer offsets. Code objects help, but branches might cross source file boundaries, so that's more tricky. For many use cases, a mapping between (line, character) positions and an integer offset would probably suffice, although templating languages usually provide some kind of include commands, as does Cython. There should be some help in the C-API for building up such a mapping.
The reason why I think we need CPython's help for mapping indices is that both sides, listeners and even producers, need to agree on the same mapping. Sadly, the events don't include line/character positions directly but only a single integer. So event producers need a way to produce a number that event listeners like coverage analysers can map back to a source code position.
Has this already been discussed elsewhere?
I have already discussed this feature proposal on Discourse
Links to previous discussion of this feature:
https://discuss.python.org/t/pep-669-low-impact-monitoring-for-cpython/13018/61?u=scoder
Linked PRs
The text was updated successfully, but these errors were encountered: