Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The regex package isn't thread safe (tracking issue) #7026

Open
makyen opened this issue May 26, 2022 · 4 comments
Open

The regex package isn't thread safe (tracking issue) #7026

makyen opened this issue May 26, 2022 · 4 comments
Assignees
Labels
area: thread safety status: confirmed Confirmed as something that needs working on. type: bug Aaaah! Kill it!

Comments

@makyen
Copy link
Contributor

makyen commented May 26, 2022

I ended up taking a look at the code for the regex package. There are some issues in it which make it not thread safe. I've created the issue Thread safety: need lock for both read and write (_cache and other global variables) in that repository. This issue is for tracking it here.

@makyen makyen self-assigned this May 26, 2022
@makyen
Copy link
Contributor Author

makyen commented Jun 10, 2022

I've forked the regex package into Charcoal-SE/mrabarnett--mrab-regex and added some changes for thread safety. I need to do more review of the code for the package and coordinate with the author of the package wrt. what changes are needed and getting those changes into the base package.

In the meantime, in PR #7074, I've proposed changing requirements.txt to use the Charcoal-SE version of the package.

@makyen
Copy link
Contributor Author

makyen commented Jun 11, 2022

I've taken a deeper look at the regex package's C code. It appears that a regex that's actually in use is not thread safe (i.e. two threads can't safely perform an operation with the same compiled regex pattern at the same time).

I thought I had accounted for that possibility in the changes I made, which are mentioned above, by always using copy.deepcopy() to copy compiled patterns into and out of the implicit _cache. However, I've now found out that the C code in regex which implements __deepcopy__, the function which is used to perform a copy.deepcopy(), really just returns a reference to the original compiled regex without performing a copy. In other words, making a copy of a regex just gives you a reference to the original regex object. So, the above mentioned changes need some work.

@stale stale bot added the status: stale label Aug 13, 2022
@stale
Copy link

stale bot commented Nov 2, 2022

This issue has been closed because it has had no recent activity. If this is still important, please add another comment and find someone with write permissions to reopen the issue. Thank you for your contributions.

@stale stale bot closed this as completed Nov 2, 2022
@makyen makyen reopened this Nov 2, 2022
@stale stale bot removed the status: stale label Nov 2, 2022
@makyen makyen added the status: confirmed Confirmed as something that needs working on. label Nov 2, 2022
@makyen
Copy link
Contributor Author

makyen commented Nov 2, 2022

I'm not certain at this point that there's actually a thread safety issue within the C code for the regex package. I have put more work into this and taken a deeper dive through the regex package's C code. I have not found a specific thread safety issue which would be affecting us.

What I have seen is that there are substantial errors when running the CI test using a large number of different processes. Those errors go away if workarounds are in place to be quite careful not to use the regex package in a way that could invoke a thread safety issue. OTOH, I haven't demonstrated that errors are not removed as a result of the relatively minor slowdown which the workarounds introduce.

Basically, I still need to perform more investigation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: thread safety status: confirmed Confirmed as something that needs working on. type: bug Aaaah! Kill it!
Development

No branches or pull requests

1 participant