Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-116738: Make _csv module thread-safe #118344

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

Conversation

aisk
Copy link
Member

@aisk aisk commented Apr 27, 2024

This extension module has some module state values which may be read or written in different threads:

  • field_limit: A simple integer value to store the maximum field size for the parser;
  • dialects: A dict value to store all the dialects by string names.

field_limit

There is a public function field_size_limit that can read and write it, using a critical section to protect it. For other internal usages, _Py_atomic_load_int32 is used to make it thread safe.

dialects

There are three public functions get_dialect, unregister_dialect and list_dialects that can access this field. For the first two functions, a critical section is used to protect it.

And the list_dialects function just returns the dict's keys with PyDict_Keys. I think its result is readonly and PyDict_Keys itself is thread safe, so we don't need to do anything.
Other internal usages of dialects involve accessing its value with get_dialect_from_registry, which simply calls PyDict_GetItemRef and returns the result. I think we don't need to change it too.

Update: I think these APIs is just call dict APIs which is already thread-safe, and there are no other state changes inside these APIs, so we don't need add locks on them.

Modules/_csv.c Outdated Show resolved Hide resolved
Modules/_csv.c Outdated Show resolved Hide resolved
Modules/_csv.c Outdated
@@ -1582,7 +1585,8 @@ csv_register_dialect(PyObject *module, PyObject *args, PyObject *kwargs)
dialect = _call_dialect(module_state, dialect_obj, kwargs);
if (dialect == NULL)
return NULL;
if (PyDict_SetItem(module_state->dialects, name_obj, dialect) < 0) {
res = PyDict_SetItem(module_state->dialects, name_obj, dialect);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we don't have to use res anymore. :)

Modules/_csv.c Outdated
@@ -34,7 +35,7 @@ typedef struct {
PyTypeObject *dialect_type;
PyTypeObject *reader_type;
PyTypeObject *writer_type;
long field_limit; /* max parsed field size */
int32_t field_limit; /* max parsed field size */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why is this change needed?

Copy link
Member Author

@aisk aisk Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw there is not a _Py_atomic_load_long so I changed it to int32_t, but now I think we can just add it after you mentioned it, thanks for noticing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants