New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix data races in stbir_resize_extended_split by moving the call to s… #1569
base: master
Are you sure you want to change the base?
Conversation
…tbir__update_info_from_resize outside of it.
Yeah, this is slightly more complicated. Are you calling stbir_build_samplers_with_splits() back out, prior to the resizing? What should happen here is that if we need to do a reinit of the coefficients, then we need to fail, the call to stbir_resize_extended_split. So, if you update a parameter that requires resetting up the internal allocs, then it needs to fail. Nikola, can I see the the calls you make to stb_image_resize, so I can see if there is an order mismatch? You want to call stbir_build_samplers_with_splits as close to stbir_resize_extended_split as you can... |
Hmmm, it actually already does fail if you need to rebuild coeffs. I would expect a warning that we are setting the values to the same value when calling from multiple threads, but not sure where there would be any corruption. There might be two things going on here... |
My code is roughly equivalent to this, real code is not starting threads for each resize.
Here's more info on the races. Note that these are all 4 and 8 byte reads and writes that could be torn, so I'm not surprised there's corruption. What's more I'm compiling this with C++ and mere presence of data races is UB in C++, so compiler is free to reorder code in any way possible. I guess what I'm trying to say is that we probably need to focus on the architectural side of things instead of trying to figure out which race exactly causes corruption, they all need to be fixed. My early attempt of just slapping std::atomic on each variable didn't fix the issue. stb_image_resize2.h:7152 Data race in stbir__update_info_from_resize(stbir__info*, STBIR_RESIZE*) at 0x1185b7000 stb_image_resize2.h:7153 Data race in stbir__update_info_from_resize(stbir__info*, STBIR_RESIZE*) at 0x1185b7000 stb_image_resize2.h:7154 Data race in stbir__update_info_from_resize(stbir__info*, STBIR_RESIZE*) at 0x1185b7000 The list goes on, it's also worth noting that most of these variables are read by the functions executed during resize (ie. |
I'm not super up-to-date on threads and data races and current terminology, but I think the point here is that there are two separate issues:
Jeff, please correct me if I'm wrong about this explanation, since I'm just guessing here. |
Each thread is writing and reading the variable. It's writing when |
Correct, but every thread does this - all writes will happen before any read, because every thread is doing them. It's still a data race warning and we'll fix that up - but will not change any behavior. C++ and UB are not relevant here.
Yes, that is correct. The data races can be fixed, but they will have no effect of the behavior of the code. So, if there is corruption, then that would be some other problem... |
Under what condition should Thread1 reading the variable while Thread2 has "partly" written it cause corruption, if Thread2 is writing the exact same value that's already in the variable (it's the same value because that's the value Thread1 wrote)? Because that's what the scenario we've been describing is. |
The fact that both threads are writing the same value is not relevant, what's relevant is that the write itself is not atomic. This can result in a torn write that another thread observes when it reads the value. I'm only speculating on where the corruption comes from, but consider |
Thread sanitizer reports numerous data races, mostly when assigning the
info
struct insidestbir__update_info_from_resize
. I'm also seeing visual corruption when running resizing on multiple threads. My guess would be torn reads and writes.It looks like
stbir__update_info_from_resize
is not touching any of the per-slice data members which suggests it can be called only once instead of being called once per split. I've moved the call intostbir_build_samplers_with_splits
as we need to call that function when setting up the multi-threaded slice resizing. The code can potentially be simpler ifstbir__update_info_from_resize
can be called before the call tostbir__perform_build
. I don't understand the code enough to make this call.