Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

acq-rel pairing is not ordered in PSO #88

Open
albertnetymk opened this issue May 20, 2020 · 7 comments
Open

acq-rel pairing is not ordered in PSO #88

albertnetymk opened this issue May 20, 2020 · 7 comments
Labels

Comments

@albertnetymk
Copy link

Running it using nidhuggc -O -std=c++17 -- --pso test.cpp shows assertion failure. I thought with acq-rel MM, I would get SC behavior (no assertion failure).

#include <pthread.h>
#include <atomic>
#include <cassert>

using namespace std;

atomic<int> x = 0;
atomic<int> y = 0;

void t0()
{
  y.store(20, memory_order_release);
  x.store(10, memory_order_release);
}

void t1()
{
  if (x.load(memory_order_acquire) == 10) {
    assert(y.load(memory_order_acquire) == 20);
  }
}

int main()
{
  pthread_t threads[2];
  pthread_create(
      &threads[0],
      nullptr,
      [](void*) -> void* {t0(); return nullptr;},
      nullptr);
  pthread_create(
      &threads[1],
      nullptr,
      [](void*) -> void* {t1(); return nullptr;},
      nullptr);
  for (auto i = 0; i < 2; ++i) {
    pthread_join(threads[i], nullptr);
  }

  return 0;
}
@margnus1
Copy link
Contributor

margnus1 commented Jun 9, 2020

I have confirmed the issue. As far as I can tell, Nidhugg/pso treats release/acquire accesses identically to relaxed accesses.

How to fix this issue is bit more tricky, and potentially contentious. Nidhugg does not really treat its inputs as C (or C++) programs, but more as "C, except with memory model X", for some choices of X. Thus, for this Nidhugg-specific modelling language, the question of what atomic_store_explicit(..., memory_order_release) even means becomes up to us to define. Should we even allow it? Does it mean the same thing as an SC-store, to interpret it as a fence, flushing all the store buffers? Is it more like a barrier, requiring the store to hit memory strictly later than currently buffered stores? Should we try to impose the intersection between TSO and C11 behaviours? All of these are different models.

@kostis
Copy link
Contributor

kostis commented Jun 10, 2020

Thanks @margnus1 for looking into this. Let me also chip in here, because I want to understand the motivation behind this issue and how we can possibly improve on the documentation of Nidhugg.

Running it using nidhuggc -std=c++17 -- --pso test.cpp shows assertion failure. I thought with acq-rel MM, I would get SC behavior (no assertion failure).

As @margnus1 wrote above, currently, Nidhugg does not do analysis under the C/C++11 memory model, but under different (more low-level, architectural) memory models, namely SC, TSO and PSO. Of these, PSO is clearly the more relaxed one, so it's natural to observe more relaxed behaviors compared to SC and TSO in this model. Also, note that Nidhugg does not support the Release-Acquire fragment of C/C++11's memory model.

Given the above, I am wondering what made @albertnetymk expect that "with release-acquire accesses you would get SC behavior" in the first place, and in particular in connection to specifying the --pso flag in the call to nidhuggc, which is clearly not the same as the tool's --sc flag.

Answers to these questions may help us improve the documentation from the users' perspectives.

@albertnetymk
Copy link
Author

By "I thought with acq-rel MM, I would get SC behavior (no assertion failure).", I meant for this example, memory_order_acquire+memory_order_release and memory_order_seq_cst have the same behavior, no assertion failure.

I see two levels of MM here: the first level is specified in C/C++ program and the second level is passed to nidhugg. I interprete the second level of MM as the MM used by the architecture, TSO from x86, PSO from ARM. If the C/C++ program uses a too weak MM, it might still work on an architecture with relatively strong MM; in contrast, if the C/C++ program uses a strong enough MM, it should work on all architectures (strong or weak).

Going back to the original example, since the program uses a strong enough MM, I would expect no assertion failure even with a weaker architecture MM.

Of course, this is merely my personal understanding on how nidhugg is used/works; it might not be the intended way or even correct.

@kostis
Copy link
Contributor

kostis commented Jun 10, 2020

Thanks for the reply.

Let me try to clear some misunderstandings:

I see two levels of MM here: the first level is specified in C/C++ program and the second level is passed to nidhugg.

IMO, this is the main misunderstanding. It is also something we should make it clear(er) in the docs (if it's not already). Currently, in Nidhugg, there are no two levels. Instead, there is only one level of MM: that chosen by the MM in the command-line option. In other words, when one specifies --pso, they basically say: "Dear nidhugg, do an analysis assuming that all loads and stores in my program are as if executed under PSO."

I interpret the second level of MM as the MM used by the architecture, TSO from x86, PSO from ARM.

Hmmm... actually PSO is not ARM! The ARM MM is (quite) different.

Going back to the original example, since the program uses a strong enough MM, I would expect no assertion failure even with a weaker architecture MM.

Since these days I tend to hang out and interact with computer architects a fair bit, I am a bit hazy on which MMs are strong vs weak, but I would definitely not classify RA (release-acquire) as a "strong enough MM." (But, having written that, I realize that the disagreement here may lie in what enough really means.)

@albertnetymk
Copy link
Author

Thank you for the explanation.

Currently, in Nidhugg, there are no two levels. Instead, there is only one level of MM: that chosen by the MM in the command-line option.

I modified the example to use memory_order_seq_cst, then the assertion failure goes away. I think this is why I got the impression about the two levels of MM. If this command line option selects MM, the assertion should still fail, shouldn't it?

Hmmm... actually PSO is not ARM!

Thank you for pointing this out; I forgot what architecture uses PSO.

Even though my understand about this two-level of MM is incorrect, I find this scenario quite practical in testing real code using nidhugg. If this is supported, nidhugg could be more widely adopted, I think.

@margnus1
Copy link
Contributor

margnus1 commented Sep 4, 2020

I modified the example to use memory_order_seq_cst, then the assertion failure goes away. I think this is why I got the impression about the two levels of MM. If this command line option selects MM, the assertion should still fail, shouldn't it?

Well, PSO also offers synchronized accesses to memory, for example by inserting memory fences. Nidhugg offers two way of expressing this in its "modelling language". Either you insert explicit fences using gcc inline assembly, or you use atomic_store. However, it does not impose any special meaning on atomic_(store|load)(..., memory_order_(release|acquire)). Part of the reason is that nidhugg does not implement store-store fences, which are part of the PSO memory model. If it did, probably atomic_store(..., memory_order_release) would mean "sfence followed by store".

This "Two-memory-model" idea does not have the semantics you want. If we wanted to check "well, what can this program do if first compiled with a C11 compiler and then run on memory model X"; this is equivalent to just checking "is this program correct under c11". The point of the language memory model is to abstract the underlying hardware model, and also allow optimisers to rewrite the code. Therefore, behaviours that might by disallowed by the "underlying mm" would still be allowed, since the C11 compiler can legally rewrite the program to always exhibit those behaviours, even on SC hardware.

Still, I can see how nidhugg confuses its users into thinking it implements C11. We've been discussing whether to implement a "best-effort-C11" mode, where we implement SC+RA, and allow relaxed accesses only where synchronised and otherwise barfing (since the standard does not define the exact behaviours allowed for relaxed accesses, and model-checking a conservative overapproximation is an unsolved research problem, and most likely a solution would be crazy slow and memory hungry. Heck, I don't know if it's decidable).

@albertnetymk
Copy link
Author

Well, PSO also offers synchronized accesses to memory...

I see; this makes sense.

This "Two-memory-model" idea does not have the semantics you want...

True. I didn't think this through.

Still, I can see how nidhugg confuses its users into thinking it implements C11...

Since nidhugg is a bug-finding tool on C/C++ programs, I find it handy to test actual code with it, instead of checking models constructed from actual code. The fact that nidhugg specifies the hardware MM but not a language MM does introduce some barrier in testing the actual code directly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants