Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition in map loop for BPF_MAP_TYPE_PERCPU_HASH #3117

Open
jordalgo opened this issue Apr 13, 2024 · 3 comments
Open

Race condition in map loop for BPF_MAP_TYPE_PERCPU_HASH #3117

jordalgo opened this issue Apr 13, 2024 · 3 comments
Labels
bug Something isn't working reliability Correctness and polish work

Comments

@jordalgo
Copy link
Contributor

BEGIN {
  $i = 0;
  while ($i < 100) {
    @m[1] += 1;
    $i++;
    if ($i == 100) {
      exit();
    }
  }
}

END {
  for ($kv : @m) {
    print(($kv));
  }
  clear(@m);
}

This script seems to always exit with the correct output, namely: (1, 100).

However if you use the count() function instead of +=1, which is functionally equivalent, I often see this as the output: (1,0).

My guess is that because the map used for count is BPF_MAP_TYPE_PERCPU_HASH vs BPF_MAP_TYPE_HASH used for the += operation, there is some issue there but that's as far as I got.

@jordalgo jordalgo added the bug Something isn't working label Apr 13, 2024
@jordalgo jordalgo changed the title Race condition in map loop Race condition in map loop for BPF_MAP_TYPE_PERCPU_HASH Apr 13, 2024
@ajor
Copy link
Member

ajor commented Apr 19, 2024

It looks like the problem occurs when the BEGIN and END probes run on different CPUs:

BEGIN {
  $i = 0;
  while ($i < 100) {
    @m[cpu] = count();
    $i++;
  }
}

END {
  for ($kv : @m) {
    print((cpu, $kv));
  }
  clear(@m);
}

Same CPU, correct result:

(15, (15, 100))

Different CPU, wrong result:

(57, (59, 0))

@ajor
Copy link
Member

ajor commented Apr 19, 2024

Maybe #3126 is the solution for this bug as well? When reading a per-cpu map value, collect data from all CPUs.

@jordalgo
Copy link
Contributor Author

Maybe #3126 is the solution for this bug as well? When reading a per-cpu map value, collect data from all CPUs.

Yeah that makes sense. I'll try to pick up that task soon (if no one beats me to it).

@ajor ajor added the reliability Correctness and polish work label May 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working reliability Correctness and polish work
Projects
None yet
Development

No branches or pull requests

2 participants