Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible race condition somewhere in log_platform_message #153

Closed
thiagomacieira opened this issue Sep 9, 2022 · 1 comment
Closed

Possible race condition somewhere in log_platform_message #153

thiagomacieira opened this issue Sep 9, 2022 · 1 comment
Assignees

Comments

@thiagomacieira
Copy link
Contributor

THIS IS AN UNOPTIMIZED BUILD: DON'T TRUST TEST TIMING!
command-line: 'opendcdiag --selftests -e selftest_log_platform --quick -o -'
version: opendcdiag-861692fe19b0-dirty
os: Linux 5.19.2-1-default, glibc 2.36
timing: { duration: 1000.000, timeout: 300000.000 }
cpu-info:
  0: { logical: 0, package: 0, core: 0, thread: 0, family: 6, model: 0x8c, stepping: 1, microcode: 0xa4, ppin: null }
  1: { logical: 1, package: 0, core: 1, thread: 0, family: 6, model: 0x8c, stepping: 1, microcode: 0xa4, ppin: null }
  2: { logical: 2, package: 0, core: 2, thread: 0, family: 6, model: 0x8c, stepping: 1, microcode: 0xa4, ppin: null }
  3: { logical: 3, package: 0, core: 3, thread: 0, family: 6, model: 0x8c, stepping: 1, microcode: 0xa4, ppin: null }
  4: { logical: 4, package: 0, core: 0, thread: 1, family: 6, model: 0x8c, stepping: 1, microcode: 0xa4, ppin: null }
  5: { logical: 5, package: 0, core: 1, thread: 1, family: 6, model: 0x8c, stepping: 1, microcode: 0xa4, ppin: null }
  6: { logical: 6, package: 0, core: 2, thread: 1, family: 6, model: 0x8c, stepping: 1, microcode: 0xa4, ppin: null }
  7: { logical: 7, package: 0, core: 3, thread: 1, family: 6, model: 0x8c, stepping: 1, microcode: 0xa4, ppin: null }
tests:
- test: selftest_log_platform
  details: { quality: beta, description: "Logs platform messages (not related to the test)" }
  state: { seed: 'AES:3eda399208fbab11b49dfa85ce79e91dc125c66df70454ee4b62057a318616e2', iteration: 0, retry: false }
  time-at-start: { elapsed:      0.000, now: !!timestamp '2022-09-09T21:16:48Z' }
  result: pass
  time-at-end:   { elapsed:     16.000, now: !!timestamp '2022-09-09T21:16:48Z' }
  test-runtime: 15.493
  threads:
  - thread: main
    messages:
    - { level: info, text: 'BI> Platform issue: This is an informational platform message from CPU 0' }
    - { level: info, text: 'I> Platform issue: This is an informational platform message from CPU 2' }
    - { level: debug, text: '> Platform issue: This is an informational platform message from CPU 5' }
    - { level: info, text: 'I> Platform issue: This is an informational platform message from CPU 3' }
    - { level: info, text: 'I> Platform issue: This is an informational platform message from CPU 4' }
# Loop iteration 1 finished, average time 15.9571 ms, total 15.9571 ms
exit: pass

Where are the other CPUs?

We don't use this much yet, but it could be a problem in the future.

@thiagomacieira thiagomacieira self-assigned this Sep 9, 2022
thiagomacieira added a commit to thiagomacieira/opendcdiag that referenced this issue Sep 10, 2022
log_data() is probably only used by tests, so there's no race problem
there. But log_message() may be used on the main thread's messaging,
which could cause a race condition.

Helps opendcdiag#153.

Signed-off-by: Thiago Macieira <thiago.macieira@intel.com>
thiagomacieira added a commit to thiagomacieira/opendcdiag that referenced this issue Sep 10, 2022
This merges the two functions for the most part and makes them use a
single write() call to write the content, minimising the race condition
in logging platform messages.

Unfortunately, it looks like the race condition remains inside the Linux
kernel now. The strace shows:

[pid 377003] write(8, "BPlatform issue: I> This is an i"..., 86 <unfinished ...>
[pid 377005] write(8, "BPlatform issue: I> This is an i"..., 85 <unfinished ...>
[pid 377001] write(8, "BPlatform issue: I> This is an i"..., 84 <unfinished ...>
[pid 376999] write(8, "BPlatform issue: I> This is an i"..., 86 <unfinished ...>

But in at least one run, I got the output:

    - { level: info, text: 'Platform issue: I> This is an informational platform message from CPU 4 (1726088530)BBPlatform issue: I> This is an informational platform message from CPU 6 (1707520692)BBPlatform issue: I> This is an informational platform message from CPU 3 (1368564819)' }
    - { level: info, text: 'Platform issue: I> This is an informational platform message from CPU 2 (1799845445)' }

This was done with a modified self test to print an extra random number so the messages would have different lengths, showing the problem. Fixes opendcdiag#153.

Signed-off-by: Thiago Macieira <thiago.macieira@intel.com>
@thiagomacieira
Copy link
Contributor Author

log_platform_message is not thread-safe.

thiagomacieira added a commit to thiagomacieira/opendcdiag that referenced this issue Feb 22, 2023
log_platform_message() is not thread-safe. See opendcdiag#153, which is closed as
"won't fix".

Signed-off-by: Thiago Macieira <thiago.macieira@intel.com>
thiagomacieira added a commit to thiagomacieira/opendcdiag that referenced this issue Feb 22, 2023
This merges the two functions for the most part and makes them use a
single writev() call to write the content.

This was originally done to fix opendcdiag#153, but that fails because the Linux
kernel does not guarantee atomicity of write() calls and I wouldn't care
to make a guess what other OSes do. If we want to have
log_platform_message() be thread-safe, we'll need to insert a mutex.

Signed-off-by: Thiago Macieira <thiago.macieira@intel.com>
thiagomacieira added a commit to thiagomacieira/opendcdiag that referenced this issue Feb 22, 2023
This merges the two functions for the most part and makes them use a
single writev() call to write the content.

This was originally done to fix opendcdiag#153, but that fails because the Linux
kernel does not guarantee atomicity of write() calls and I wouldn't care
to make a guess what other OSes do. If we want to have
log_platform_message() be thread-safe, we'll need to insert a mutex.

Signed-off-by: Thiago Macieira <thiago.macieira@intel.com>
thac0 pushed a commit that referenced this issue Feb 22, 2023
log_platform_message() is not thread-safe. See #153, which is closed as
"won't fix".

Signed-off-by: Thiago Macieira <thiago.macieira@intel.com>
thiagomacieira added a commit to thiagomacieira/opendcdiag that referenced this issue Jul 12, 2023
This merges the two functions for the most part and makes them use a
single writev() call to write the content.

This was originally done to fix opendcdiag#153, but that fails because the Linux
kernel does not guarantee atomicity of write() calls and I wouldn't care
to make a guess what other OSes do. If we want to have
log_platform_message() be thread-safe, we'll need to insert a mutex.

Signed-off-by: Thiago Macieira <thiago.macieira@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant