Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ConfigPackageUtility misses atomic file operations and #10055

Open
julianbrost opened this issue May 8, 2024 · 0 comments
Open

ConfigPackageUtility misses atomic file operations and #10055

julianbrost opened this issue May 8, 2024 · 0 comments
Labels
area/api REST API area/configuration DSL, parser, compiler, error handling area/distributed Distributed monitoring (master, satellites, clients) bug Something isn't working ref/IP

Comments

@julianbrost
Copy link
Contributor

In https://github.com/Icinga/icinga2/blob/v2.14.2/lib/remote/configpackageutility.cpp, files are written in-place using a basic std::ofstream. So if the Icinga 2 process is terminated while a file is being written, this can result in an incomplete file that will still be picked up when the configuration is loaded, resulting in a possibly incomplete or otherwise broken configuration. Additionally, as no fsync() or similar is done, other corruption may happen if the whole system crashes (I think we hat such problems in the past on Windows in particular).

Some of the std::ofstream should probably simply be replaced by our own AtomicFile, for example when writing files like /var/lib/icinga2/api/packages/director/active.conf.

However, if AtomicFile would be used everywhere, especially where the actual config package contents are written, this will probably slow down writing all the files quite a bit as it would enforce a sequential "write one file, sync it to disk, write the next one, ..." pattern. Keeping the std::ofstream there and syncing all files afterwards should allow the operating system to combine the sync operations, making them cheaper.1 The whole config stage directory tree can't be written atomically, so the atomic commit at the end should be replacing the active stage.

There have been a few reports of broken config packages in the past, the problem described here probably is their cause.

ref/IP/532402

Footnotes

  1. I did not specifically test this, but I expect that if fsync() calls are bundled at the end, some of them may not even have to do anything as the file was flushed in the meantime already. At least that optimization is possible, unlike when AtomicFile is used sequentially.

  2. I noticed this problem while looking into this support ticket, but I couldn't confirm it as the likely cause.

@julianbrost julianbrost added bug Something isn't working area/configuration DSL, parser, compiler, error handling area/distributed Distributed monitoring (master, satellites, clients) area/api REST API ref/IP labels May 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api REST API area/configuration DSL, parser, compiler, error handling area/distributed Distributed monitoring (master, satellites, clients) bug Something isn't working ref/IP
Projects
None yet
Development

No branches or pull requests

1 participant