Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add command line flag to skip hot restart stats transfer #34069

Merged
merged 3 commits into from May 13, 2024

Conversation

ravenblackx
Copy link
Contributor

@ravenblackx ravenblackx commented May 9, 2024

Commit Message: Add command line flag to skip hot restart stats transfer
Additional Description: We had a number of issues with production servers having their main thread stuck trying to get stats from the parent instance. We're not entirely sure what the cause was, but we know stats can get big, we know that's where it was stuck, and we know there was 3 minutes of spare time during which the drain should have completed before the parent instance was hard-killed. It's possible that an out of memory occurred, it's possible the stream had a problem with the volume of data, it's possible there's a bug that causes the parent instance to lock up or crash during the stats retrieval, but one thing we can be confident of is that if we don't go down that branch at all then none of those things will happen.
Risk Level: Extremely small; no change to the usual behavior, only a change if a new command-line flag is set.
Testing: Minimal. The coverage is the same as it was before, or very slightly better.
Docs Changes: Added documentation for the command line option.
Release Notes: Yes.
Platform Specific Features: Not relevant to Windows because hot restart isn't supported there.
Relevant to: #33446

Signed-off-by: Raven Black <ravenblack@dropbox.com>
Copy link

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #34069 was opened by ravenblackx.

see: more, trace.

@ravenblackx ravenblackx removed their assignment May 9, 2024
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @abeyad
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #34069 was opened by ravenblackx.

see: more, trace.

@abeyad
Copy link
Contributor

abeyad commented May 10, 2024

/wait

Signed-off-by: Raven Black <ravenblack@dropbox.com>
@ravenblackx ravenblackx marked this pull request as ready for review May 10, 2024 14:20
@ravenblackx
Copy link
Contributor Author

/retest

abeyad
abeyad previously approved these changes May 13, 2024
Copy link
Contributor

@abeyad abeyad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm api

@abeyad
Copy link
Contributor

abeyad commented May 13, 2024

/assign @yanavlasov

@abeyad
Copy link
Contributor

abeyad commented May 13, 2024

nevermind, unassigned Yan as I realized Joshua is already on the reviewer list

Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm modulo help-text suggestion.

One possible cleanup would be to add HotRestartOptions as a class and pass that around to all the hot-restart classes rather than all the settings as separate params.

That would not make this PR smaller but would make the next ones smaller.

@@ -90,6 +90,12 @@ class Options {
*/
virtual bool skipHotRestartOnNoParent() const PURE;

/**
* @return bool don't get stats from the parent. If there are a lot of stats, getting them
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you don't need to specify the type in a doxygen return declaration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not changed, for consistency with the doxygen comments nearby which say @return bool.

"", "skip-hot-restart-parent-stats",
"When hot restarting, by default the child instance copies stats from the parent"
" instance periodically during the draining period. This can potentially be an"
" expensive operation; set this to true to just use new stats.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/just use new stats/reset all stats in child process/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Signed-off-by: Raven Black <ravenblack@dropbox.com>
@ravenblackx
Copy link
Contributor Author

One possible cleanup would be to add HotRestartOptions as a class and pass that around to all the hot-restart classes rather than all the settings as separate params.

Yeah, I think 3 options is the threshold at which I would do that, so, next time if there is one!

@ravenblackx ravenblackx merged commit f0c2329 into envoyproxy:main May 13, 2024
53 checks passed
@ravenblackx ravenblackx deleted the hot_restart_stats branch May 13, 2024 22:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants