Issue
When an application is stopped (POST /v3/apps/:guid/actions/stop), CC sets the desired app state to STOPPED, triggers the LRP deletion at Diego for all application process instances, and returns 200 (i.e. the API request is synchronous).
However, the actual LRPs (= app instances) may keep running after the stop request has returned 200, because Diego grants running processes graceful_shutdown_interval_in_seconds to exit.
Users have no way to find out when the app process instances have really stopped, other than waiting for the graceful shutdown interval plus some extra time. GET /v3/apps/:guid/processes/:type/stats reports state DOWN immediately after the app is stopped, even though the instances are still running.
This can cause problems during graceful shutdown, e.g. when a deployment procedure unbinds service instances directly after stopping the application. Depending on the service, app instances can lose access to the service instances immediately, which leads to unintended failures during graceful shutdown.
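Until the API reports the real instance state, the only workaround described above is to wait out the graceful shutdown interval plus a margin before unbinding. A minimal sketch of that wait, assuming the deployment tooling is written in Python and that the operator knows the foundation's interval (the constant values and both function names are illustrative, not part of any CF client):

```python
import time

# Assumption: the operator knows the foundation's configured
# graceful_shutdown_interval_in_seconds; 10s is the default named in this issue.
GRACEFUL_SHUTDOWN_INTERVAL_S = 10
SAFETY_MARGIN_S = 5  # hypothetical extra buffer


def earliest_safe_unbind(stop_returned_at: float,
                         interval_s: float = GRACEFUL_SHUTDOWN_INTERVAL_S,
                         margin_s: float = SAFETY_MARGIN_S) -> float:
    """Return the timestamp after which the instances should have exited."""
    return stop_returned_at + interval_s + margin_s


def wait_until_safe_to_unbind(stop_returned_at: float, **kwargs) -> None:
    """Block until the graceful shutdown window (plus margin) has elapsed."""
    remaining = earliest_safe_unbind(stop_returned_at, **kwargs) - time.time()
    if remaining > 0:
        time.sleep(remaining)
```

A deployment script would record `time.time()` when the stop request returns and call `wait_until_safe_to_unbind` before unbinding services. This always waits the full interval, even if the instances exit earlier, which is exactly the cost this issue is about.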
Context
Observed on foundations that use a longer graceful shutdown interval than the default 10s.
Steps to Reproduce
Push an application that ignores SIGINT and SIGTERM, e.g. this Python example:
import os
import http.server
import socketserver
import signal

def ignore_signal(signum, frame):
    print(f"Signal handler called with signal {signal.strsignal(signum)}. Ignoring.")

signal.signal(signal.SIGINT, ignore_signal)
signal.signal(signal.SIGTERM, ignore_signal)

if __name__ == "__main__":
    port = int(os.getenv("PORT", 8080))
    # port = 8001
    with socketserver.TCPServer(("", port), http.server.SimpleHTTPRequestHandler) as httpd:
        print("serving at port", port)
        httpd.serve_forever()
Stop the running app: cf stop
Observe that cf stop returns immediately and that the process stats report state DOWN.
Check the application logs to confirm that the app instance containers were destroyed only later, after the process stats had already reported DOWN and once graceful_shutdown_interval_in_seconds had expired.
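To make the mismatch in the steps above easier to see, the stats endpoint can be polled right after cf stop and compared with the container teardown time in the logs. A rough sketch, assuming a logged-in cf CLI and that the stats response carries a `resources` array whose entries have `index` and `state` fields; the helper names are illustrative:

```python
import json
import subprocess


def instance_states(stats_body: str) -> dict:
    """Map instance index -> reported state from a process stats response body."""
    resources = json.loads(stats_body).get("resources", [])
    return {r.get("index"): r.get("state") for r in resources}


def fetch_stats(app_guid: str, process_type: str = "web") -> str:
    """Fetch the raw stats JSON via `cf curl` (requires a logged-in cf CLI)."""
    path = f"/v3/apps/{app_guid}/processes/{process_type}/stats"
    result = subprocess.run(["cf", "curl", path],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `instance_states(fetch_stats(guid))` once per second after cf stop, and printing each result with a timestamp, gives a timeline to set against the log lines emitted when the containers are actually destroyed.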
Expected result
GET /v3/apps/:guid/processes/:type/stats returns DOWN only when the app process instances are no longer running. During graceful shutdown, a state of RUNNING or perhaps STOPPING should be reported.
Current result
GET /v3/apps/:guid/processes/:type/stats returns DOWN immediately after the app is stopped, even though the app process instances are still running during the graceful shutdown period.
Possible Fix
instances_stats_reporter.rb should not simply report DOWN for an app instance when the desired LRP is not found. It should additionally request the actual LRP to determine the instance state.
When the desired LRP no longer exists but an actual LRP still does, the state could be set to STOPPING.
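The proposed decision can be sketched as a small function. This is an illustration in Python, not the Ruby code in instances_stats_reporter.rb; the function name and parameters are hypothetical, STOPPING is the state proposed in this issue rather than an existing one, and the pass-through in the first branch stands in for whatever state mapping the reporter already applies:

```python
from typing import Optional


def reported_state(desired_lrp_exists: bool,
                   actual_lrp_state: Optional[str]) -> str:
    """Decide which instance state to report.

    actual_lrp_state is None when Diego has no actual LRP for the instance.
    """
    if desired_lrp_exists:
        # Placeholder for the existing behaviour: pass the state through,
        # falling back to DOWN when there is no actual LRP.
        return actual_lrp_state or "DOWN"
    if actual_lrp_state is not None:
        # Desired LRP is gone but a container still exists: graceful shutdown.
        return "STOPPING"
    # Neither a desired nor an actual LRP: the instance has really stopped.
    return "DOWN"
```

With this shape, DOWN is reported for a stopped app only once Diego has no actual LRP left for the instance, which matches the expected result above.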