on_client_offline not triggered by shutdown anymore #2003

Open
bohlenc opened this issue Jun 13, 2022 · 4 comments

Comments

@bohlenc

bohlenc commented Jun 13, 2022

Environment

  • VerneMQ Version: 1.12.x
  • OS: using Docker image vernemq/vernemq:1.12.x
  • VerneMQ configuration (vernemq.conf):
allow_anonymous = off
allow_register_during_netsplit = off
allow_publish_during_netsplit = off
allow_subscribe_during_netsplit = off
allow_unsubscribe_during_netsplit = off
allow_multiple_sessions = off
coordinate_registrations = on
max_inflight_messages = 20
max_online_messages = 1000
max_offline_messages = 1000
max_message_size = 0
upgrade_outgoing_qos = off
listener.max_connections = 10000
listener.nr_of_acceptors = 10
listener.tcp.default = 127.0.0.1:1883
listener.vmq.clustering = 0.0.0.0:44053
listener.http.default = 127.0.0.1:8888    
systree_enabled = on
systree_interval = 20000
graphite_enabled = off
graphite_host = localhost
graphite_port = 2003
graphite_interval = 20000
shared_subscription_policy = prefer_local
plugins.vmq_passwd = on
plugins.vmq_acl = on
plugins.vmq_diversity = off
plugins.vmq_webhooks = off
plugins.vmq_bridge = off
metadata_plugin = vmq_plumtree
vmq_acl.acl_file = ./etc/vmq.acl
vmq_acl.acl_reload_interval = 10
vmq_passwd.password_file = ./etc/vmq.passwd
vmq_passwd.password_reload_interval = 10
vmq_diversity.script_dir = ./share/lua
vmq_diversity.auth_postgres.enabled = off
vmq_diversity.postgres.ssl = off
vmq_diversity.postgres.password_hash_method = crypt
vmq_diversity.auth_cockroachdb.enabled = off
vmq_diversity.cockroachdb.ssl = on
vmq_diversity.cockroachdb.password_hash_method = bcrypt
vmq_diversity.auth_mysql.enabled = off
vmq_diversity.mysql.password_hash_method = password
vmq_diversity.auth_mongodb.enabled = off   
vmq_diversity.mongodb.ssl = off
vmq_diversity.auth_redis.enabled = off 
vmq_bcrypt.pool_size = 1
log.console = file
log.console.level = info
log.console.file = ./log/console.log
log.error.file = ./log/error.log
log.syslog = off
log.crash = on
log.crash.file = ./log/crash.log
log.crash.maximum_message_size = 64KB
log.crash.size = 10MB
log.crash.rotation = $D0
log.crash.rotation.keep = 5
nodename = VerneMQ@127.0.0.1
distributed_cookie = vmq
erlang.async_threads = 64
erlang.max_ports = 262144
leveldb.maximum_memory.percent = 70
  • Cluster size/standalone: standalone

Expected behaviour

I expect the on_client_offline hook to be called when a client session is terminated because of a node shutdown.
This was the case up until and including version 1.11.0.

Actual behaviour

The on_client_offline hook is no longer called in version 1.12.0 and later when the node is shut down.

Additional information

I compared the VerneMQ log output of versions 1.11.0 and 1.12.3, when shutting down a node (i.e. by killing the container it runs in).

(The [info] messages are printed using on_client_wakeup and on_client_offline hooks)

Output with version 1.11.0:

14:16:27.815 [info] Client some_test_client_id woke up.
14:16:27.816 [info] Client some_test_client_id connected.
14:16:28.879 [debug] stop due to disconnect
14:16:28.879 [debug] session stopped due to shutdown
14:16:28.880 [info] Client some_test_client_id is offline.
14:16:28.881 [info] Client some_test_client_id disconnected.

(-> on_client_offline hook is triggered)

Output with version 1.12.3:

14:21:22.406 [info] Client some_test_client_id woke up.
14:21:22.406 [info] Client some_test_client_id connected.
14:21:23.425 [debug] session normally stopped

(-> on_client_offline hook is not triggered)
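The comparison above could be automated with a small script. This is only a rough sketch: it assumes the log line shape and the exact hook messages ("woke up." / "is offline.") shown in the excerpts, and the function name `clients_missing_offline` is made up for illustration. It is not part of VerneMQ.

```python
import re

# Assumed line shape, taken from the excerpts above: "HH:MM:SS.mmm [level] message"
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\w+)\] (.*)$")
# Assumed messages printed by the on_client_wakeup / on_client_offline hooks
WOKE_RE = re.compile(r"^Client (\S+) woke up\.$")
OFFLINE_RE = re.compile(r"^Client (\S+) is offline\.$")

LOG_1_11_0 = """\
14:16:27.815 [info] Client some_test_client_id woke up.
14:16:27.816 [info] Client some_test_client_id connected.
14:16:28.879 [debug] stop due to disconnect
14:16:28.879 [debug] session stopped due to shutdown
14:16:28.880 [info] Client some_test_client_id is offline.
14:16:28.881 [info] Client some_test_client_id disconnected.
"""

LOG_1_12_3 = """\
14:21:22.406 [info] Client some_test_client_id woke up.
14:21:22.406 [info] Client some_test_client_id connected.
14:21:23.425 [debug] session normally stopped
"""


def clients_missing_offline(log_text):
    """Return sorted client IDs that woke up but never logged an offline message."""
    woke, offline = set(), set()
    for raw in log_text.splitlines():
        line = LINE_RE.match(raw.strip())
        if not line:
            continue  # skip anything that is not a regular log line
        message = line.group(3)
        woke_match = WOKE_RE.match(message)
        if woke_match:
            woke.add(woke_match.group(1))
        offline_match = OFFLINE_RE.match(message)
        if offline_match:
            offline.add(offline_match.group(1))
    return sorted(woke - offline)


if __name__ == "__main__":
    print("1.11.0:", clients_missing_offline(LOG_1_11_0))
    print("1.12.3:", clients_missing_offline(LOG_1_12_3))
```

Running it against a captured console.log after a shutdown would list the clients for which the offline hook never fired.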

So it seems 1.12.0 introduced a new behavior regarding how sessions are terminated on shutdown, or maybe how a SIGTERM is handled.

@ioolkos
Contributor

ioolkos commented Jun 14, 2022

@bohlenc Nothing was implemented to change behaviour within VerneMQ. So this must be related to Kubernetes (where there were changes to the start script).

This is somewhat difficult to reason about. Verne, the hooks, and the plugins all run in the same VM, so it comes down to shutdown order, timing, and guarantees.
Maybe this is also related to cluster leave vs node stop.

You could test what vmq-admin listener stop followed by vmq-admin listener delete guarantee with respect to the on_client_offline hook.


👉 Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
👉 Using the binary VerneMQ packages commercially (.deb/.rpm/Docker) requires a paid subscription.

@bohlenc
Author

bohlenc commented Jun 14, 2022

Just to be clear: I also see this issue in a plain Docker container, so it does not seem to be Kubernetes related. I will look into docker-vernemq as well.

@bohlenc
Author

bohlenc commented Jun 14, 2022

I found that previously, cluster leave was called on SIGTERM in addition to node stop. It seems that client sessions are terminated differently by cluster leave than by node stop.

Should the hook not be triggered regardless of how the session is terminated?

cf. vernemq/docker-vernemq#315

@bohlenc
Author

bohlenc commented Jun 14, 2022

I assume client sessions are not terminated on node stop to avoid rebalancing clients to other cluster nodes. Would that be correct?
If so, would you recommend keeping active client sessions on the node, e.g. during K8s rolling updates?

Nevertheless, until the node is back up, the clients are effectively offline. Shouldn't the on_client_offline hook be triggered in that case?
