
Added tests for MQTT retained messages in various cluster/domain conf… #5082

Open

levb wants to merge 8 commits into main
Conversation

@levb levb (Contributor) commented Feb 14, 2024

This is a Test- and CI-only PR. No server changes.

See also ConnectEverything/mqtt-test#2.

@levb levb requested a review from a team as a code owner February 14, 2024 13:34
@levb levb requested review from wallyqs and Jarema February 14, 2024 13:42
target := topo.makef(t)
t.Cleanup(target.Shutdown)

// TODO (levb): if we do not pre-initialize the streams and
@levb levb (Contributor, Author) Feb 15, 2024

NS: cc @derekcollison @kozlovic This test uncovered rather inconsistent behavior in multi-domain scenarios. If there are two domains, A and B, and the MQTT JetStream assets are first initialized in A, then B will only receive and replay retained messages after its own retained-message assets have been created. Touching the servers with an MQTT connection here ensures the tests succeed; otherwise the A<->B cases fail about half the time.

Not quite sure if this is "by design".
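
For illustration, a minimal sketch of what "touching" each server with a short-lived MQTT connection could look like, assuming the testMQTTConnect/testMQTTCheckConnAck/testMQTTDisconnect helpers and the mqttConnInfo struct from server/mqtt_test.go (target.allServers() is hypothetical, and the exact helper signatures may differ):

for _, s := range target.allServers() { // hypothetical accessor
    o := s.getOpts()
    // A clean-session connect forces the server to create the MQTT
    // JetStream assets (session and retained-message streams) in its domain.
    ci := &mqttConnInfo{clientID: "init", cleanSess: true}
    c, r := testMQTTConnect(t, ci, o.MQTT.Host, o.MQTT.Port)
    testMQTTCheckConnAck(t, r, mqttConnAckRCConnectionAccepted, false)
    testMQTTDisconnect(t, c, nil)
    c.Close()
}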

Member

I would think that if they have different domains, messages should not even be visible in the other domain, but I have to say I don't really understand that concept. What I notice is that you don't specify js_domain: in the mqtt{} block of the config. If I try to add it, the test fails early: the lookup of a stream times out, because the JS API subjects are rewritten with the domain name (e.g. $JS.HUBD.API.STREAM.INFO.$MQTT_sess) and, for some reason, the server does not listen on those, even though the domain is also specified in the JS config. I don't really see where in the JS code the server listens to that...
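
For reference, a minimal sketch of what setting js_domain in the mqtt{} block could look like in a test config, assuming the createConfFile helper from the server tests (the HUBD domain name is taken from the comment above; the rest of the config is illustrative only):

storeDir := t.TempDir()
conf := createConfFile(t, []byte(fmt.Sprintf(`
    listen: 127.0.0.1:-1
    server_name: HUB
    jetstream { domain: HUBD, store_dir: %q }
    mqtt {
        listen: 127.0.0.1:-1
        js_domain: HUBD
    }
`, storeDir)))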

@levb levb (Contributor, Author) Feb 20, 2024

> you don't specify js_domain: in the mqtt{} block of the config

👍 I did not (and still do not) fully understand the intended functionality of this option; I would love to add tests for it, and will read the code to try to understand it better. The customer that was having issues did not have it set, so I did not prioritize it.

derekcollison added a commit that referenced this pull request on Feb 15, 2024:

#5082 was supposed to fix this, but it has not been reviewed yet.

`mqtt-test` got updated, and the `@latest` reference causes the new,
incompatible version to be used. This PR fixes it until #5082 is
merged.
@kozlovic kozlovic left a comment

Some comments, but unfortunately regarding the domains, I am not sure how that works and can't really help there.

for _, c := range t.clusters {
    c.shutdown()
}
for _, s := range t.servers {
Member

See my comment in Reload()

tb.Helper()

for _, c := range t.clusters {
    c.restartAllSamePorts()
Member

This is a mistake that I used to make: restartAllSamePorts() does NOTHING if the cluster has not been shut down first. That code goes through the servers in the cluster and restarts them only IF they are not currently running. So you need to call c.shutdown() before it.

Member

And adding that shows that the TestMQTTExRetainedMessages/cluster tests are now failing...

Member

Sorry, it should call c.stopAll(), not c.shutdown(), since c.shutdown() would wipe out the config/store.
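
Putting the two corrections together, a sketch of the intended restart sequence (using only the cluster helpers named in this thread):

for _, c := range t.clusters {
    c.stopAll()             // stop without wiping config/store
    c.restartAllSamePorts() // only restarts servers that are not running
}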

Member

After the restart, we need to add something like:

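// After the restart, wait for each MQTT stream to elect a leader
// ("ONE" is presumably the test's account name):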
for _, stream := range []string{mqttSessStreamName, mqttStreamName, mqttQoS2IncomingMsgsStreamName, mqttOutStreamName, mqttRetainedMsgsStreamName} {
    c.waitOnStreamLeader("ONE", stream)
}

Without something like that, it is possible that a connect fails because one of the stream lookups times out (after a restart, it may take some time to elect a leader for those streams).

Contributor Author

Ouch... fixing.


for i, s := range t.servers {
    o := s.getOpts()
    s.Shutdown()
Member

Is it the case that mqttExTarget will have either t.clusters or t.servers set, but not both? Or, at least, that t.servers will not contain servers that are also in t.clusters?

Contributor Author

It's the case at the moment, but you're right that it may not hold in the future (a cluster with a single leaf node immediately comes to mind).

TBH, this was just a "minimal viable" implementation for the current requirements; I didn't want to cut/paste, nor to go too abstract here (it's just an "extra" test for the time being). If we keep using MQTTEx more going forward, then yes, the "target" implementation should be refactored.

server/mqtt_ex_test_test.go: outdated review thread (resolved)
sub []mqttExDial
}

func TestMQTTExCompliance(t *testing.T) {
Member

We need to be able to exclude this file when running the normal tests; for example, from the server repo, something like:

go test -race -v -run=TestMQTT ./server

now fails if you don't have everything that you set up in GitHub Actions. So you may want to use build flags or something to exclude this file. Maybe something like:

//go:build !skip_mqtt_tests && run_mqtt_ex
// +build !skip_mqtt_tests,run_mqtt_ex

Of course, that means adding -tags=run_mqtt_ex to the go command to run those tests.
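
For example (assuming the tag name suggested above), the extended tests would then be invoked with something like:

go test -race -v -tags=run_mqtt_ex -run=TestMQTTEx ./server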

Contributor Author

I thought about it, but passed on a separate build tag. All the MQTTEx tests (should) call t.Skip if mqtt-test is not installed, so it's OK to invoke them in the default build?
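
For illustration, a minimal sketch of such a skip guard, assuming the tests shell out to an mqtt-test binary found on the PATH (the helper name is hypothetical):

import (
    "os/exec"
    "testing"
)

// mqttexSkipIfNotInstalled is a hypothetical helper: it skips the calling
// test when the external mqtt-test binary is not installed.
func mqttexSkipIfNotInstalled(tb testing.TB) {
    tb.Helper()
    if _, err := exec.LookPath("mqtt-test"); err != nil {
        tb.Skip("mqtt-test not found in PATH; skipping MQTTEx test")
    }
}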

Member

Right, I did not realize that this is what was happening. I installed mqtt-test, etc., and added it to my path, but then when I tried to run the "normal" MQTT tests, those two would run as well, and I thought that was new.

Contributor Author

ok, will separate.

server/mqtt_ex_test_test.go: review thread (resolved)
server/mqtt_ex_test_test.go: two outdated review threads (resolved)
@kozlovic kozlovic left a comment

Still need to make sure that the cluster is really restarted. And I have found that the tests are quite flaky. I think you may need to use testMQTTConnectRetry() with a clean session and a dummy client ID (or no ID specified) until it succeeds, before running the real test in mqttexRunTest(). Just waiting on the cluster to be ready proved not enough in the tests I was running last week.
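
A minimal sketch of that warm-up, assuming the testMQTTConnectRetry and testMQTTCheckConnAck helpers and the mqttConnInfo struct from server/mqtt_test.go (exact signatures may differ; `o` stands for the dialed server's options):

// Throwaway clean-session connect, retried until the cluster actually
// serves MQTT; only then run the real test.
ci := &mqttConnInfo{clientID: "warmup", cleanSess: true}
c, r := testMQTTConnectRetry(t, ci, o.MQTT.Host, o.MQTT.Port, 5)
testMQTTCheckConnAck(t, r, mqttConnAckRCConnectionAccepted, false)
c.Close()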

@levb levb (Contributor, Author) commented Feb 20, 2024

> Still need to make sure that the cluster is really restarted.

@kozlovic sorry, my commit message was non-descriptive; I squashed from another branch. Anyway, I am still working on the restart and another potential issue I ran into; I will tag you for review when done.

Labels: none yet
Projects: none yet
Linked issues: none yet

2 participants