Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

retry sending specific messages #4969

Merged
merged 2 commits into from May 1, 2024
Merged

retry sending specific messages #4969

merged 2 commits into from May 1, 2024

Conversation

qfrank
Copy link
Contributor

@qfrank qfrank commented Mar 22, 2024

Relate mobile issue:

Major changes include:

  • Added retry functionality for specific message types, such as:
    ApplicationMetadataMessage_COMMUNITY_REQUEST_TO_JOIN
    ApplicationMetadataMessage_COMMUNITY_EDIT_SHARED_ADDRESSES
    ApplicationMetadataMessage_COMMUNITY_CANCEL_REQUEST_TO_JOIN
    ApplicationMetadataMessage_COMMUNITY_REQUEST_TO_JOIN_RESPONSE
    ApplicationMetadataMessage_COMMUNITY_REQUEST_TO_LEAVE
    
  • Replaced ResendAutomatically with ResendType and ResendMethod.
  • Fixed the rawMessageByID function, which incorrectly calculated Recipients.

@qfrank qfrank self-assigned this Mar 22, 2024
@status-im-auto
Copy link
Member

status-im-auto commented Mar 22, 2024

Jenkins Builds

Click to see older builds (84)
Commit #️⃣ Finished (UTC) Duration Platform Result
✔️ 6c35a8f #1 2024-03-22 11:01:33 ~3 min linux 📦zip
✔️ 6c35a8f #1 2024-03-22 11:02:48 ~5 min ios 📦zip
✔️ 6c35a8f #1 2024-03-22 11:03:22 ~5 min android 📦aar
✖️ 6c35a8f #1 2024-03-22 11:18:22 ~20 min tests 📄log
✔️ 0d167d2 #2 2024-03-22 13:41:20 ~2 min android 📦aar
✖️ 0d167d2 #2 2024-03-22 13:42:00 ~2 min tests 📄log
✔️ 0d167d2 #2 2024-03-22 13:42:27 ~3 min linux 📦zip
✔️ 0d167d2 #2 2024-03-22 13:45:20 ~6 min ios 📦zip
edac513 #3 2024-03-28 13:54:47 ~24 sec ios 📄log
edac513 #3 2024-03-28 13:56:32 ~2 min android 📄log
edac513 #3 2024-03-28 13:56:52 ~2 min linux 📄log
✖️ edac513 #3 2024-03-28 13:58:32 ~4 min tests 📄log
✔️ d598e49 #4 2024-03-28 14:14:36 ~3 min linux 📦zip
✔️ d598e49 #4 2024-03-28 14:14:44 ~3 min ios 📦zip
✔️ d598e49 #4 2024-03-28 14:16:53 ~5 min android 📦aar
✖️ d598e49 #4 2024-03-28 14:49:11 ~38 min tests 📄log
✖️ d598e49 #5 2024-03-28 15:49:43 ~36 min tests 📄log
✔️ 6baff87 #5 2024-03-30 09:44:42 ~2 min linux 📦zip
✔️ 6baff87 #5 2024-03-30 09:46:22 ~3 min ios 📦zip
✔️ 6baff87 #5 2024-03-30 09:46:53 ~4 min android 📦aar
✖️ 6baff87 #6 2024-03-30 10:16:41 ~34 min tests 📄log
✖️ 2fbae93 #7 2024-03-31 02:45:26 ~51 sec tests 📄log
✔️ 2fbae93 #6 2024-03-31 02:46:15 ~1 min linux 📦zip
✔️ 2fbae93 #6 2024-03-31 02:47:18 ~2 min android 📦aar
✔️ 2fbae93 #6 2024-03-31 02:48:04 ~3 min ios 📦zip
✔️ c884f3c #7 2024-03-31 02:59:18 ~1 min linux 📦zip
✔️ c884f3c #7 2024-03-31 02:59:55 ~2 min android 📦aar
✔️ c884f3c #7 2024-03-31 03:00:53 ~3 min ios 📦zip
✔️ c884f3c #8 2024-03-31 03:38:21 ~40 min tests 📄log
✔️ 421c58b #8 2024-04-01 02:00:32 ~1 min linux 📦zip
✔️ 421c58b #8 2024-04-01 02:01:29 ~2 min android 📦aar
✔️ 421c58b #8 2024-04-01 02:02:56 ~3 min ios 📦zip
✔️ 421c58b #9 2024-04-01 02:39:20 ~40 min tests 📄log
✖️ 74ee7ef #10 2024-04-01 04:05:52 ~1 min tests 📄log
✔️ 74ee7ef #9 2024-04-01 04:06:35 ~2 min android 📦aar
✔️ 74ee7ef #9 2024-04-01 04:06:37 ~2 min linux 📦zip
✔️ 74ee7ef #9 2024-04-01 04:07:43 ~3 min ios 📦zip
✔️ d234dd0 #10 2024-04-01 04:16:57 ~1 min android 📦aar
✔️ d234dd0 #10 2024-04-01 04:17:18 ~2 min linux 📦zip
✔️ d234dd0 #10 2024-04-01 04:18:07 ~2 min ios 📦zip
✔️ d234dd0 #11 2024-04-01 04:54:14 ~39 min tests 📄log
✔️ f43babd #11 2024-04-01 08:24:29 ~1 min linux 📦zip
✔️ f43babd #11 2024-04-01 08:24:40 ~1 min android 📦aar
✔️ f43babd #11 2024-04-01 08:26:23 ~3 min ios 📦zip
✔️ f43babd #12 2024-04-01 09:03:06 ~40 min tests 📄log
✖️ b17aa49 #13 2024-04-06 02:21:41 ~1 min tests 📄log
✔️ b17aa49 #12 2024-04-06 02:22:01 ~1 min linux 📦zip
✔️ b17aa49 #12 2024-04-06 02:22:44 ~2 min android 📦aar
✔️ b17aa49 #12 2024-04-06 02:23:49 ~3 min ios 📦zip
✖️ 17ea9ed #14 2024-04-06 02:24:21 ~56 sec tests 📄log
✔️ 17ea9ed #13 2024-04-06 02:24:56 ~1 min linux 📦zip
✔️ 17ea9ed #13 2024-04-06 02:25:31 ~2 min android 📦aar
✔️ 17ea9ed #13 2024-04-06 02:27:22 ~3 min ios 📦zip
✔️ 3865002 #14 2024-04-06 02:32:20 ~1 min linux 📦zip
✔️ 3865002 #14 2024-04-06 02:32:34 ~1 min android 📦aar
✔️ 3865002 #14 2024-04-06 02:33:51 ~2 min ios 📦zip
✔️ 3865002 #15 2024-04-06 03:11:05 ~40 min tests 📄log
✔️ d487673 #15 2024-04-06 23:48:52 ~1 min linux 📦zip
✔️ d487673 #15 2024-04-06 23:49:11 ~2 min android 📦aar
✔️ d487673 #15 2024-04-06 23:50:40 ~3 min ios 📦zip
✔️ d487673 #16 2024-04-07 00:27:28 ~40 min tests 📄log
✔️ 3672759 #16 2024-04-08 13:01:06 ~1 min linux 📦zip
✔️ 3672759 #16 2024-04-08 13:03:49 ~4 min android 📦aar
✔️ 3672759 #17 2024-04-08 13:40:26 ~41 min tests 📄log
✔️ a238ba3 #17 2024-04-09 12:22:43 ~3 min linux 📦zip
✔️ a238ba3 #17 2024-04-09 12:24:40 ~5 min android 📦aar
✖️ a238ba3 #18 2024-04-09 12:25:07 ~6 min tests 📄log
✔️ a238ba3 #19 2024-04-09 13:12:11 ~40 min tests 📄log
✔️ 03a1800 #18 2024-04-10 01:59:18 ~3 min linux 📦zip
✔️ 03a1800 #18 2024-04-10 02:00:37 ~5 min ios 📦zip
✔️ 03a1800 #18 2024-04-10 02:00:54 ~5 min android 📦aar
✖️ 03a1800 #20 2024-04-10 02:00:56 ~5 min tests 📄log
✔️ a3071bd #19 2024-04-10 02:20:34 ~1 min android 📦aar
✔️ a3071bd #19 2024-04-10 02:21:03 ~2 min linux 📦zip
✔️ a3071bd #19 2024-04-10 02:22:03 ~3 min ios 📦zip
✔️ a3071bd #21 2024-04-10 02:58:28 ~39 min tests 📄log
✖️ fa3a38a #22 2024-04-15 08:35:11 ~58 sec tests 📄log
✔️ fa3a38a #20 2024-04-15 08:38:12 ~4 min ios 📦zip
✔️ fa3a38a #20 2024-04-15 08:38:54 ~4 min linux 📦zip
✔️ fa3a38a #20 2024-04-15 08:40:09 ~6 min android 📦aar
✔️ 694e4a2 #21 2024-04-15 08:57:09 ~4 min linux 📦zip
✔️ 694e4a2 #21 2024-04-15 08:57:39 ~4 min ios 📦zip
✔️ 694e4a2 #21 2024-04-15 08:57:50 ~4 min android 📦aar
✔️ 694e4a2 #23 2024-04-15 09:41:10 ~47 min tests 📄log
Commit #️⃣ Finished (UTC) Duration Platform Result
✔️ ca414a1 #22 2024-04-30 07:08:37 ~3 min linux 📦zip
✔️ ca414a1 #22 2024-04-30 07:09:58 ~5 min ios 📦zip
✔️ ca414a1 #22 2024-04-30 07:10:35 ~5 min android 📦aar
✔️ ca414a1 #24 2024-04-30 07:47:39 ~42 min tests 📄log
✔️ afbb823 #23 2024-05-01 00:44:07 ~2 min linux 📦zip
✔️ afbb823 #23 2024-05-01 00:44:22 ~2 min android 📦aar
✔️ afbb823 #23 2024-05-01 00:45:31 ~3 min ios 📦zip
✔️ afbb823 #25 2024-05-01 01:24:21 ~42 min tests 📄log

@qfrank qfrank force-pushed the feat/retry_send_message branch 5 times, most recently from 2fbae93 to c884f3c Compare March 31, 2024 02:57
@qfrank qfrank marked this pull request as ready for review April 1, 2024 01:42
@qfrank qfrank requested a review from osmaczko April 1, 2024 01:42
@qfrank qfrank force-pushed the feat/retry_send_message branch 4 times, most recently from d234dd0 to f43babd Compare April 1, 2024 08:22
@qfrank
Copy link
Contributor Author

qfrank commented Apr 6, 2024

thanks to @osmaczko , leave chat history with @osmaczko here as reference later cc @cammellos :

Hi Patryk, maybe you know something about this:

I'm struggling with SendMessageToControlNode , it will use SendCommunityMessage or SendPrivate depend on !community.PublicKey().Equal(community.ControlNode()) , we have 4 message types:

ApplicationMetadataMessage_COMMUNITY_REQUEST_TO_JOIN
ApplicationMetadataMessage_COMMUNITY_EDIT_SHARED_ADDRESSES
ApplicationMetadataMessage_COMMUNITY_CANCEL_REQUEST_TO_JOIN
ApplicationMetadataMessage_COMMUNITY_REQUEST_TO_LEAVE

which use SendMessageToControlNode , if the user is ControlNode, are we allowing ControlNode to send these 4 type of messages? should not allow IMO, so we can just use SendPrivate directly rather than SendMessageToControlNode, because the user use SendPrivate is not ControlNode ?

Relate PR: #4969

Relate code/comment:

_, err := m.sender.SendPrivate(context.Background(), member, &rawMessage)

//TODO(frank): need support resending following message? if yes then we need to ensure SendMessageToControlNode

//TODO(frank): need support resending following message? if yes then we need to ensure SendMessageToControlNode

//TODO(frank): need support resending following message? if yes then we need to ensure SendMessageToControlNode

//TODO(frank): need support resending following message? if yes then we need to ensure SendMessageToControlNode

Hey Frank,

if the user is ControlNode, are we allowing ControlNode to send these 4 type of messages?
I don't think ControlNode should ever send these types of messages, as they are to be triggered by users. I don't think there is an explicit disallowance in status-go, but there is no way in the client to send a request to join if one is a control node already.

so we can just use SendPrivate directly rather than SendMessageToControlNode
For communities without owner token minted, all messages should be sent to community pubkey rather than to certain user. That's why we have SendMessageToControlNode helper, it determines, whether to send to the user (onwer) directly or to community id.

About the related comment: "TODO(frank): need support resending following message? if yes then we need to ensure SendMessageToControlNode will invoke SendPrivate internally"

If there are privileged members in the community, this should always be true: if !community.PublicKey().Equal(community.ControlNode()) {. That's because only holder of owner-token can nominate privileged members.

Btw. Why do we need to ensure SendMessageToControlNode will invoke SendPrivate?

Hi Patryk , thank you for your explaination!

Why do we need to ensure SendMessageToControlNode will invoke SendPrivate?

Because when i was trying to implement the resending logical, considering what the ResendMethod of a RawMessage is the key, we already have SendPrivate and SendCommunityMessage and i want to reuse these 2 functions, if SendMessageToControlNode is equal to SendPrivate , implementation would be much simpler.

Let's take following code as an example to discuss,

_, err := m.sender.SendPrivate(context.Background(), member, &rawMessage)

we can see that it used SendPrivate to send the raw message, and it reused the same variable rawMessage as param which means they will have the same value on rawMessage.ID, we have a field named Recipients for RawMessage which is used to determine who is the receiver(could be multi) of the raw message. If the same raw message always use SendPrivate to send, it's okay, but.. now think about this, we have a raw message that ID is 0x1, it could be sent with SendPrivate and SendCommunityMessage at the same time because SendMessageToControlNode could use SendPrivate or SendCommunityMessage! What are we going to do with such case?

we have a raw message that ID is 0x1, it could be sent with SendPrivate and SendCommunityMessage at the same time because SendMessageToControlNode could use SendPrivate or SendCommunityMessage! What are we going to do with such case?

What I tried to say last time, is that, it should not happen with the current flow. If SendMessageToControlNode uses SendCommunityMessage then privilegedMembers should be empty. In other words, if we use SendCommunityMessage, then sendPrivate is not used. If it is, then this is a bug most likely.

Let me explain more in deep:
Community is simply a keypair. When we create it, all messages to Community are sent through SendCommunityMessage, where recipient is Community pubkey. At that point, the sendPrivate should not be used. This is for historical reasons, because before, we couldn't deterministically obtain the owner pubkey of the community, so all users sent requests to join to Community pubkey, rather than to certain member. Also the transfer of ownership was different, we was able to re-import community by using another account.

Right now, with owner-token concept, the owner of the community is the one who holds the token and we have a mechanism to deterministically obtain the chat key of that owner. Also, the private key of the community is gone once the owner-token is minted. That's why we sendPrivate directly to owners in such communities.

Since we dropped import by private key, we could, in theory, use sendPrivate in communities without owner-token minted and send messages to the user rather than Community pubkey. Although, I wouldn't go that direction, because it will most likely break a lot of things on the way.

@qfrank qfrank force-pushed the feat/retry_send_message branch 2 times, most recently from 3865002 to d487673 Compare April 6, 2024 23:46
@qfrank
Copy link
Contributor Author

qfrank commented Apr 7, 2024

You may noticed messenger_raw_message_resend_test.go is placed in package api, this looks like a mess, however, if i moved it into package protocol, it would trigger import cycle issue (it's about the usage of backend and messenger). I prefer fix such mess in a separate PR, WDYT?

Copy link
Member

@vitvly vitvly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beast of a PR! Left some questions. And thanks a lot for very nice inline documentation.

protocol/messenger_raw_message_resend.go Show resolved Hide resolved
protocol/common/raw_message.go Show resolved Hide resolved
protocol/common/raw_message.go Outdated Show resolved Hide resolved
@qfrank
Copy link
Contributor Author

qfrank commented Apr 9, 2024

When run TestMessengerRawMessageResendTestSuite/TestMessageSent locally for 100 times, there is a possible to get following error:

        	Error:      	Received unexpected error:
        	            	MessengerResponse data not received
        	Test:       	TestMessengerRawMessageResendTestSuite/TestMessageSent

relate test code:

	waitOnMessengerResponse(&s.Suite, func(r *protocol.MessengerResponse) error {
		for _, m := range r.Messages() {
			if m.GetContentType() == protobuf.ChatMessage_CONTACT_REQUEST {
				contactRequest = m
				return nil
			}
		}
		return errors.New("contact request not received")
	}, s.bobMessenger)

After checking code and geth.log , I found the relate error:

ERROR[04-09|13:38:50.563|github.com/status-im/status-go/wakuv2/waku.go:1003]                                could not send message                   envelopeHash=0x2fad60891798d51c94298ae6cdc9ca1d54be4d8d6c52c5739c08bf77388bbeed pubsubTopic=/waku/2/rs/16/32 contentTopic=/waku/1/0xaeffe08b/rfc26 timestamp=1,712,641,130,562,699,000 error="not enough peers to publish"
image image

If you know something on this, do tell me, thank you! @vitvly cc @cammellos

@vitvly vitvly self-requested a review April 9, 2024 10:23
@qfrank
Copy link
Contributor Author

qfrank commented Apr 9, 2024

When run TestMessengerRawMessageResendTestSuite/TestMessageSent locally for 100 times, there is a possible to get following error:

        	Error:      	Received unexpected error:
        	            	MessengerResponse data not received
        	Test:       	TestMessengerRawMessageResendTestSuite/TestMessageSent

relate test code:

	waitOnMessengerResponse(&s.Suite, func(r *protocol.MessengerResponse) error {
		for _, m := range r.Messages() {
			if m.GetContentType() == protobuf.ChatMessage_CONTACT_REQUEST {
				contactRequest = m
				return nil
			}
		}
		return errors.New("contact request not received")
	}, s.bobMessenger)

After checking code and geth.log , I found the relate error:

ERROR[04-09|13:38:50.563|github.com/status-im/status-go/wakuv2/waku.go:1003]                                could not send message                   envelopeHash=0x2fad60891798d51c94298ae6cdc9ca1d54be4d8d6c52c5739c08bf77388bbeed pubsubTopic=/waku/2/rs/16/32 contentTopic=/waku/1/0xaeffe08b/rfc26 timestamp=1,712,641,130,562,699,000 error="not enough peers to publish"

image image
If you know something on this, do tell me, thank you! @vitvly cc @cammellos

Solved by custom DiscV5BootstrapNode, if you have better solution, welcome! cc @richard-ramos

Copy link
Member

@richard-ramos richard-ramos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the best solution. As we discussed, having your own custom network (like you do here) removes the dependency on the fleet reducing the flakiness of this test.
Good work!

@qfrank
Copy link
Contributor Author

qfrank commented Apr 10, 2024

Unfortunately, the error still can be reproduced, but the frequency seems lower than before(6/100). @richard-ramos

    messenger_raw_message_resend_test.go:292: response error: contact request not received
    messenger_raw_message_resend_test.go:298: 
        	Error Trace:	/Users/frank/development/go/src/github.com/status-im/status-go/api/messenger_raw_message_resend_test.go:298
        	            				/Users/frank/development/go/src/github.com/status-im/status-go/api/messenger_raw_message_resend_test.go:260
        	            				/Users/frank/development/go/src/github.com/status-im/status-go/api/messenger_raw_message_resend_test.go:61
        	            				/Users/frank/development/go/src/github.com/status-im/status-go/vendor/github.com/stretchr/testify/suite/suite.go:187
        	Error:      	Received unexpected error:
        	            	MessengerResponse data not received
        	Test:       	TestMessengerRawMessageResendTestSuite/TestMessageSent

@richard-ramos
Copy link
Member

I have run this a couple of times succesfully:

go test -run TestMessengerRawMessageResendTestSuite/TestMessageSent github.com/status-im/status-go/api -count 100 -v -timeout 10000s

please share the logs for when there's a failure. I'm thinking that adding a 2s sleep in the test after initializing the nodes should help, since WakuRelay takes some time to form a mesh

@qfrank
Copy link
Contributor Author

qfrank commented Apr 10, 2024

I have run this a couple of times succesfully:

go test -run TestMessengerRawMessageResendTestSuite/TestMessageSent github.com/status-im/status-go/api -count 100 -v -timeout 10000s

please share the logs for when there's a failure. I'm thinking that adding a 2s sleep in the test after initializing the nodes should help, since WakuRelay takes some time to form a mesh

image emm.. this time got less, only 2/100.

geth.log.zip
@richard-ramos

@qfrank
Copy link
Contributor Author

qfrank commented Apr 12, 2024

more specific and smaller range logs:
cut_console.log
cut_geth.log
@richard-ramos

@qfrank
Copy link
Contributor Author

qfrank commented Apr 30, 2024

FYI, append these lines into createAliceBobBackendAndLogin fixed issue:

        aliceWaku := s.aliceBackend.StatusNode().WakuV2Service()
	bobWaku := s.bobBackend.StatusNode().WakuV2Service()
	// NOTE: default MaxInterval is 10s, which is too short for the test
	err = tt.RetryWithBackOff(func() error {
		if len(aliceWaku.Peerstore().Addrs(bobWaku.PeerID())) > 0 {
			return nil
		}
		s.T().Logf("alice don't know bob's addresses")
		return errors.New("alice don't know bob's addresses")
	}, func(b *backoff.ExponentialBackOff){b.MaxInterval = 20 * time.Second})
	s.Require().NoError(err)
	err = tt.RetryWithBackOff(func() error {
		if len(bobWaku.Peerstore().Addrs(aliceWaku.PeerID())) > 0 {
			return nil
		}
		s.T().Logf("bob don't know alice's addresses")
		return errors.New("bob don't know alice's addresses")
	}, func(b *backoff.ExponentialBackOff){b.MaxInterval = 20 * time.Second})
	s.Require().NoError(err)
	s.Require().NoError(aliceWaku.DialPeerByID(bobWaku.PeerID().String()))
	s.Require().NoError(bobWaku.DialPeerByID(aliceWaku.PeerID().String()))

cc @richard-ramos @cammellos @vitvly @chaitanyaprem cheers!

@qfrank qfrank merged commit 22bea87 into develop May 1, 2024
7 checks passed
@qfrank qfrank deleted the feat/retry_send_message branch May 1, 2024 21:40
func (b *GethStatusBackend) CreateAccountAndLogin(request *requests.CreateAccount) (*multiaccounts.Account, error) {
// CreateAccountAndLogin creates a new account and logs in with it.
// NOTE: requests.CreateAccount is used for public, params.Option maybe used for internal usage.
func (b *GethStatusBackend) CreateAccountAndLogin(request *requests.CreateAccount, opts ...params.Option) (*multiaccounts.Account, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this changes expected here? I remember I added opts and later removed it.
Then this PR reintroduce it, and I don't see anywhere is using it. @qfrank

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kaichaosun , it's been used here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants