Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Ignore community descriptions from ourselves if we are the control node #5155

Draft
wants to merge 1 commit into
base: release/0.177.x
Choose a base branch
from

Conversation

MishkaRogachev
Copy link
Contributor

@MishkaRogachev MishkaRogachev commented May 13, 2024

For status-im/status-desktop#14700

My assumption: controlNode receives and processes its own protobuf.CommunityDescription which can lead to a hang of the receiver thread. It might explain why one community's prolonged reevaluation would cause another community to freeze up.

@status-im-auto
Copy link
Member

status-im-auto commented May 13, 2024

Jenkins Builds

Click to see older builds (12)
Commit #️⃣ Finished (UTC) Duration Platform Result
✔️ 2c68cbf #1 2024-05-13 21:46:39 ~3 min linux 📦zip
✔️ 2c68cbf #1 2024-05-13 21:48:03 ~5 min ios 📦zip
✔️ 2c68cbf #1 2024-05-13 21:48:42 ~5 min android 📦aar
✔️ 2c68cbf #1 2024-05-13 22:24:18 ~41 min tests 📄log
✖️ 49736df #2 2024-05-14 17:48:50 ~1 min tests 📄log
✔️ 49736df #2 2024-05-14 17:49:13 ~1 min linux 📦zip
✔️ 49736df #2 2024-05-14 17:49:44 ~2 min android 📦aar
✔️ 49736df #2 2024-05-14 17:54:37 ~7 min ios 📦zip
✔️ 16aa2ca #3 2024-05-14 17:55:50 ~1 min linux 📦zip
✔️ 16aa2ca #3 2024-05-14 17:56:12 ~1 min android 📦aar
✔️ 16aa2ca #3 2024-05-14 17:57:57 ~3 min ios 📦zip
✖️ 16aa2ca #3 2024-05-14 18:30:37 ~36 min tests 📄log
Commit #️⃣ Finished (UTC) Duration Platform Result
✔️ a9f454d #4 2024-05-14 20:49:38 ~1 min android 📦aar
✔️ a9f454d #4 2024-05-14 20:49:47 ~1 min linux 📦zip
✔️ a9f454d #4 2024-05-14 20:51:00 ~2 min ios 📦zip
✖️ a9f454d #4 2024-05-14 21:24:01 ~35 min tests 📄log
✖️ a9f454d #5 2024-05-14 22:48:38 ~35 min tests 📄log
✔️ 7c5bcf8 #5 2024-05-15 19:42:00 ~1 min linux 📦zip
✔️ 7c5bcf8 #5 2024-05-15 19:42:14 ~1 min android 📦aar
✔️ 7c5bcf8 #5 2024-05-15 19:43:33 ~3 min ios 📦zip
✖️ 7c5bcf8 #6 2024-05-15 20:16:43 ~36 min tests 📄log

@jrainville
Copy link
Member

because shards are being saved for each accepted member

I think this is the issue, not the lock. Why would we save the shard when a new member is added? This seems like an oversight

@MishkaRogachev
Copy link
Contributor Author

I think this is the issue, not the lock. Why would we save the shard when a new member is added? This seems like an oversight

I reread the description and realized I didn't quite put it right. SaveCommunityShard will be called on the community member side, but not on the control node. I did some more tests and now I'm more inclined to believe that the problem is a hang in the receiving thread. Now I'm trying to reproduce it in the test

@jrainville
Copy link
Member

I reread the description and realized I didn't quite put it right. SaveCommunityShard will be called on the community member side, but not on the control node.

Ah ok yeah that makes more sense. I doubt it's the root cause of the issue though.

I did some more tests and now I'm more inclined to believe that the problem is a hang in the receiving thread. Now I'm trying to reproduce it in the test

Yeah sadly it's more probable. Have you asked Iuri if he can receive messages on the control node?

@MishkaRogachev MishkaRogachev force-pushed the fix/community-membership-request-pending-stuck branch from 2c68cbf to 49736df Compare May 14, 2024 17:47
@MishkaRogachev MishkaRogachev changed the title Fix: Remove community lock from save and delete community shard Fix: Ignore community descriptions from ourselves if we are the control node May 14, 2024
community, err := m.GetByID(id)
if err != nil && err != ErrOrgNotFound {
return nil, err
}

// We should not process the description from ourselves if we are the control node
if community != nil && community.IsControlNode() && common.IsPubKeyEqual(community.ControlNode(), signer) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@osmaczko i think a clock check is extra here

@MishkaRogachev MishkaRogachev marked this pull request as ready for review May 14, 2024 17:52
@MishkaRogachev MishkaRogachev force-pushed the fix/community-membership-request-pending-stuck branch 2 times, most recently from 16aa2ca to a9f454d Compare May 14, 2024 20:47
@MishkaRogachev
Copy link
Contributor Author

@osmaczko @mprakhov many tests fail because they check both the response on call and a signal from their change of community description. If it is confirmed that the fix helps, I can fix these tests, but I have a little doubt whether it will break the app. Are there places in the desktop code where instead of a response from the rpc call we expect a signal with a changed community description from ourselves?

community, err := m.GetByID(id)
if err != nil && err != ErrOrgNotFound {
return nil, err
}

// We should not process the description from ourselves if we are the control node
if community != nil && community.IsControlNode() && common.IsPubKeyEqual(community.ControlNode(), signer) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if community != nil && community.IsControlNode() && common.IsPubKeyEqual(community.ControlNode(), signer) {
if community != nil && community.IsControlNode() {

Should be enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if the token master or another administrator updates the description? I think in this case the description should be handled by the management node, so checking signer of received message make sense for me

Copy link
Contributor

@osmaczko osmaczko May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be only one control node (one device) at given time.

@@ -3596,16 +3611,10 @@ func (m *Manager) GetCommunityShard(communityID types.HexBytes) (*shard.Shard, e
}

func (m *Manager) SaveCommunityShard(communityID types.HexBytes, shard *shard.Shard, clock uint64) error {
m.communityLock.Lock(communityID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it really be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is possible not to remove locs here, but add a similar check to the HandleCommunityPublicShardInfo and HandleCommunityShardKey, not sure if it is cleaner

return m.persistence.SaveCommunityShard(communityID, shard, clock)
}

func (m *Manager) DeleteCommunityShard(communityID types.HexBytes) error {
m.communityLock.Lock(communityID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it really be removed?

@osmaczko
Copy link
Contributor

Are there places in the desktop code where instead of a response from the rpc call we expect a signal with a changed community description from ourselves?

I wouldn't be surprised. If that's the case, then it is wrong. Although, it wouldn't be an issue for control node, because we do signal changes done to our own community on publish:

// signal client with published community
if m.config.messengerSignalsHandler != nil {
if recentlyPublishedOrg == nil || community.Clock() > recentlyPublishedOrg.Clock() {
response := &MessengerResponse{}
response.AddCommunity(community)
m.config.messengerSignalsHandler.MessengerResponse(response)
}
}

The problem may exist for token-masters and admins though.

@MishkaRogachev MishkaRogachev force-pushed the fix/community-membership-request-pending-stuck branch from a9f454d to 7c5bcf8 Compare May 15, 2024 19:40
Comment on lines +1806 to +1809
// We should not process the community description on the control node
if community != nil && community.IsControlNode() {
return nil, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stupid question: what if control node restore his account by a seed phrase? Won't we loose CommunityDescriptions which were not added to the backup?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! We won't merge this PR, we need to take a more comprehensive approach for the next milestone.

@MishkaRogachev MishkaRogachev marked this pull request as draft May 16, 2024 18:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants