Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incomplete presence info and guild members #137

Open
InterLinked1 opened this issue Feb 6, 2023 · 15 comments
Open

Incomplete presence info and guild members #137

InterLinked1 opened this issue Feb 6, 2023 · 15 comments
Assignees
Labels
bug Something isn't working

Comments

@InterLinked1
Copy link
Contributor

InterLinked1 commented Feb 6, 2023

Opening a new ticket for this, there was some initial discussion of this on #135

I'm now able to get presence data, but it seems presence->size can be smaller than members->size in the event.

From looking at http.log, it seems the JSON array really only is that large, so perhaps this is an API issue:

discord/discord-api-docs#666

discord/discord-api-docs@12b414e

Since this was initially "undocumented", I wonder whether this is also an "undocumented limitation". It's also not consistent, now when I last tried it, I only got 28 presences, instead of 34 like before (still far under 77). I can confirm there's only one chunk received, so it's not as if the presences are getting split.

One workaround may be to retry any users we didn't get presence data in another request, filtered to such user(s), although it would be nice if the API response worked properly in the first place... oh well. It's very possible there's not much that can be done in this case.

Returning to the other thing I had mentioned, this seems like something that may not be quite right:

			struct discord_channel dchannel;
			struct discord_ret_channel ret2 = { .sync = &dchannel };
			cp->channel_id = channel->id;
			code = discord_get_channel(client, channel->id, &ret2);
			if (code != CCORD_OK) {
				continue;
			}
			/* At this point dchannel.member is NULL */

(It's hard to parse the official APIs, but this suggests that accessing recipients is the right way to get channel members: https://stackoverflow.com/questions/50840580/how-to-retrieve-a-list-of-discord-users-in-a-channel)

I thought perhaps maybe member is not set by default (like how the other API requires .presences = true in order to get presence data, but there doesn't seem to be any such boolean flag, so I would assume members should always be returned. Some of the other data is set, but this one is NULL.

This is from the HTTP logs:

17:42:25 TRACE discord-rest_ratelimit.c:252: [DISCORD_RATELIMIT] [null] Couldn't match known buckets to ':1:channels:REDACTED'
17:42:25 DEBUG discord-rest_ratelimit.c:301: [DISCORD_RATELIMIT] [miss] Match ':1:channels:REDACTED' to bucket
17:42:25 DEBUG discord-rest_ratelimit.c:357: [DISCORD_RATELIMIT] [miss] Remaining = 1 | Reset = 0
[2023-02-06 17:42:25.817]  ERROR[653098]: mod_discord.c:401 fetch_channels: Failed to fetch member list for channel REDACTED

Edit: I realize I should be using recipients rather than members, but both are actually NULL.

Looking at http.log, unfortunately this data doesn't seem to be present, either, which is incongruent with their API documentation to me.

So I'm not sure if either of these is very actionable, but I know there was a fix for something, so I guess that can be linked against this issue instead now.

@InterLinked1 InterLinked1 added the bug Something isn't working label Feb 6, 2023
@InterLinked1
Copy link
Contributor Author

Here is another thing that I am more sure is a bug, because I see the info in the HTTP request:

owner_id in struct discord_guild * also always appears to be 0. It seems it's not getting set properly from the JSON.

WS_RCV_TEXT [WEBSOCKETS] - 2023-02-6T22:42:25.901Z - wss://gateway.discord.gg/?v=10&encoding=json
{"t":"GUILD_CREATE","s":2,"op":0,"d":{"emojis":[],"owner_id":"1234567",
static void load_channels(struct discord *client, struct discord_guilds *guilds)
{
	int i, numguilds = guilds->size;
	struct discord_guild *gs = guilds->array;

	for (i = 0; i < numguilds; i++) {
		struct discord_guild *g = &gs[i];
		/* g->owner_id is 0 here */
	}
}

Perhaps something similar to the last fix?

@InterLinked1
Copy link
Contributor Author

Thankfully, retrying discord_request_guild_members over and over again mostly works: each time, the presence size is less than the member size, but I'm able to get it down to 2 failures after 5 or 6 retries in this way.

I am getting a memory leak when I allocate snowflakes for retrying guild members:

missed = calloc(1, sizeof(*missed));
struct discord_request_guild_members params = { 
				.guild_id = u->guild_id,
				.query = "", /* Empty string to return all members */
				.limit = presencefails, /* All members */
				.user_ids = missed, /* Filter to only users whose status we failed to get the first time */
				.presences = true, /* Include presences */
			};

I assumed since discord_request_guild_members is asynchronous, it would take care of freeing user_ids, since there's no way that I can, and there's no destroy callback or anything like that for this. Looking at the code, I don't see any indication that it would be freed, but is this a reasonable assumption or should this be freed another way?

@lcsmuller
Copy link
Collaborator

Here is another thing that I am more sure is a bug, because I see the info in the HTTP request

I don't think it's a bug.. the payload shown there is for when a GUILD CREATE event is triggered - but the callback you got there is for a discord_get_current_user_guilds() (if I'm not mistaken). The latter differs in the fact that it will return an array of partial guild objects.. Luckily you can enable caching for guild, so you should be able to retrieve that same guild object from the GUILD CREATE event

@InterLinked1
Copy link
Contributor Author

InterLinked1 commented Feb 7, 2023

It's actually for discord_set_on_ready(discord_client, &on_ready);, if that makes a difference.
I added discord_cache_enable(discord_client, DISCORD_CACHE_GUILDS);, but the guild owner still seems to be 0.

Another thing that seems to be happening is when I retry the presence requests to try to get them all, the gateway will close:

[2023-02-06 21:11:43.173] WARNING[665646]: mod_discord.c:378 on_guild_members_chunk: Guild REDACTED chunk has 77 members, but only 31 presences? Failed to fetch 46 presences
[2023-02-06 21:11:43.174]   DEBUG[665646]: mod_discord.c:417 on_guild_members_chunk: Retrying 46 presence requests
21:11:43 INFO  discord-gateway_dispatch.c:312: [DISCORD_GATEWAY] SEND REQUEST_GUILD_MEMBERS (1019 bytes) [@@@_22_@@@]
21:11:43 WARN  discord-gateway.c:354: [DISCORD_GATEWAY] CLOSE DISCORD_GATEWAY_CLOSE_REASON_DECODE_ERROR (code: 4002, 29 bytes): 'Error while decoding payload.'
21:11:43 WARN  discord-gateway.c:404: [DISCORD_GATEWAY] Gateway will attempt to reconnect and resume current session
21:11:43 INFO  discord-gateway.c:805: [DISCORD_GATEWAY] Reconnect attempt #1
21:11:43 TRACE discord-rest_ratelimit.c:247: [DISCORD_RATELIMIT] [41f9] Found a bucket match for ':1:gateway:bot'!
21:11:43 DEBUG discord-rest_ratelimit.c:357: [DISCORD_RATELIMIT] [41f9] Remaining = 0 | Reset = 1675735906241
21:11:43 INFO  discord-gateway.c:337: [DISCORD_GATEWAY] Connected, WS-Protocols: ''
21:11:43 TRACE discord-gateway.c:476: [DISCORD_GATEWAY] RCV DISCORD_GATEWAY_HELLO (124 bytes) [@@@_27_@@@]
21:11:43 INFO  discord-gateway_dispatch.c:213: [DISCORD_GATEWAY] SEND RESUME (140 bytes) [@@@_28_@@@]
21:11:43 TRACE discord-gateway.c:476: [DISCORD_GATEWAY] RCV DISCORD_GATEWAY_DISPATCH -> RESUMED (319 bytes) [@@@_29_@@@]
21:11:43 INFO  discord-gateway.c:237: [DISCORD_GATEWAY] Succesfully resumed a Discord session!
21:11:43 INFO  discord-gateway_dispatch.c:269: [DISCORD_GATEWAY] SEND HEARTBEAT (14 bytes) [@@@_30_@@@]
21:11:43 TRACE discord-gateway.c:476: [DISCORD_GATEWAY] RCV DISCORD_GATEWAY_HEARTBEAT_ACK (36 bytes) [@@@_31_@@@]
21:11:43 TRACE discord-gateway.c:324: [DISCORD_GATEWAY] PING: 27 ms

This seems to be the last thing in http.log before that happens, assuming the CONNECT corresponds to the reconnect:

{"url": "wss://gateway.discord.gg", "shards": 1, "session_start_limit": {"total": 1000, "remaining": 924, "reset_after": 41264047, "max_concurrency": 1}}
@@@_25_@@@
WS_RCV_CONNECT [WEBSOCKETS] - 2023-02-7T02:13:58.482Z - wss://gateway-us-east1-d.discord.gg/?v=10&encoding=json

Sometimes I'm able to get a few retries in when doing this.
I thought at first perhaps I'm doing the retries too fast, and need to add a delay to avoid rate limiting, but since the error is consistently Error while decoding payload., that suggests otherwise.

The most problematic aspect about the reconnect is mainly that my retires stop, so I never end up getting all the presences.

@lcsmuller
Copy link
Collaborator

Looking at the code, I don't see any indication that it would be freed, but is this a reasonable assumption or should this be freed another way?

Indeed, discord_request_guild_members() function should be augmented with an extra parameter just like the struct discord_ret_* from request's functions. Then you would be able to pass your cleanup methods and the likes.. the reason why this parameter is missing is because discord_request_guild_members() is actually a WebSockets function and not a HTTP request one - atm only the latter have a struct discord_ret_*

Can you solve this issue of yours without calloc'ing? Once the function is called it will immediately serialize the struct into a JSON, so concord doesn't need to keep a dynamic pointer around.. Alternatively, you could just do this:

missed = calloc(1, sizeof(*missed));
struct discord_request_guild_members params = { 
				.guild_id = u->guild_id,
				.query = "", /* Empty string to return all members */
				.limit = presencefails, /* All members */
				.user_ids = missed, /* Filter to only users whose status we failed to get the first time */
				.presences = true, /* Include presences */
			};
discord_request_guild_members(client, &params);
free(missed); // there we go

@InterLinked1
Copy link
Contributor Author

Okay, if it's done being used when it returns, then that's fine, I'll actually make it stack-allocated then, and just free the array, since that's what I was going to do initially before I presumed that wouldn't work. So I don't think the API needs to be changed, thanks for the clarification.

I think with that, I've been able to figure out or work around most things. I can work around channel members using permissions as a proxy, I can retry the presence requests, and I think the only things I'm stuck on are the guild owner being 0 (perhaps discord_set_on_ready can't be used for that?), the gateway connection closing when I do presence retries, and the segfault issue from #135 which happens every time I try to end the process. Thanks again for the help!

@lcsmuller lcsmuller self-assigned this Feb 7, 2023
@lcsmuller
Copy link
Collaborator

Another thing that seems to be happening is when I retry the presence requests to try to get them all, the gateway will close:

Honestly, never got that from Discord before. I'll see if I can replicate your issue when I find some time.

@InterLinked1
Copy link
Contributor Author

InterLinked1 commented Feb 7, 2023

Okay, so the decoding error is actually on their end, and now I see why:
Seems like this might occur when trying to add more user IDs than a buffer somewhere allows?
Looks like there's truncation and no closing JSON syntax, causing the API to whine and close the connection on us.

I removed some of the details in the middle, but this is from the log:

WS_SEND_TEXT [WEBSOCKETS] - 2023-02-7T12:02:40.133Z - wss://gateway.discord.gg/?v=10&encoding=json
{"op":8,"d":{"guild_id":"REDACTED","query":"","limit":58,"presences":true,"user_ids":["REDACTED", ..... lots of IDs removed here ....,"REDACTED"
@@@_22_@@@
WS_RCV_CLOSE(4002) [WEBSOCKETS] - 2023-02-7T12:02:40.160Z - wss://gateway.discord.gg/?v=10&encoding=json
Error while decoding payload.

If I limit my retry requests to 25 maximum per retry, instead of, say, 58, as in the example above, then this issue doesn't happen, so it smells a lot like buffer truncation somewhere to me.

@lcsmuller
Copy link
Collaborator

Okay, so the decoding error is actually on their end

It is actually a concord issue, atm we are doing a very naive approach of using fixed-sized buffers for storing our JSONs, hence the truncation you're experiencing... We will be changing all those cases to dynamically-sized buffers (in a recyclable manner), which will certainly fix this issue you're experiencing! Could you please open another issue for this? Thank you so much for spotting all these bugs on such short notice!

@HackerSmacker
Copy link
Collaborator

Hey, I went ahead and enlarged that buffer a bit; recompile on the dev branch and you should be good. We'll have a more proper fix out soon, though.

@InterLinked1
Copy link
Contributor Author

@HackerSmacker Thanks, did you mean to change the buffer size in discord_gateway_send_request_guild_members instead?

Increasing the buffer size in discord_gateway_send_identify didn't make any difference, but increasing from 1024 to 4096 there made it work for me. It was failing between 40 and 45 members for me so I think 2056 would have been sufficient for my use case as well. But in a larger guild, 2056 would probably be insufficient too.

@HackerSmacker
Copy link
Collaborator

I went ahead and did both, but, in theory, the buffer size that holds that identify-payload probably needs to be dynamically generated somewhere, to avoid running out of room on account of that exact issue.

@InterLinked1
Copy link
Contributor Author

Thanks!

Another strange thing I've noticed... my retries aren't actually working the way I thought they were. My algorithm is for each chunk of members we get, keep track of all the members for whom we fail to get presence data. We'll then make another request just for those members, and repeat over and over until we have all presences, or we make a request that returns no new presence data (so as not to infinite loop on this).

Looking at http.log, bizarrely the API responses contain presence info for users that weren't even in the request. So, this seems like it would be a Discord API bug, rather than a concord bug, since it's completely doing the opposite of what was asked of it. It's almost as if it's obeying the limit field but ignoring the user_ids array in the request, since they seem to have no connection. As a result, I generally end up getting back presence data I already have, typically for members near the beginning of the list.

Is it possible that the request is invalid, particularly user_ids? That's the only thing I can think of, and that seems to be the part that isn't being obeyed here.

I'll go ahead and open a new issue now for the buffer size.

@InterLinked1
Copy link
Contributor Author

This is 100% speculation, but just an idea after reading the documentation here: https://ptb.discord.com/developers/docs/topics/gateway-events#request-guild-members

It says the guild_id and user_ids should be snowflakes. Right now, they are both surrounded by strings. It seems to work for guild ID anyways, but I wonder if removing the quotes around the user_ids would fix this. Honestly, I have no idea though, just a thought. I don't know if snowflakes are supposed to be surrounded in quotes in JSON or if they can be lone like integers.

This happens even in the simplest case of limiting the query to a single user (and only providing one for user_ids). Something is definitely wrong, but not sure where.

@lcsmuller
Copy link
Collaborator

lcsmuller commented Feb 7, 2023

I don't know if snowflakes are supposed to be surrounded in quotes in JSON or if they can be lone like integers.

They are actually supposed to be wrapped in quotes, and this is due to a Javascript limitation, their number type is actually a double underneath meaning that for large enough integers we can no longer reliably represent those with a double (they would need something like uint64_t).

This is detailed here, concord actually receives snowflakes as strings and then we perform a conversion to uint64_t

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants