GRR Client Crashes "Serialized message too large" #978

Open
bprykhodchenko opened this issue May 30, 2022 · 7 comments

@bprykhodchenko

Environment

  • GRR is installed on a VM running on ESXi on-prem. The VM runs Ubuntu 18.04 and GRR was installed from the DEB package (using the official documentation)
  • GRR Version is 3.4.6.0
  • Ubuntu 18.04
  • Windows 10

Describe the issue
When I run the memory dump of all processes except GRR, it works fine for some time, but at some point I get this message:

CRITICAL:2022-05-30 11:50:07,252 fleetspeak_client:117] Fatal error occurred:
Traceback (most recent call last):
File "site-packages\grr_response_client\fleetspeak_client.py", line 111, in _RunInLoop
File "site-packages\grr_response_client\fleetspeak_client.py", line 209, in _SendOp
File "site-packages\grr_response_client\fleetspeak_client.py", line 176, in _SendMessages
File "site-packages\fleetspeak\client_connector\connector.py", line 144, in Send
File "site-packages\fleetspeak\client_connector\connector.py", line 154, in _SendImpl
ValueError: Serialized message too large, size must be at most 2097152, got 2323650

So the message size exceeds the limit (2097152 bytes, i.e. 2 * 1024 * 1024 = 2 MiB). Now, the question: where can this limit be increased?

@max-vogler
Member

Thanks for your report. This looks like a legit issue on the GRR client side, we'll look into it. Increasing this limit on the client side likely creates more problems on the server side, so changing the chunking logic or similar is probably the way forward.

@bprykhodchenko
Author

> Thanks for your report. This looks like a legit issue on the GRR client side, we'll look into it. Increasing this limit on the client side likely creates more problems on the server side, so changing the chunking logic or similar is probably the way forward.

So I have tried to decrease the chunk size to 2000000, which is less than what the agent is able to receive, and the same issue occurred:

CRITICAL:2022-06-01 10:20:24,761 fleetspeak_client:117] Fatal error occurred:
Traceback (most recent call last):
File "site-packages\grr_response_client\fleetspeak_client.py", line 111, in _RunInLoop
File "site-packages\grr_response_client\fleetspeak_client.py", line 209, in _SendOp
File "site-packages\grr_response_client\fleetspeak_client.py", line 176, in _SendMessages
File "site-packages\fleetspeak\client_connector\connector.py", line 144, in Send
File "site-packages\fleetspeak\client_connector\connector.py", line 154, in _SendImpl
ValueError: Serialized message too large, size must be at most 2097152, got 2579672

So it is definitely something to be fixed in GRR.

@mbushkov mbushkov self-assigned this Jun 1, 2022
@mbushkov
Collaborator

mbushkov commented Jun 7, 2022

Ok, so what happens here is pretty interesting. The issue, most definitely, happens on the client side and has nothing to do with how the server database is set up.

When working through Fleetspeak, the GRR client runs as a subprocess of the Fleetspeak client. They communicate through shared file descriptors. When a GRR client wants to send a message to its server, it sends a message to the Fleetspeak client on the same machine through the shared fd. Now, the Fleetspeak client has a hard message size limit of 2 MB:
https://github.com/google/fleetspeak/blob/93b2b9a40808306722875abbd5434af4634c6531/fleetspeak/src/client/channel/channel.go#L32

The issue happens because GRR tries to send a message that's bigger than 2 MB. There's a dedicated check for this in the GRR client Fleetspeak connector code (MAX_SIZE is set to 2 MB):
https://github.com/google/fleetspeak/blob/master/fleetspeak_python/fleetspeak/client_connector/connector.py#L151
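
For reference, here is a simplified sketch of what that check amounts to; the names and structure are approximated from the traceback above rather than copied from the Fleetspeak source:

MAX_SIZE = 2 * 1024 * 1024  # 2097152 bytes, the hard limit mentioned above

def _SendImpl(serialized_message: bytes) -> None:
  # Approximate reconstruction of the guard in connector.py: refuse to write
  # anything larger than MAX_SIZE to the shared file descriptor.
  if len(serialized_message) > MAX_SIZE:
    raise ValueError(
        "Serialized message too large, size must be at most %d, got %d"
        % (MAX_SIZE, len(serialized_message)))
  # ...otherwise the message is written to the shared fd...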

GRR should be careful enough to chunk the messages. Not sure why chunking failed in this case - will investigate further.

@bprykhodchenko Could you please specify the exact flow arguments you used to reproduce the issue?

@mbushkov
Collaborator

mbushkov commented Jun 7, 2022

I looked at the YaraProcessDump client action. It dumps the memory to disk and then sends back a data structure with information about all the processes:
https://github.com/google/grr/blob/master/grr/client/grr_response_client/client_actions/memory.py#L767

What this means: if the result proto is larger than 2 MB in serialized form, the client action will fail. If the machine has a lot of memory and a lot of processes, then growing over 2 MB is quite possible. We need to look into either:

  • Chunking the response (a rough sketch of this idea follows the list), or
  • Increasing the limit from 2 MB to a higher value. I have to check what the motivation for the 2 MB limit is.
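
To illustrate the first option, a minimal sketch of the chunking idea; this is not GRR's actual implementation, and the helper name and the serialize callback are hypothetical:

MAX_SIZE = 2 * 1024 * 1024  # the 2 MB Fleetspeak channel limit

def ChunkResults(results, serialize, limit=MAX_SIZE):
  """Groups results into batches whose total serialized size stays under limit."""
  batch, batch_size = [], 0
  for result in results:
    size = len(serialize(result))
    if batch and batch_size + size > limit:
      yield batch
      batch, batch_size = [], 0
    batch.append(result)
    batch_size += size
    # (A single result larger than the limit would still need splitting on its own.)
  if batch:
    yield batch

Each yielded batch would then be sent as its own response message instead of one large proto describing every dumped process.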

@bprykhodchenko
Author

Hello,
as for your question: I just run the YARA memory dump from the UI, I do not use the CLI with specific command line arguments.

As for the solution,

  1. I was changing the chunk size (in flow parameters) to a value smaller than what the client can "eat", but I was running into the same issue.
  2. Should I wait for a fixed version, OR
  3. Should I download the source code, change MAX_SIZE in the connector.py file to, say, 4 MB, and install the server from the source code?

@mbushkov
Collaborator

mbushkov commented Jun 8, 2022

A few comments:

  1. The issue is related to how many processes you dump at the same time with a single flow. The GRR client tries to send a data structure with the memory regions map to the server, and if this data structure is too big, you get the failure. One workaround option is to, for example, run 2 flows with process regexes, one matching processes with names from a to k, and the other one matching processes with names from l to z (see the sketch after this list). That will likely help.
  2. The right fix is to make the YaraProcessDump client action chunk its output. I will look into this next week - unfortunately, I can't provide an ETA until I start working on it.
  3. Changing MAX_SIZE on the GRR side is only a part of the solution. The 2 MB limit is also hardcoded on the Fleetspeak client side. Fleetspeak is written in Go and is shipped with GRR in binary form (see https://pypi.org/project/fleetspeak-client-bin/). You'd need to recompile the fleetspeak-client Go binary and replace the fleetspeak-client-bin package in order for the fix to work. It's not exactly straightforward, but if you're feeling adventurous, you can try it. Pointer to the relevant place in the Fleetspeak code: https://github.com/google/fleetspeak/blob/master/fleetspeak/src/client/channel/channel.go#L48
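
As an illustration of the workaround in point 1, splitting the process set between two flows via their process regexes could look roughly like this (the regexes are just examples; adjust them to the process names on your machines):

import re

# Flow 1: process regex matching names that start with a-k.
FIRST_HALF = re.compile(r"^[a-k]", re.IGNORECASE)
# Flow 2: process regex matching everything else.
SECOND_HALF = re.compile(r"^[^a-k]", re.IGNORECASE)

processes = ["chrome.exe", "svchost.exe", "lsass.exe", "explorer.exe"]
print([p for p in processes if FIRST_HALF.match(p)])   # dumped by flow 1
print([p for p in processes if SECOND_HALF.match(p)])  # dumped by flow 2

Each flow then produces a smaller memory-regions data structure, which keeps the serialized message under the 2 MB limit.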
