Memory usage #623

Open
bearprevent opened this issue Sep 8, 2022 · 3 comments
Labels: 2023-triage, api: logging, priority: p3, type: question

Comments

@bearprevent

Environment details

  • OS type and version: Ubuntu 22.04.1
  • Python version: 3.10.4
  • pip version: 22.0.2
  • google-cloud-logging version: 3.2.2

Steps to reproduce

Importing the logging library alone adds more than 40 MB of resident memory (RSS).
This is quite a burden for smaller systems.

Is there an alternative way to use this library without such a large memory footprint?

  1. python -c 'import os; print(os.system(f"ps -q {os.getpid()} -o rss="))'
    - 7888 KB
  2. python -c 'import os; import google.cloud.logging; print(os.system(f"ps -q {os.getpid()} -o rss="))'
    - 50424 KB
product-auto-label bot added the api: logging label Sep 8, 2022
losalex added the type: question and priority: p2 labels Sep 8, 2022
@daniel-sanche
Contributor

We don't currently have a smaller-footprint version of the library, but I'll investigate ways to slim it down.

daniel-sanche added the priority: p3 label and removed the priority: p2 label Nov 23, 2022
@pnico

pnico commented May 3, 2023

We have an App Engine app running entirely on F1 instances (384 MB limit). We often get "Exceeded memory limit" warnings, and sometimes instances are killed, when one of our endpoints uses too much memory, so avoiding or mitigating this is a routine part of development for us. Can I assume this library isn't officially recommended for our use case?

Edit: hm, I suppose if I'm going to ask that, I should actually describe our use case beyond just "limited resources" :) and once I do that, my question might answer itself anyway. But I did some checking that might be interesting for others.

We use App Engine (originally with Python 2, more than 10 years ago), and we do all our analysis of App Engine logs in BigQuery through a log sink. So for us, the Cloud Logging console is not that important, and we don't make any other calls to the Cloud Logging API. We just want structured logging like we had in the old days of Python 2. If it can't be exactly the same (protoPayload.line.logMessage etc.), that's fine as long as we have basically the same capabilities. It turns out that for this we don't need this library at all: we can write a custom LogFormatter that returns a JSON string as described here (except that we don't use print(); instead we return the string from LogFormatter.format()). This seems to work just fine. GCP detects that we are emitting a JSON string and does the rest for us: the logs can be correlated, and the structure is correctly preserved in BigQuery through the log sink (as well as in the Cloud Logging console, of course).
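A formatter along these lines might look like the following minimal sketch. It assumes logs are written to stdout and relies on Cloud Logging's special JSON fields for structured logs (`severity`, `message`, `logging.googleapis.com/sourceLocation`); `JsonFormatter` and `setup_json_logging` are hypothetical names, not part of this library or of the setup described above.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: renders each record as one line of JSON.

    Keys follow Cloud Logging's special fields for JSON written to stdout.
    """

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            # Python level names (DEBUG, INFO, WARNING, ERROR, CRITICAL)
            # are all valid Cloud Logging severities.
            "severity": record.levelname,
            "message": record.getMessage(),
            "logging.googleapis.com/sourceLocation": {
                "file": record.pathname,
                "line": record.lineno,
                "function": record.funcName,
            },
        }
        if record.exc_info:
            # Keep the traceback inside the single JSON entry.
            entry["message"] += "\n" + self.formatException(record.exc_info)
        # json.dumps escapes newlines, so each record stays on one line.
        return json.dumps(entry)


def setup_json_logging(level: int = logging.INFO) -> None:
    """Route the root logger to stdout through JsonFormatter."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]  # replaces any previously installed handlers
    root.setLevel(level)
```

Calling `setup_json_logging()` once at startup is enough; after that, ordinary `logging.info(...)` calls each produce one JSON line, without importing any Google client library.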

However, we also use some other Google Cloud services. Currently we use google-api-python-client for these, which is now deprecated, discouraged, shunned, shamed, etc. :D So we're thinking: OK, it might be time to update to the shiny new Python clients for these, and if we're doing that anyway, they probably have similar dependencies. For that matter, even googleapiclient shares some dependencies with them. So if we're already using other Google API client libraries, we might already be importing much of this stuff anyway, in which case adding more might not have much additional impact. Right?

After some checking, and going back to the critical part of our use case where RAM is precious, my conclusion for now is that we should keep using the discovery APIs for everything unless there is something we really can't do with them, or Google forces us to stop using them (against our will and requirements):

>>> import psutil
>>> psutil.Process().memory_info().rss / (1024 * 1024)
12.46875
>>> import google.cloud.logging
>>> psutil.Process().memory_info().rss / (1024 * 1024)
54.390625

>>> import psutil
>>> psutil.Process().memory_info().rss / (1024 * 1024)
12.53515625
>>> import google.cloud.storage
>>> psutil.Process().memory_info().rss / (1024 * 1024)
39.2460937

>>> import psutil
>>> psutil.Process().memory_info().rss / (1024 * 1024)
12.43359375
>>> import googleapiclient.discovery
>>> psutil.Process().memory_info().rss / (1024 * 1024)
28.1015625
>>> import google.cloud.storage
>>> psutil.Process().memory_info().rss / (1024 * 1024)
43.59375
>>> import google.cloud.logging
>>> psutil.Process().memory_info().rss / (1024 * 1024)
58.4765625
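To compare import costs like these without earlier imports (including psutil itself) skewing the numbers, one option is to run each import sequence in a fresh interpreter. This is a minimal sketch under two assumptions: it is Linux-only, because it reads `VmRSS` from `/proc/self/status`, and `import_costs` is a hypothetical helper name. Each delta is the resident-memory growth from one import after everything earlier in the list is loaded, so shared dependencies are charged to whichever module imports them first.

```python
import subprocess
import sys

# Script run in the child interpreter. Lines are tagged with "RSS" so that
# anything a module prints while importing is ignored by the parser.
_CHILD = r"""
import importlib, sys

def rss_mib():
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024  # VmRSS is reported in kB
    raise RuntimeError("VmRSS not found")

print(f"RSS\t<baseline>\t{rss_mib():.2f}")
for name in sys.argv[1:]:  # with -c, sys.argv[0] is "-c"
    importlib.import_module(name)
    print(f"RSS\t{name}\t{rss_mib():.2f}")
"""


def import_costs(modules):
    """Import `modules` in order in a fresh interpreter.

    Returns (baseline_mib, [(module, rss_after_mib, delta_mib), ...]).
    """
    proc = subprocess.run(
        [sys.executable, "-c", _CHILD, *modules],
        check=True, capture_output=True, text=True,
    )
    rows = [line.split("\t") for line in proc.stdout.splitlines()
            if line.startswith("RSS\t")]
    baseline = float(rows[0][2])
    results, prev = [], baseline
    for _, name, value in rows[1:]:
        rss = float(value)
        results.append((name, rss, rss - prev))
        prev = rss
    return baseline, results
```

For example, `import_costs(["googleapiclient.discovery", "google.cloud.storage", "google.cloud.logging"])` repeats the third session above in a clean process and reports the increment for each import.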

We do get some socket errors with googleapiclient; for now we're getting by with retrying with backoff, although it would obviously be better not to have them. The google-api-python-client project is in maintenance mode, but it now looks like there might actually be one final release that could fix some of these issues: https://github.com/googleapis/google-api-python-client/milestone/1 One can dream, anyway.
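A retry-with-backoff workaround like this can be kept in one place with a small wrapper. This is a minimal, library-agnostic sketch; the name `call_with_backoff` and the choice of `ConnectionError` and `TimeoutError` as the "transient" errors are assumptions. It uses full-jitter exponential backoff and re-raises the last error once the retries are used up.

```python
import random
import time


def call_with_backoff(fn, *, retries=5, base_delay=0.5, max_delay=32.0,
                      retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call fn() and return its result, retrying errors listed in `retry_on`.

    Before retry number n (counting from 0) it sleeps a random time in
    [0, min(max_delay, base_delay * 2**n)] ("full jitter"). After `retries`
    failed retries the last exception is re-raised. `sleep` is injectable
    so tests can skip the waiting.
    """
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
```

With the discovery client this could wrap a request as `call_with_backoff(lambda: request.execute())`. Note that googleapiclient's `HttpRequest.execute()` also accepts a `num_retries` argument that applies its own randomized exponential backoff to some transient failures, which may be enough on its own.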

If every google-cloud client library is going to tack on additional MBs of memory usage that we can't afford, and googleapiclient lets us access them all with just one import (maybe not with the first-class experience afforded to those with the means to upgrade), it seems like a pretty solid reason for us not to migrate to the newer libraries.

@daniel-sanche
Contributor

Hey pnico,

I also responded to this in a different thread, but in case it's helpful here as well: App Engine and other serverless GCP environments can use the StructuredLogHandler provided by this library to take advantage of the environment's built-in stdout capture. Importing and using this handler alone should significantly cut down on memory usage, since it avoids the overhead needed for making network calls.

Alternatively, you can use the StructuredLogHandler as a template and make an even more slimmed-down version, if you're only interested in certain logging fields.

Let me know if code samples related to this would be helpful.
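If trace correlation is one of the fields worth keeping, a slimmed-down handler or formatter needs the request's trace ID. A minimal sketch, assuming the request carries the `X-Cloud-Trace-Context` header (format `TRACE_ID/SPAN_ID;o=OPTIONS`, where the span ID is decimal) and that your web framework lets you read it; `trace_fields` is a hypothetical helper. It returns the special JSON keys Cloud Logging uses to group entries by trace, ready to merge into whatever dict your formatter serializes.

```python
import re

# TRACE_ID is 32 hex chars; "/SPAN_ID" and ";o=OPTIONS" are optional.
_TRACE_RE = re.compile(
    r"^(?P<trace>[0-9a-fA-F]{32})(?:/(?P<span>\d+))?(?:;o=(?P<opt>\d+))?$"
)


def trace_fields(header, project_id):
    """Map an X-Cloud-Trace-Context value to Cloud Logging's special JSON keys.

    Returns an empty dict if the header is missing or malformed.
    """
    m = _TRACE_RE.match(header or "")
    if not m:
        return {}
    fields = {
        "logging.googleapis.com/trace": f"projects/{project_id}/traces/{m['trace']}",
    }
    if m["span"]:
        # The header carries a decimal span ID; log entries expect 16 hex chars.
        fields["logging.googleapis.com/spanId"] = format(int(m["span"]), "016x")
    if m["opt"] is not None:
        fields["logging.googleapis.com/trace_sampled"] = m["opt"] == "1"
    return fields
```

Computing these fields once per request (for example in middleware) and attaching them to log records keeps the per-record cost to a dict merge.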
