
Add compress function to return object with reduced memory usage #123

Open · wants to merge 3 commits into master

Conversation

TysonAndre

Objects are modified in place; arrays are replaced with new arrays that
have exactly the capacity needed.

This is useful in cases where the polygons will be used for a long time.
By default, arrays are reserved with extra capacity that won't be used.
(An empty array currently starts with a capacity of 16 elements, which is
inefficient for decoded points of length 2.) slice() allocates a new
array, seemingly with shrunken capacity, judging by process.memoryUsage.
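A minimal sketch of the in-place compression described above (the name `compress` matches the PR title, but the traversal shown here is an illustration, not the PR's exact implementation):

```javascript
// Recursively shrink arrays to exact capacity. Objects are mutated in
// place; arrays are replaced by slice() copies, which allocate with
// capacity matching their length.
function compress(value) {
    if (Array.isArray(value)) {
        for (let i = 0; i < value.length; i++) {
            value[i] = compress(value[i]);
        }
        // slice() returns a fresh array without the extra capacity
        // reserved while the array was being grown during decoding.
        return value.slice();
    }
    if (value !== null && typeof value === 'object') {
        for (const key of Object.keys(value)) {
            value[key] = compress(value[key]);
        }
        return value; // objects are modified in place
    }
    return value; // primitives are returned unchanged
}
```

Callers keep their reference to the top-level object, since it is mutated rather than replaced; only the nested arrays are reallocated.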

This adds an option to deduplicate identical points, which may be useful
for collections of polygons that share points, as well as for calling
compress multiple times with different objects. It is only safe for
read-only use, so it is disabled by default.

For example, in node-geo-tz issue 131, I saw this change to memory usage
and decoding time on Linux (time zone polygons for the entire world map). This is useful for long-running processes
that repeatedly use the objects.

  1. No override: 1.280 GB (1.8 seconds)
  2. Defaults for cache (no numericArrayCache): 0.708 GB (3.4 seconds)
  3. Adding the second Map (numericArrayCache): 0.435 GB (6.7 seconds)
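For reference, numbers like the above can be gathered with process.memoryUsage; this is a generic measurement sketch (the `measureHeap` helper is an assumption, not part of the PR), most reliable when run with `node --expose-gc`:

```javascript
// Measure roughly how much heap a value retains: force GC if available,
// read heapUsed before and after running fn, and report the difference.
function measureHeap(label, fn) {
    if (global.gc) global.gc(); // requires `node --expose-gc`
    const before = process.memoryUsage().heapUsed;
    const result = fn();
    if (global.gc) global.gc();
    const after = process.memoryUsage().heapUsed;
    console.log(`${label}: ${((after - before) / 1024 / 1024).toFixed(1)} MB retained`);
    return result; // keep the result alive so it is counted
}
```

The retained result must stay referenced during the second reading, otherwise GC may reclaim it and understate the cost.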

Note that if the object is not kept around, there wouldn't be a reason to call compress.

What are your thoughts on adding an optional boolean parameter, decode(pbf, compressData = false), and calling compress when compressData === true
(strict equality to guard against accidentally passing extra parameters, e.g. from Array.prototype.forEach)?
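The strict-equality guard matters because forEach/map pass (element, index, array) to their callback; a generic stand-in (decodeStub below is hypothetical, not geobuf's API) shows the footgun:

```javascript
// With strict equality, only an explicit `true` enables compression.
function decodeStrict(data, compressData = false) {
    return compressData === true;
}
// With a loose truthiness check, any truthy extra argument enables it.
function decodeLoose(data, compressData = false) {
    return Boolean(compressData);
}

// map passes (element, index, array), so compressData receives the index.
const strict = ['a', 'b', 'c'].map(decodeStrict); // [false, false, false]
const loose = ['a', 'b', 'c'].map(decodeLoose);   // [false, true, true]
```

With the loose check, every element after the first would be silently compressed just because its index is truthy.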

Closes #122

@TysonAndre TysonAndre force-pushed the compress branch 2 times, most recently from 91511be to 1714a03 Compare January 17, 2022 19:14
Successfully merging this pull request may close these issues:

Performance issue: Points from geobuf polygons use more array capacity than needed, wasting memory.