Performance issue: Points from geobuf polygons use more array capacity than needed, wasting memory. #122

Open
TysonAndre opened this issue Jan 13, 2022 · 0 comments · May be fixed by #123


Arrays in nodejs need to be able to quickly add elements without resizing frequently, so they have both a size and a capacity.

For example, in the geo-tz module (which provides time zone data for the entire world), geobuf creates 'Polygon' objects with readLinePart. Each point array is created with size 2 but with excess capacity (16) that is never freed.

Replacing coords.push(p) with coords.push(p.slice()) in node_modules/geobuf/decode.js reduced the memory used to load the entire quad tree from 1,282,134,016 to 528,089,088 bytes for me (1.28 GB to 0.53 GB) in 64-bit node.js. The sliced copies do not have excess capacity.

  • slice has been available for longer and was suggested for compatibility (node.js 0.10.0, internet explorer 4)
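
The effect described above can be reproduced in miniature. The following is a minimal sketch, not geobuf's code: buildPoints and heapDelta are hypothetical helpers, and the point count is arbitrary. Exact numbers depend on the Node.js/V8 version, and running with node --expose-gc should make them steadier.

```javascript
// Hypothetical helper: build `count` two-element points by pushing onto an
// empty array, similar in spirit to how a decoder accumulates coordinates.
// When `trim` is true, each point is copied with slice() before being stored.
function buildPoints(count, trim) {
    const coords = [];
    for (let i = 0; i < count; i++) {
        const p = [];
        p.push(i);
        p.push(i + 1);
        coords.push(trim ? p.slice() : p);
    }
    return coords;
}

// Hypothetical helper: report how much heapUsed changed while fn() ran.
// global.gc only exists when node is started with --expose-gc.
function heapDelta(fn) {
    if (typeof global.gc === 'function') global.gc();
    const before = process.memoryUsage().heapUsed;
    const result = fn();
    if (typeof global.gc === 'function') global.gc();
    const after = process.memoryUsage().heapUsed;
    return {result, bytes: after - before};
}

const pushed = heapDelta(() => buildPoints(200000, false));
const sliced = heapDelta(() => buildPoints(200000, true));
console.log('pushed points:', pushed.bytes, 'bytes');
console.log('sliced points:', sliced.bytes, 'bytes');
```

Both variants hold the same coordinate values, so any difference in the two figures comes from how the point arrays are allocated.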

From babel/issues/6233

In V8, an empty array gets a buffer of 16 elements. This gives it a little bit of room to grow without needing reallocation. Once you add a 17th element, the buffer expands by 50%. This formula continues after that every time reallocation is needed.
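
As a rough illustration of the rule quoted above, the sketch below models it literally: start at 16 slots and grow by 50% whenever the buffer is full. capacityFor is a hypothetical function that models that description only. It is not V8's actual allocator, whose behaviour may differ between versions.

```javascript
// Model of the quoted description (an assumption, not V8 source):
// an array starts with room for 16 elements and grows by 50% when full.
function capacityFor(length) {
    let capacity = 16;
    while (capacity < length) {
        capacity = Math.floor(capacity * 1.5);
    }
    return capacity;
}

for (const length of [2, 16, 17, 25]) {
    const capacity = capacityFor(length);
    console.log(`length ${length}: capacity ${capacity}, unused ${capacity - length}`);
}
// length 2: capacity 16, unused 14
// length 16: capacity 16, unused 0
// length 17: capacity 24, unused 7
// length 25: capacity 36, unused 11
```

Under this model a two-element point leaves 14 of its 16 slots unused, which is the overhead this issue is about.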

Note that new Array(size) would be worse for runtime performance: the JS engine needs a more general representation for arrays that may hold mixes of types, and the optimizer cannot generate code that is as efficient for them. That should be avoided.

Related to evansiroky/node-geo-tz#131

TysonAndre added a commit to TysonAndre/geobuf that referenced this issue Jan 17, 2022
Objects are modified in place, arrays are replaced with an array that
only has exactly the amount of capacity needed.

This is useful in cases where the polygons will be used for a long time.
By default, arrays are reserved with extra capacity that won't be used.
(An empty array currently starts with a capacity of 16 elements,
which is inefficient for decoded points of length 2.)
slice() allocates a new array, seemingly with shrunken capacity
according to process.memoryUsage.

This has an option to deduplicate identical points,
which may be useful for collections of polygons sharing points as well
as for calling compress multiple times with different objects.
It's only safe for read-only uses, so it is disabled by default.

For example, in node-geo-tz issue 131, I saw this change to memory usage
and decoding time on Linux. This is useful for long-running processes
that repeatedly use the objects.

1. No Override:                               1.280 GB (1.8 seconds)
2. Defaults for cache(no numericArrayCache):  0.708 GB (3.4 seconds)
3. Adding the second Map (numericArrayCache): 0.435 GB (6.7 seconds)

Closes mapbox#122
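
The two ideas in the commit message above (copying each point with slice() and optionally sharing identical points) can be sketched as follows. This is an illustration under stated assumptions, not the code from the referenced commit: compactPoints and its pointCache parameter are hypothetical names, and keying the Map on the joined coordinates is one possible choice.

```javascript
// Hypothetical helper, not the API from the referenced commit.
// Copies every point with slice(); if a Map is passed as pointCache,
// identical points are replaced by one shared array.
function compactPoints(coords, pointCache) {
    return coords.map((point) => {
        const trimmed = point.slice();
        if (!pointCache) return trimmed;
        // Note: 0 and -0 produce the same key with this scheme.
        const key = trimmed.join(',');
        const cached = pointCache.get(key);
        if (cached !== undefined) return cached;
        pointCache.set(key, trimmed);
        return trimmed;
    });
}

const ring = [[1, 2], [3, 4], [1, 2]];

const shared = compactPoints(ring, new Map());
console.log(shared[0] === shared[2]); // true: one array is reused

const separate = compactPoints(ring);
console.log(separate[0] === separate[2]); // false: independent copies
```

With the cache enabled, writing to shared[0] would also change shared[2], which is why sharing of this kind is only safe for read-only use.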