Discrepancy Between Documented and Actual Memory Usage for ML Model Allocations in Elasticsearch #107829

oldcodeoberyn · 2024-04-24T10:42:09Z

Description

The current Elasticsearch documentation states that adding more allocations to a deployment scales throughput by allowing more inference requests to run in parallel, and that all allocations assigned to a node share the same copy of the model in memory:

Throughput can be scaled by adding more allocations to the deployment; it increases the number of inference requests that can be performed in parallel. All allocations assigned to a node share the same copy of the model in memory. The model is loaded into memory in a native process that encapsulates libtorch, which is the underlying machine learning library of PyTorch. The number of allocations setting affects the amount of model allocations across all the machine learning nodes. Model allocations are distributed in such a way that the total number of used threads does not exceed the allocated processors of a node.

However, in practice, each additional allocation requires extra memory, and the increase appears to be linear in the number of allocations. As a result, scaling up allocations eventually hits the memory limit.
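One way to quantify the linear growth described above is to fit a straight line to observed (allocations, memory) pairs and extrapolate to the memory limit. The following is only a minimal sketch: the function names `fit_linear` and `max_allocations` are hypothetical, and the numbers in the usage example are illustrative placeholders, not measurements from a real cluster.

```python
from typing import Sequence, Tuple


def fit_linear(allocations: Sequence[float], memory_mb: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of memory_mb = intercept + slope * allocations.

    Returns (intercept, slope). The intercept approximates the fixed cost of
    loading the model; the slope approximates the extra memory per allocation.
    """
    n = len(allocations)
    if n < 2 or n != len(memory_mb):
        raise ValueError("need at least two paired observations")
    mean_x = sum(allocations) / n
    mean_y = sum(memory_mb) / n
    sxx = sum((x - mean_x) ** 2 for x in allocations)
    if sxx == 0:
        raise ValueError("allocation counts must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(allocations, memory_mb))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def max_allocations(intercept: float, slope: float, limit_mb: float) -> int:
    """Largest allocation count whose predicted memory stays within limit_mb."""
    if slope <= 0:
        raise ValueError("slope must be positive for a finite bound")
    return max(0, int((limit_mb - intercept) // slope))


if __name__ == "__main__":
    # Illustrative placeholder values only (NOT real measurements).
    observed_allocations = [1, 2, 3, 4]
    observed_memory_mb = [1500, 2000, 2500, 3000]
    base, per_alloc = fit_linear(observed_allocations, observed_memory_mb)
    print(f"fixed cost ~{base:.0f} MB, per allocation ~{per_alloc:.0f} MB")
    print("max allocations under 4096 MB:", max_allocations(base, per_alloc, 4096))
```

If the fitted slope is close to zero, the observations would be consistent with the documented claim that allocations share one copy of the model; a clearly positive slope would support the behavior reported here.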

elasticsearchmachine · 2024-04-24T10:44:48Z

Pinging @elastic/ml-core (Team:ML)

oldcodeoberyn added the >enhancement, needs:triage, and :ml labels and removed the needs:triage and >enhancement labels on Apr 24, 2024

elasticsearchmachine added the Team:ML label on Apr 24, 2024
