feat: support lazy loading the lora module for reducing the loading p… #434
base: main
Conversation
It seems that caching the handle from safe_open might be a better solution, but we would need to consider file handle reference management when the handle is shared by multiple layers. I will refine it later.
(commits updated: bad816f to 9b1ac96)
Still caching the filenames instead of file handles, since 1) safe_open needs the device info, which differs between LoRA module loads, and 2) safe_open is lazy and reads nothing until a specific tensor is requested via get_tensor, which is already the optimized behavior for our case.
@tgaddair could you help review this change?
Looks great @thincal, thanks for the PR, and apologies for the slow review!
I had one question about the file handle, but happy to land this and iterate on it to see if there's any room to further optimize.
It is fine to land it first, since safe_open is already lazy and the main overhead is reading out the specific tensor.
@thincal I noticed there's a failing test:
Would you be able to take a look before we merge? We should be good to go once that's resolved.
What does this PR do?
Fixes #433
Before submitting
Who can review?
@tgaddair