Feature Request: load_state_dict should take filenames #1686
Comments
The same applies to optimizer state_dicts. For some optimizers like Adagrad, the checkpoints are large, and we can hit the same memory pressure situation. Optimizers don't even have a … I ran into this while helping @aszlam today.
@szagoruyko and I are fans of the HDF5 format for serialized models; maybe it could get along nicely with this proposal.
In high memory pressure situations, the following is a common occurrence:
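The code for this pattern is not included in the issue text; a minimal sketch of what it typically looks like (the module and checkpoint path are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(4096, 4096).cuda()   # parameters already occupy GPU memory
state = torch.load('model.pth')        # materializes a second full copy of every tensor
model.load_state_dict(state)           # peak memory is roughly 2x the model size until `state` is freed
```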
Because of memory pressure, a common workaround is to first do:
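For example (hypothetical path; the `map_location` lambda keeps all loaded tensors on the CPU):

```python
import torch

s = torch.load('model.pth', map_location=lambda storage, loc: storage)
```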
And then load `s` into `model`. This is a very common scenario that we should be able to avoid, and it has some pitfalls: what happens with part-GPU part-CPU models, what happens with multi-GPU models...
If `load_state_dict` took a filename directly, it could delete its existing parameter storages and set them to the new ones on the fly, thereby requiring no extra memory.
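A rough sketch of the kind of helper this proposal points toward, assuming the main goal is to copy into the existing parameter storages in place rather than build a second GPU-resident copy; the function name `load_state_dict_from_file` is hypothetical, and the whole state dict is still read into CPU memory first:

```python
import torch

def load_state_dict_from_file(model, filename):
    """Hypothetical helper: copy a checkpoint into `model`'s existing
    parameter storages in place, avoiding a second GPU-resident copy."""
    # Keep the loaded tensors on CPU regardless of where they were saved from.
    state = torch.load(filename, map_location=lambda storage, loc: storage)
    own = model.state_dict()
    for name, tensor in state.items():
        own[name].copy_(tensor)   # in-place copy into the existing (possibly GPU) storage
        state[name] = None        # drop the CPU copy as soon as it has been consumed
```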