torch.uniform_() is single-threaded on CPU #125223
Labels
module: cpu
CPU specific problem (e.g., perf, algorithm)
module: distributions
Related to torch.distributions
module: nn
Related to torch.nn
module: performance
Issues related to performance, either of kernel code or framework glue
needs design
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
🐛 Describe the bug
The uniform_ function is single-threaded and slow on CPU, which makes nn.Linear slow to initialize. Since the first step in loading a model is creating an empty model, constructing an empty LlamaModelForCausalLM alone takes 90 seconds, while loading the state dict takes just one second. I expected creating an nn.Linear to be as fast as torch.randn.
Versions
cc @msaroufim @fritzo @neerajprad @alicanb @nikitaved @albanD @mruberry @jbschlosser @walterddr @mikaylagawarecki @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
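A minimal reproduction sketch, assuming a recent PyTorch build running on CPU. It times uniform_ against torch.randn on the same shape, then times nn.Linear construction with and without its initialization. The bench helper and the 4096 x 4096 size are illustrative choices, not taken from the original report. torch.nn.utils.skip_init is used only to isolate the cost of reset_parameters() (which calls kaiming_uniform_ and uniform_), not as a proposed fix.

```python
import time

import torch
import torch.nn as nn


def bench(fn, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


# Illustrative size (an assumption), roughly one large projection matrix.
N = 4096

# Same number of elements, two different RNG fill paths.
t_uniform = bench(lambda: torch.empty(N, N).uniform_())
t_randn = bench(lambda: torch.randn(N, N))

# nn.Linear.__init__ runs reset_parameters(), which fills the weight via
# kaiming_uniform_ and the bias via uniform_.
t_linear = bench(lambda: nn.Linear(N, N))

# skip_init builds the module without running its parameter initialization,
# so the difference from t_linear approximates the initialization cost.
t_skip = bench(lambda: torch.nn.utils.skip_init(nn.Linear, N, N))

print(f"intra-op threads   : {torch.get_num_threads()}")
print(f"uniform_           : {t_uniform:.3f} s")
print(f"torch.randn        : {t_randn:.3f} s")
print(f"nn.Linear          : {t_linear:.3f} s")
print(f"skip_init(Linear)  : {t_skip:.3f} s")
```

If uniform_ is the single-threaded bottleneck described above, its time should stay roughly flat when torch.set_num_threads() is raised, while torch.randn should scale with the thread count.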