Dev mv code from modules to functional #10420

Open: wants to merge 52 commits into base: master
@lihuizhao lihuizhao commented Jan 25, 2024

Move code from nn.modules to nn.functional.

Code moves:

  1. Moved the interpolate, affine_grid, grid_sample, linear, layer_norm, group_norm, embedding, sparse_softmax_cross_entropy, upsample, and relu6 functions from nn.modules to nn.functional.
  2. Changed the forward() method of each corresponding module class to call the function under nn.functional.
  3. Added the Interpolate, AffineGrid, GridSample, and SparseSoftmaxCrossEntropy classes to nn.modules and implemented their functionality.
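
The delegation pattern in items 1 and 2 can be sketched roughly as follows. This is a minimal, framework-free illustration in plain Python, not the code in this PR: the `relu6` function and `ReLU6` class below work on lists of floats and are hypothetical stand-ins for the real tensor implementations.

```python
from typing import List


def relu6(x: List[float]) -> List[float]:
    """Functional form: clamp every element to the range [0, 6]."""
    return [min(max(v, 0.0), 6.0) for v in x]


class ReLU6:
    """Module form: holds no logic of its own and delegates to the function."""

    def forward(self, x: List[float]) -> List[float]:
        # The module's forward() only calls the functional implementation.
        return relu6(x)

    def __call__(self, x: List[float]) -> List[float]:
        return self.forward(x)
```

With this layout the computation lives in one place, so the functional and module forms cannot drift apart.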

Tests:

  1. Added tests for the upsample function in test_upsample.py.
  2. Added tests for the Interpolate class in test_interpolate.py; PyTorch has no Interpolate class.
  3. Added tests for the AffineGrid class in test_affine_grid.py; PyTorch has no AffineGrid class.
  4. PyTorch has no GridSample class.
  5. PyTorch has no SparseSoftmaxCrossEntropy class.
  6. Added tests for the LayerNorm class in test_layer_norm.py.


Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti 

❌ OneFlow resnet50 time: 43.5ms (= 4351.2ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 57.4ms (= 5736.1ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.32 (= 57.4ms / 43.5ms)

OneFlow resnet50 time: 26.2ms (= 2624.2ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 37.4ms (= 3738.5ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.42 (= 37.4ms / 26.2ms)

OneFlow resnet50 time: 18.7ms (= 3735.0ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 35.4ms (= 7076.5ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.89 (= 35.4ms / 18.7ms)

OneFlow resnet50 time: 16.3ms (= 3261.6ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 34.0ms (= 6803.7ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 2.09 (= 34.0ms / 16.3ms)

OneFlow resnet50 time: 17.4ms (= 3481.0ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 28.7ms (= 5742.7ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.65 (= 28.7ms / 17.4ms)

OneFlow swin dataloader time: 0.202s (= 40.333s / 200, num_workers=1)
PyTorch swin dataloader time: 0.128s (= 25.642s / 200, num_workers=1)
Relative speed: 0.636 (= 0.128s / 0.202s)

OneFlow swin dataloader time: 0.055s (= 10.962s / 200, num_workers=4)
PyTorch swin dataloader time: 0.033s (= 6.517s / 200, num_workers=4)
Relative speed: 0.594 (= 0.033s / 0.055s)

OneFlow swin dataloader time: 0.031s (= 6.155s / 200, num_workers=8)
PyTorch swin dataloader time: 0.016s (= 3.297s / 200, num_workers=8)
Relative speed: 0.536 (= 0.016s / 0.031s)

❌ OneFlow resnet50 time: 49.1ms (= 4905.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 64.3ms (= 6432.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.31 (= 64.3ms / 49.1ms)

OneFlow resnet50 time: 35.8ms (= 3582.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 45.6ms (= 4560.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.27 (= 45.6ms / 35.8ms)

OneFlow resnet50 time: 28.2ms (= 5633.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 39.1ms (= 7811.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.39 (= 39.1ms / 28.2ms)

OneFlow resnet50 time: 24.9ms (= 4976.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 38.4ms (= 7680.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.54 (= 38.4ms / 24.9ms)

OneFlow resnet50 time: 24.3ms (= 4852.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 35.9ms (= 7175.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.48 (= 35.9ms / 24.3ms)


Why is this file named sparse?


This function originally lived in nn/modules/sparse.py, which is where the name sparse comes from. I moved it to nn/functional without changing the original file name.


This can be merged together with interpolate.


Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti 

❌ OneFlow resnet50 time: 43.9ms (= 4391.9ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 57.2ms (= 5718.4ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.30 (= 57.2ms / 43.9ms)

OneFlow resnet50 time: 26.3ms (= 2628.4ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 38.1ms (= 3807.2ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.45 (= 38.1ms / 26.3ms)

OneFlow resnet50 time: 18.2ms (= 3647.0ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 36.2ms (= 7249.2ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.99 (= 36.2ms / 18.2ms)

OneFlow resnet50 time: 17.1ms (= 3424.8ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 34.3ms (= 6851.8ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 2.00 (= 34.3ms / 17.1ms)

OneFlow resnet50 time: 16.3ms (= 3266.6ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 30.2ms (= 6045.3ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.85 (= 30.2ms / 16.3ms)

OneFlow swin dataloader time: 0.200s (= 39.981s / 200, num_workers=1)
PyTorch swin dataloader time: 0.127s (= 25.400s / 200, num_workers=1)
Relative speed: 0.635 (= 0.127s / 0.200s)

OneFlow swin dataloader time: 0.053s (= 10.515s / 200, num_workers=4)
PyTorch swin dataloader time: 0.032s (= 6.429s / 200, num_workers=4)
Relative speed: 0.611 (= 0.032s / 0.053s)

OneFlow swin dataloader time: 0.031s (= 6.179s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.370s / 200, num_workers=8)
Relative speed: 0.545 (= 0.017s / 0.031s)

❌ OneFlow resnet50 time: 49.1ms (= 4906.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 64.7ms (= 6471.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 64.7ms / 49.1ms)

OneFlow resnet50 time: 36.6ms (= 3660.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 46.4ms (= 4644.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.27 (= 46.4ms / 36.6ms)

OneFlow resnet50 time: 28.1ms (= 5615.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 41.1ms (= 8227.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.47 (= 41.1ms / 28.1ms)

OneFlow resnet50 time: 25.1ms (= 5026.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 38.7ms (= 7747.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.54 (= 38.7ms / 25.1ms)

OneFlow resnet50 time: 23.5ms (= 4699.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 36.2ms (= 7231.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.54 (= 36.2ms / 23.5ms)


Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti 

❌ OneFlow resnet50 time: 43.6ms (= 4363.8ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 58.0ms (= 5798.4ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.33 (= 58.0ms / 43.6ms)

OneFlow resnet50 time: 26.7ms (= 2669.0ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 38.7ms (= 3866.5ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.45 (= 38.7ms / 26.7ms)

OneFlow resnet50 time: 18.9ms (= 3772.6ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 36.0ms (= 7191.5ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.91 (= 36.0ms / 18.9ms)

OneFlow resnet50 time: 15.5ms (= 3096.7ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 31.1ms (= 6219.8ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 2.01 (= 31.1ms / 15.5ms)

OneFlow resnet50 time: 15.9ms (= 3188.0ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 29.5ms (= 5898.0ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.85 (= 29.5ms / 15.9ms)

OneFlow swin dataloader time: 0.202s (= 40.367s / 200, num_workers=1)
PyTorch swin dataloader time: 0.128s (= 25.525s / 200, num_workers=1)
Relative speed: 0.632 (= 0.128s / 0.202s)

OneFlow swin dataloader time: 0.054s (= 10.752s / 200, num_workers=4)
PyTorch swin dataloader time: 0.033s (= 6.700s / 200, num_workers=4)
Relative speed: 0.623 (= 0.033s / 0.054s)

OneFlow swin dataloader time: 0.031s (= 6.247s / 200, num_workers=8)
PyTorch swin dataloader time: 0.016s (= 3.264s / 200, num_workers=8)
Relative speed: 0.522 (= 0.016s / 0.031s)

❌ OneFlow resnet50 time: 49.1ms (= 4906.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 65.5ms (= 6553.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 65.5ms / 49.1ms)

OneFlow resnet50 time: 35.7ms (= 3570.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 45.7ms (= 4567.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.28 (= 45.7ms / 35.7ms)

OneFlow resnet50 time: 28.3ms (= 5666.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 41.3ms (= 8266.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.46 (= 41.3ms / 28.3ms)

OneFlow resnet50 time: 25.8ms (= 5163.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 40.7ms (= 8136.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.58 (= 40.7ms / 25.8ms)

OneFlow resnet50 time: 24.5ms (= 4898.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 35.9ms (= 7185.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.47 (= 35.9ms / 24.5ms)

return res


def layer_norm(input, normalized_shape: tuple, weight=None, bias=None, eps=1e-05):

Please add documentation here, following the group norm above.


And type hinting as well.
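
As a rough sketch of what the two comments above ask for, a documented and type-hinted function might look like the following. This is a hypothetical pure-Python version that normalizes a flat list of floats; the name `layer_norm_1d` and its list-based types are assumptions made for illustration, whereas the function under review operates on tensors and takes a `normalized_shape` argument.

```python
import math
from typing import List, Optional, Sequence


def layer_norm_1d(
    input: Sequence[float],
    weight: Optional[Sequence[float]] = None,
    bias: Optional[Sequence[float]] = None,
    eps: float = 1e-05,
) -> List[float]:
    """Normalize ``input`` to zero mean and unit variance, then scale and shift.

    Args:
        input: values to normalize (a stand-in for the last tensor dimension).
        weight: optional per-element scale applied after normalization.
        bias: optional per-element offset applied after scaling.
        eps: small constant added to the variance for numerical stability.

    Returns:
        A new list with the same length as ``input``.
    """
    n = len(input)
    mean = sum(input) / n
    var = sum((v - mean) ** 2 for v in input) / n  # biased (population) variance
    inv_std = 1.0 / math.sqrt(var + eps)
    out = [(v - mean) * inv_std for v in input]
    if weight is not None:
        out = [o * w for o, w in zip(out, weight)]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out
```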

norm_type=2.0,
scale_grad_by_freq=False,
sparse=False,
):

Likewise, please add type hinting here. Also, since the documentation contains example code, add the following at the end of the file:

if __name__ == "__main__":
    import doctest

    doctest.testmod(raise_on_error=True)
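
To illustrate the convention described in this comment, here is a small self-contained sketch of a docstring example that `doctest` checks when the file is run directly. The `linear` function below is a hypothetical list-based stand-in written for this illustration, not the implementation under review.

```python
import doctest
from typing import List, Optional


def linear(
    input: List[float],
    weight: List[List[float]],
    bias: Optional[List[float]] = None,
) -> List[float]:
    """Compute ``weight @ input + bias`` for a single input vector.

    Each row of ``weight`` holds the coefficients for one output feature.

    >>> linear([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], bias=[0.5, 0.5])
    [1.5, 2.5]
    """
    out = [sum(x * w for x, w in zip(input, row)) for row in weight]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out


if __name__ == "__main__":
    # Raises on the first docstring example whose output no longer matches.
    doctest.testmod(raise_on_error=True)
```

Running the file as a script executes the docstring example, so documentation that goes stale shows up as a failure.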

import oneflow as flow


def linear(input, weight, bias=None):

@marigoold marigoold Feb 2, 2024


Type hinting, plus the test at the end of the file.


github-actions bot commented Feb 2, 2024


github-actions bot commented Feb 5, 2024


@levi131 levi131 left a comment


The type hinting part looks fine.


Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti 

❌ OneFlow resnet50 time: 43.3ms (= 4328.0ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 57.7ms (= 5769.8ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.33 (= 57.7ms / 43.3ms)

OneFlow resnet50 time: 26.6ms (= 2659.6ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 38.0ms (= 3795.0ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.43 (= 38.0ms / 26.6ms)

OneFlow resnet50 time: 19.1ms (= 3823.0ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 36.4ms (= 7284.9ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.91 (= 36.4ms / 19.1ms)

OneFlow resnet50 time: 17.8ms (= 3561.5ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 32.4ms (= 6475.2ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.82 (= 32.4ms / 17.8ms)

OneFlow resnet50 time: 16.9ms (= 3387.0ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 29.1ms (= 5821.8ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.72 (= 29.1ms / 16.9ms)

OneFlow swin dataloader time: 0.214s (= 42.828s / 200, num_workers=1)
PyTorch swin dataloader time: 0.130s (= 25.928s / 200, num_workers=1)
Relative speed: 0.605 (= 0.130s / 0.214s)

OneFlow swin dataloader time: 0.056s (= 11.215s / 200, num_workers=4)
PyTorch swin dataloader time: 0.032s (= 6.462s / 200, num_workers=4)
Relative speed: 0.576 (= 0.032s / 0.056s)

OneFlow swin dataloader time: 0.032s (= 6.413s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.319s / 200, num_workers=8)
Relative speed: 0.518 (= 0.017s / 0.032s)

❌ OneFlow resnet50 time: 49.3ms (= 4925.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 63.9ms (= 6390.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.30 (= 63.9ms / 49.3ms)

OneFlow resnet50 time: 36.8ms (= 3684.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 48.5ms (= 4850.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 48.5ms / 36.8ms)

OneFlow resnet50 time: 28.2ms (= 5648.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 44.5ms (= 8901.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.58 (= 44.5ms / 28.2ms)

OneFlow resnet50 time: 26.2ms (= 5241.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 40.9ms (= 8182.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.56 (= 40.9ms / 26.2ms)

OneFlow resnet50 time: 24.2ms (= 4847.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 38.4ms (= 7687.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.59 (= 38.4ms / 24.2ms)


3 participants