Tao Luo edited this page Dec 9, 2019 · 1 revision

wangkuiyi

Yu Yang

| Number of GPUs | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Images/sec | 18.639 | 27.8863 | 39.3787 | 52.9688 |
| Speedup | N/A | 1.4961264 | 2.11270454 | 2.84182628 |
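
The speedup row is each throughput divided by the single-GPU throughput. The short sketch below recomputes it from the figures in the table and also derives a per-GPU scaling efficiency (speedup divided by GPU count). The efficiency figure and the helper name `scaling` are illustrative additions, not part of the original report.

```python
# Throughput figures copied from the table above (images/sec, keyed by GPU count).
throughput = {1: 18.639, 2: 27.8863, 3: 39.3787, 4: 52.9688}


def scaling(throughput):
    """Return {gpus: (speedup, efficiency)} relative to the 1-GPU baseline.

    `efficiency` is speedup / gpus, an assumed metric that the report
    itself does not list.
    """
    base = throughput[1]
    return {n: (v / base, v / base / n) for n, v in throughput.items()}


if __name__ == "__main__":
    for n, (speedup, eff) in scaling(throughput).items():
        print(f"{n} GPU(s): speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

Read this way, the 4-GPU run reaches well under the ideal 4x, which is the gap the multi-GPU work would need to close.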
  • Helped debug C++ readers.

wuyi

luotao

Yan Xu

ranqiu

kexinzhao

Chenxi

tangwei

- [RDMA/GPUDirect] https://github.com/PaddlePaddle/Paddle/issues/9405

- Benchmark image_classification

Weixing

Doc:

tonyyang-svail

PR

Issue:

wanghaoshuang

Dang qingqing

Yibing Liu

Code Review:

Liu Yiqun

  • Inference Framework
    • Verify the correctness of resnet50
    • Analyze the profiling data of Fluid and TensorRT
    • Start the work of integrating TensorRT
  • Mobile
    • Support the MDL group

guosheng

zhaochengduo

qiaolongfei

fluid

fengjiayi

  • Profiling of C++ Reader:

Throughput in instances/sec:

| Net Config | Simple Demo Net | VGG16 |
|---|---|---|
| V2 Reader | 819.11 | 57.49 |
| V2 Reader with cache | - | 58.9 |
| C++ Reader | 1629.88 | 61.44 |
| C++ Reader with DoubleBuffer | 2382.13 | in progress |
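
The DoubleBuffer row refers to overlapping data loading with computation: while the trainer consumes one batch, the next one is already being prepared. The following is a minimal Python sketch of that general idea, assuming a background thread and a bounded queue. It is not Paddle's C++ implementation, and the names `double_buffered` and `buffer_size` are hypothetical.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the wrapped reader


def double_buffered(reader, buffer_size=2):
    """Wrap `reader` (a callable returning an iterator) with prefetching.

    A background thread fills a bounded queue so the consumer rarely has
    to wait for the next item. Illustrative only; `buffer_size` is an
    assumed parameter.
    """
    def wrapped():
        q = queue.Queue(maxsize=buffer_size)

        def producer():
            try:
                for item in reader():
                    q.put(item)  # blocks while the buffer is full
            finally:
                q.put(_END)  # always signal completion, even on error

        # Daemon thread: it will not keep the process alive if the
        # consumer stops iterating early.
        threading.Thread(target=producer, daemon=True).start()

        while True:
            item = q.get()
            if item is _END:
                return
            yield item

    return wrapped


def numbers():
    """Toy reader standing in for a real data source."""
    yield from range(5)


if __name__ == "__main__":
    for batch in double_buffered(numbers)():
        print(batch)
```

The bounded queue is the design choice that matters here: it caps memory use while still letting loading run ahead of the consumer by up to `buffer_size` items.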

gongweibao

Xin Pan

  • Improved LayerNorm speed by 3x-4x; the Transformer model speeds up by 15%-20%
  • Follow up on P40 machines and configuration
    • Have enough machines to develop and evaluate performance
    • Have the same configuration as the Paddle Cloud machines
    • Have 1 machine for continuous model evaluation
  • Follow up on 5.1 Paddle Cloud goals
  • Review ParallelExecutor and ParallelGPUExecutor and profile speed

yangyaming

abhinavarora

Yan Chunwei

dongzhihong

helinwang

cs2be(thuan)

PR:

Discussions:

jetfuel(Jeff Wang)

PR:

Research and Demo

nickyfantasy

PR:

daming-lu
