imgproc: add optimized warpAffine kernels for 8U/16U/32F + C1/C3/C4 inputs #25984

fengyuentau · 2024-08-02T10:19:22Z

Merge with opencv/opencv_extra#1198.
Merge with opencv/opencv_contrib#3787.

Perf:

M2 (16 GB RAM, with fp16 vector intrinsics support)

Geometric mean (ms)

                                 Name of Test                                   base  patch   patch   
                                                                                                vs    
                                                                                               base   
                                                                                            (x-factor)
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC1)      0.276 0.102    2.71   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 16UC1)     0.338 0.137    2.46   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 32FC1)     0.293 0.116    2.54   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC3)      0.447 0.140    3.19   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 16UC3)     0.525 0.287    1.83   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 32FC3)     0.391 0.258    1.52   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC4)      0.405 0.155    2.62   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 16UC4)     0.613 0.348    1.76   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 32FC4)     0.420 0.317    1.33   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC1)     0.274 0.095    2.89   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 16UC1)    0.330 0.141    2.35   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 32FC1)    0.294 0.167    1.76   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC3)     0.439 0.135    3.26   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 16UC3)    0.528 0.284    1.86   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 32FC3)    0.382 0.266    1.43   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC4)     0.411 0.162    2.53   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 16UC4)    0.624 0.353    1.77   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 32FC4)    0.423 0.344    1.23   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 8UC1)     0.516 0.178    2.90   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 16UC1)    0.563 0.262    2.15   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 32FC1)    0.516 0.219    2.36   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 8UC3)     0.838 0.286    2.93   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 16UC3)    0.902 0.552    1.63   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 32FC3)    0.738 0.523    1.41   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 8UC4)     0.825 0.305    2.70   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 16UC4)    1.050 0.677    1.55   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 32FC4)    0.815 0.649    1.26   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 8UC1)    0.719 0.616    1.17   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 16UC1)   0.750 0.700    1.07   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 32FC1)   0.677 0.652    1.04   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 8UC3)    1.167 0.737    1.58   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 16UC3)   1.172 1.056    1.11   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 32FC3)   0.907 0.989    0.92   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 8UC4)    1.242 0.793    1.57   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 16UC4)   1.419 1.223    1.16   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 32FC4)   1.013 1.136    0.89   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC1)    1.024 0.285    3.60   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 16UC1)   1.020 0.365    2.79   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 32FC1)   1.002 0.329    3.05   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC3)    1.641 0.384    4.28   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 16UC3)   1.704 0.714    2.39   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 32FC3)   1.450 0.717    2.02   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC4)    1.652 0.422    3.92   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 16UC4)   1.867 0.870    2.15   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 32FC4)   1.680 1.049    1.60   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC1)   1.734 2.364    0.73   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 16UC1)  1.661 2.484    0.67   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 32FC1)  1.567 2.461    0.64   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC3)   2.781 2.635    1.06   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 16UC3)  2.545 3.059    0.83   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 32FC3)  2.086 2.983    0.70   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC4)   3.061 2.799    1.09   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 16UC4)  3.050 3.449    0.88   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 32FC4)  2.373 3.222    0.74
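The x-factor column is the ratio of the two timings, so values above 1 mean the patch is faster and values below 1 are regressions. Below is a minimal sketch of both quantities. It assumes the x-factor is simply base divided by patch (this matches the rows, e.g. 0.276 / 0.102 ≈ 2.71) and that each timing is a geometric mean over benchmark samples, as the table title states. `geometricMean` and `xFactor` are hypothetical helpers, not part of OpenCV's perf framework.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: geometric mean of positive timing samples,
// computed as exp(mean(log(t_i))) to avoid overflow from a long product.
double geometricMean(const std::vector<double>& samples)
{
    assert(!samples.empty());
    double logSum = 0.0;
    for (double t : samples)
        logSum += std::log(t);
    return std::exp(logSum / static_cast<double>(samples.size()));
}

// Hypothetical helper: x-factor = base / patch.
// > 1 means the patch is faster; < 1 means the patch is slower.
double xFactor(double baseMs, double patchMs)
{
    assert(patchMs > 0.0);
    return baseMs / patchMs;
}
```

For example, the 1920x1080 BORDER_REPLICATE 8UC1 row gives `xFactor(1.734, 2.364)` of roughly 0.73, which is one of the regressions discussed later in the thread.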

Intel i7-12700K

Geometric mean (ms)

                                 Name of Test                                   base  patch   patch
                                                                                                vs
                                                                                               base
                                                                                            (x-factor)
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC1)      0.149 0.047    3.17
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 16UC1)     0.181 0.046    3.96
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 32FC1)     0.163 0.036    4.50
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC3)      0.196 0.115    1.70
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 16UC3)     0.374 0.107    3.48
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 32FC3)     0.228 0.108    2.11
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC4)      0.180 0.146    1.23
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 16UC4)     0.446 0.141    3.17
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 32FC4)     0.298 0.140    2.13
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC1)     0.144 0.047    3.08
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 16UC1)    0.182 0.046    3.97
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 32FC1)    0.162 0.036    4.50
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC3)     0.196 0.117    1.67
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 16UC3)    0.366 0.108    3.40
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 32FC3)    0.227 0.105    2.15
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC4)     0.180 0.148    1.22
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 16UC4)    0.446 0.141    3.16
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 32FC4)    0.300 0.148    2.03
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 8UC1)     0.314 0.105    2.99
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 16UC1)    0.377 0.103    3.65
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 32FC1)    0.353 0.085    4.15
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 8UC3)     0.401 0.241    1.66
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 16UC3)    0.617 0.231    2.67
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 32FC3)    0.866 0.230    3.77
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 8UC4)     0.373 0.301    1.24
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 16UC4)    1.087 0.344    3.17
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 32FC4)    0.763 0.300    2.55
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 8UC1)    0.406 0.244    1.66
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 16UC1)   0.435 0.245    1.77
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 32FC1)   0.401 0.230    1.74
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 8UC3)    0.561 0.418    1.34
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 16UC3)   0.759 0.400    1.90
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 32FC3)   0.572 0.395    1.45
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 8UC4)    0.590 0.484    1.22
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 16UC4)   0.933 0.477    1.95
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 32FC4)   0.669 0.462    1.45
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC1)    0.547 0.165    3.32
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 16UC1)   0.554 0.163    3.39
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 32FC1)   0.547 0.143    3.82
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC3)    0.629 0.322    1.96
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 16UC3)   0.847 0.308    2.75
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 32FC3)   2.616 0.333    7.86
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC4)    0.606 0.390    1.55
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 16UC4)   2.548 0.380    6.71
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 32FC4)   1.766 0.866    2.04
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC1)   0.847 0.880    0.96
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 16UC1)  0.885 0.862    1.03
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 32FC1)  0.759 0.829    0.92
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC3)   1.384 1.239    1.12
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 16UC3)  1.594 1.169    1.36
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 32FC3)  1.375 1.126    1.22
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC4)   1.546 1.272    1.22
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 16UC4)  1.909 1.260    1.51
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 32FC4)  1.396 1.317    1.06

Khadas VIM3 (A311D, 4xA73+2xA53, no fp16 vector intrinsics support)

Geometric mean (ms)

                                 Name of Test                                    base  patch    patch
                                                                                                  vs
                                                                                                 base
                                                                                              (x-factor)
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC1)      2.052  0.579     3.54
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 16UC1)     3.237  0.523     6.20
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 32FC1)     2.603  0.592     4.40
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC3)      4.153  1.048     3.96
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 16UC3)     5.225  1.327     3.94
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 32FC3)     3.388  1.665     2.03
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC4)      3.746  1.353     2.77
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 16UC4)     6.727  1.784     3.77
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 32FC4)     4.006  2.308     1.74
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC1)     1.875  0.578     3.24
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 16UC1)    3.046  0.587     5.19
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 32FC1)    2.577  0.533     4.83
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC3)     4.072  1.041     3.91
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 16UC3)    4.907  1.327     3.70
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 32FC3)    3.512  1.656     2.12
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC4)     3.689  1.361     2.71
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 16UC4)    6.403  1.795     3.57
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 32FC4)    4.335  2.317     1.87
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 8UC1)     2.869  1.249     2.30
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 16UC1)    4.047  1.274     3.18
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 32FC1)    3.358  1.770     1.90
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 8UC3)     5.454  2.365     2.31
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 16UC3)    6.603  2.949     2.24
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 32FC3)    6.177  4.540     1.36
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 8UC4)     6.166  2.988     2.06
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 16UC4)    7.685  3.993     1.92
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 32FC4)    7.637  6.011     1.27
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 8UC1)    4.427  2.940     1.51
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 16UC1)   6.068  2.981     2.04
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 32FC1)   4.316  2.946     1.47
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 8UC3)    7.169  5.075     1.41
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 16UC3)   8.606  5.576     1.54
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 32FC3)   7.533  5.995     1.26
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 8UC4)    8.751  5.985     1.46
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 16UC4)   15.192 6.713     2.26
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 32FC4)   7.056  7.726     0.91
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC1)    8.686  2.083     4.17
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 16UC1)   10.927 2.132     5.13
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 32FC1)   7.764  3.213     2.42
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC3)    14.588 3.600     4.05
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 16UC3)   17.251 4.896     3.52
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 32FC3)   15.349 8.764     1.75
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC4)    14.623 4.480     3.26
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 16UC4)   19.076 6.394     2.98
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 32FC4)   18.237 10.422    1.75
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC1)   12.835 10.112    1.27
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 16UC1)  17.167 10.208    1.68
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 32FC1)  12.486 9.994     1.25
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC3)   22.759 16.134    1.41
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 16UC3)  29.616 16.531    1.79
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 32FC3)  22.666 16.851    1.35
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC4)   27.193 18.530    1.47
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 16UC4)  33.831 18.593    1.82
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 32FC4)  20.932 19.754    1.06

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

force_builders=Linux OpenCL

modules/imgproc/src/warp_affine.simd.hpp

modules/imgproc/perf/perf_warp.cpp

modules/imgproc/CMakeLists.txt

fengyuentau · 2024-08-14T07:03:43Z

I decided to drop the warpPerspective kernels from this PR for now. I will work on adding kernels for 8UC1/C4 and 32FC1/C3/C4, and will consider 16U as well.

fengyuentau · 2024-08-27T02:08:28Z

I'm not sure why the OCL tests failed, since this PR should not touch anything related to OCL.

Example: https://github.com/opencv/opencv/actions/runs/10557877242/job/29246289055?pr=25984

asmorkalov · 2024-08-27T07:05:12Z

modules/core/include/opencv2/core/hal/intrin_neon.hpp

+#if CV_NEON_AARCH64
+    return v_int32x4(vcvtmq_s32_f32(a.val));
+#else


The instruction is available on aarch32 too https://developer.arm.com/architectures/instruction-sets/intrinsics/vcvtmq_s32_f32

I struggled with armv7 support. There is no existing macro indicating whether the target supports only armv7.

There is a manual "CAROTENE_NEON_ARCH" option. It looks like we need a build check, or to handle the march option correctly.

need build check or handle march option correctly.

That sounds like a dedicated PR. How about guarding it (as well as the one above) with CV_NEON_AARCH64 for now?

Please use an __ARM_ARCH > 7 check. I have an armv7 board and test on it regularly.
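A sketch of the kind of guard the reviewer asks for. It assumes the compiler defines the ACLE macros `__ARM_NEON` and `__ARM_ARCH`, and that `vcvtmq_s32_f32` (float to int32 conversion rounding toward minus infinity) is available whenever `__ARM_ARCH > 7`, i.e. on AArch64 and on ARMv8 AArch32, as the linked Arm documentation suggests. `floor4` is a hypothetical helper, not OpenCV's universal-intrinsics code; armv7 and non-ARM builds take the scalar fallback.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper: floor four floats to int32.
// ARMv8+ (AArch64 or AArch32): single vcvtmq_s32_f32 conversion.
// armv7 or non-ARM: scalar std::floor per element.
static void floor4(const float* src, int32_t* dst)
{
    assert(src != nullptr && dst != nullptr);
#if defined(__ARM_NEON) && defined(__ARM_ARCH) && __ARM_ARCH > 7
    vst1q_s32(dst, vcvtmq_s32_f32(vld1q_f32(src)));
#else
    for (int i = 0; i < 4; i++)
        dst[i] = static_cast<int32_t>(std::floor(src[i]));
#endif
}
```

Both paths round toward minus infinity, so, for example, -1.5f becomes -2 and -0.25f becomes -1 on every target.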

modules/imgproc/perf/perf_warp.cpp

modules/imgproc/src/warp_kernels.simd.hpp

modules/imgproc/test/test_imgwarp_strict.cpp

asmorkalov · 2024-08-27T07:17:52Z

I propose backporting the TS-related changes and the rounding/floor optimization to 4.x to reduce merge conflicts and to bring the global optimization there too.

asmorkalov · 2024-08-28T09:30:10Z

@fengyuentau I benchmarked your code (as of a couple of commits ago); see the details in the archive, which also includes single-thread benchmarks. The patch looks good on ARM, but there are some stable regressions on x86. Please take a look.
perf-warpAffine.zip

fengyuentau · 2024-09-12T08:36:45Z

There is a strange degradation on Jetson Orin for 32FC3 and 32FC4 at 4K resolution. It's not visible on other platforms, so I suspect some cache effect. Could you check it on a Mac?

@asmorkalov The performance degradation happens with BORDER_REPLICATE.

                                  Name of Test                                  base   patch    patch vs base (x-factor)
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_CONSTANT, 8UC1)    3.694  0.955     3.87
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_CONSTANT, 16UC1)   3.393  1.006     3.37   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_CONSTANT, 32FC1)   3.366  0.993     3.39   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_CONSTANT, 8UC3)    5.894  1.063     5.54   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_CONSTANT, 16UC3)   5.409  1.500     3.61   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_CONSTANT, 32FC3)   5.591  2.310     2.42   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_CONSTANT, 8UC4)    6.274  1.419     4.42   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_CONSTANT, 16UC4)   5.935  2.024     2.93   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_CONSTANT, 32FC4)   5.916  3.704     1.60   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_REPLICATE, 8UC1)   7.342  12.895    0.57   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_REPLICATE, 16UC1)  6.965  12.967    0.54   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_REPLICATE, 32FC1)  6.318  12.725    0.50   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_REPLICATE, 8UC3)   11.902 14.625    0.81   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_REPLICATE, 16UC3)  10.698 15.114    0.71   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_REPLICATE, 32FC3)  8.697  14.357    0.61   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_REPLICATE, 8UC4)   13.588 15.516    0.88   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_REPLICATE, 16UC4)  12.654 16.278    0.78   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_REPLICATE, 32FC4)  9.618  15.251    0.63
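For reference, a minimal scalar sketch of what an INTER_LINEAR affine warp computes per destination pixel under the two border modes compared above. Assumptions: a single-channel float image, and a 2x3 row-major matrix `M` that already maps destination coordinates to source coordinates (the inverse map). `ImageF`, `Border`, `fetch` and `warpAffineLinearRef` are hypothetical names; this is not the PR's SIMD kernel and is not bit-exact with OpenCV's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

enum class Border { Constant, Replicate };  // hypothetical stand-ins for BORDER_CONSTANT / BORDER_REPLICATE

struct ImageF {  // hypothetical single-channel float image, row-major
    int rows, cols;
    std::vector<float> data;
    float at(int y, int x) const
    {
        assert(y >= 0 && y < rows && x >= 0 && x < cols);
        return data[std::size_t(y) * cols + x];
    }
};

// Read a source pixel; out-of-range coordinates follow the border rule.
static float fetch(const ImageF& src, int y, int x, Border border, float borderValue)
{
    if (x >= 0 && x < src.cols && y >= 0 && y < src.rows)
        return src.at(y, x);
    if (border == Border::Constant)
        return borderValue;
    // Replicate: clamp to the nearest edge pixel.
    x = x < 0 ? 0 : (x >= src.cols ? src.cols - 1 : x);
    y = y < 0 ? 0 : (y >= src.rows ? src.rows - 1 : y);
    return src.at(y, x);
}

// M = [m0 m1 m2; m3 m4 m5] maps destination (x, y) to source (sx, sy).
ImageF warpAffineLinearRef(const ImageF& src, const double M[6], int dstRows, int dstCols,
                           Border border, float borderValue = 0.f)
{
    ImageF dst{dstRows, dstCols, std::vector<float>(std::size_t(dstRows) * dstCols)};
    for (int y = 0; y < dstRows; y++)
        for (int x = 0; x < dstCols; x++) {
            double sx = M[0] * x + M[1] * y + M[2];
            double sy = M[3] * x + M[4] * y + M[5];
            int x0 = static_cast<int>(std::floor(sx));
            int y0 = static_cast<int>(std::floor(sy));
            float ax = static_cast<float>(sx - x0), ay = static_cast<float>(sy - y0);
            // Bilinear blend of the 2x2 neighbourhood around (sx, sy).
            float p00 = fetch(src, y0, x0, border, borderValue);
            float p01 = fetch(src, y0, x0 + 1, border, borderValue);
            float p10 = fetch(src, y0 + 1, x0, border, borderValue);
            float p11 = fetch(src, y0 + 1, x0 + 1, border, borderValue);
            float top = p00 + ax * (p01 - p00);
            float bot = p10 + ax * (p11 - p10);
            dst.data[std::size_t(y) * dstCols + x] = top + ay * (bot - top);
        }
    return dst;
}
```

The two modes differ only in what `fetch` returns outside the image: a constant, or the clamped edge pixel. The thread does not establish why BORDER_REPLICATE regresses at large resolutions, so this sketch should not be read as an explanation of the numbers above.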

fengyuentau · 2024-09-12T08:37:32Z

@vpisarev @asmorkalov I updated the code to use the algorithm hint.

fengyuentau · 2024-09-24T06:48:21Z

@asmorkalov @vpisarev Could you review this PR? I have already continued on this branch and implemented the new warpPerspective kernels, which I want to submit in a new PR.

asmorkalov · 2024-09-26T06:57:01Z

modules/imgproc/perf/opencl/perf_imgwarp.cpp

    const Size srcSize = get<0>(params);
    const int type = get<1>(params), interpolation = get<2>(params);
-    const double eps = CV_MAT_DEPTH(type) <= CV_32S ? 1 : interpolation == INTER_CUBIC ? 2e-3 : 1e-4;
+    const double eps = CV_MAT_DEPTH(type) <= CV_32S ? 2 : interpolation == INTER_CUBIC ? 2e-3 : 3e-2;


It looks like the change no longer makes sense now that you have introduced AlgorithmHint. I reverted it and do not see regressions with Intel OpenCL (iGPU) or NVIDIA GF 1080.

asmorkalov · 2024-09-26T07:22:15Z

modules/imgproc/test/ocl/test_warp.cpp

 {
    for (int j = 0; j < test_loop_times; j++)
    {
-        double eps = depth < CV_32F ? 0.04 : 0.06;


I propose converting it to conditions with explicit types. We have fp16, int64 and bool, which come after fp64. Also, the condition is hard to understand.
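A sketch of the explicit-type version proposed here, reusing the tolerances from the removed line (0.04 for the integer depths below CV_32F, 0.06 for CV_32F and CV_64F). The `Depth` enum is a hypothetical stand-in for OpenCV's depth constants, and `warpTestEps` is a hypothetical name. One design choice differs from the original comparison: depths that are not listed (such as fp16, int64 or bool) raise an error instead of silently falling into the `0.06` branch via `depth < CV_32F`.

```cpp
#include <stdexcept>

// Hypothetical stand-ins for OpenCV depth constants (CV_8U ... CV_64F and newer types).
enum class Depth { U8, S8, U16, S16, S32, F32, F64, F16, S64, Bool };

// Per-type tolerance: every supported depth is listed explicitly.
double warpTestEps(Depth depth)
{
    switch (depth)
    {
    case Depth::U8: case Depth::S8: case Depth::U16: case Depth::S16: case Depth::S32:
        return 0.04;
    case Depth::F32: case Depth::F64:
        return 0.06;
    default:
        // fp16, int64, bool, ...: force the test author to choose a tolerance.
        throw std::invalid_argument("warpTestEps: no tolerance defined for this depth");
    }
}
```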

asmorkalov · 2024-09-26T07:22:29Z

modules/imgproc/test/ocl/test_warp.cpp

 {
    for (int j = 0; j < test_loop_times; j++)
    {
-        double eps = depth < CV_32F ? 0.04 : 0.06;


The same proposal here.

modules/imgproc/test/test_imgwarp_strict.cpp

slightly alter threshold for warpAffine optimization #3787

Merge with opencv/opencv#25984

New `confusionMatrixes[1]` is

```
[[45  0  0  0  0  0  0  0  0  0]
 [ 0 57  0  0  0  0  0  0  0  0]
 [ 0  0 58  2  0  0  0  0  1  0]
 [ 0  0  0 43  0  0  0  1  0  0]
 [ 0  0  0  0 39  0  0  0  0  1]
 [ 0  0  0  1  0 49  0  0  1  0]
 [ 0  0  0  0  0  0 52  0  0  0]
 [ 0  0  1  0  0  0  0 54  0  0]
 [ 0  0  0  0  0  0  0  0 47  0]
 [ 0  1  0  1  0  0  0  0  2 44]]
```

which is about a shift of 1 in each 4x or 5x pixel value.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake

fengyuentau added the category: imgproc label Aug 2, 2024

fengyuentau added this to the 5.0 milestone Aug 2, 2024

fengyuentau requested review from asmorkalov and vpisarev August 2, 2024 10:19

fengyuentau force-pushed the imgproc/warpaffine_opt branch 3 times, most recently from a3f2942 to bb00632 on August 3, 2024 12:53

fengyuentau commented Aug 5, 2024

modules/imgproc/src/warp_affine.simd.hpp (outdated, resolved)

fengyuentau force-pushed the imgproc/warpaffine_opt branch from 4a641e1 to 3fb43dd on August 6, 2024 06:34

fengyuentau commented Aug 6, 2024

modules/imgproc/perf/perf_warp.cpp (resolved)

fengyuentau commented Aug 6, 2024

modules/imgproc/perf/perf_warp.cpp (resolved)

This was referenced Aug 6, 2024

Update perf data for warpAffine opencv/opencv_extra#1197

Closed

Update perf data for warpAffine opencv/opencv_extra#1198

Merged

fengyuentau marked this pull request as ready for review August 6, 2024 10:11

fengyuentau force-pushed the imgproc/warpaffine_opt branch from 0af05d4 to 7abaafe on August 7, 2024 03:39

vpisarev reviewed Aug 9, 2024

modules/imgproc/CMakeLists.txt (outdated, resolved)

fengyuentau added the optimization label Aug 14, 2024

asmorkalov reviewed Aug 27, 2024

fengyuentau added 2 commits September 12, 2024 16:21

add new warpAffine kernel; improve tests

c285918

use functions instead of class for the new kernels; add algo hint; use accurate algo in asift

abdaeab

fengyuentau force-pushed the imgproc/warpaffine_opt branch from 2e30217 to abdaeab on September 12, 2024 08:25

fengyuentau added 2 commits September 12, 2024 17:52

split macros

fe04e3e

fix a bug in the 8UC4 kernel

1f1bbed

add missing tail processings

2e5ad6d

asmorkalov reviewed Sep 26, 2024

asmorkalov modified the milestones: 5.0, 5.0-alpha Sep 27, 2024

asmorkalov added 2 commits October 1, 2024 15:02

Code review fixes.

02ac60c

Added fp64 capability check before warping call.

ecddf69

asmorkalov self-assigned this Oct 1, 2024

Code review fix.

8ad0d8e

asmorkalov approved these changes Oct 1, 2024

vpisarev self-requested a review October 1, 2024 13:48

vpisarev approved these changes Oct 1, 2024

asmorkalov mentioned this pull request Oct 2, 2024

New CPU warpAffine INTER_LINEAR diverges with OpenCL implementation #26235

Open

4 tasks

Added bug reference.

397b6e7

This was referenced Oct 3, 2024

incorrect result of cv2.remap #25937

Closed

[Draft] WarpAffine optimized with updated opencl kernel #26242

Closed

asmorkalov merged commit 97681bd into opencv:5.x Oct 3, 2024
26 of 27 checks passed

asmorkalov mentioned this pull request Oct 4, 2024

Features2d_Detectors.regression test fails after the new warpAffine integration opencv/opencv_contrib#3804

Open

fengyuentau deleted the imgproc/warpaffine_opt branch October 9, 2024 03:51

vpisarev mentioned this pull request Oct 9, 2024

New shiny Imgproc module for OpenCV 5.0 #25012

Open

fengyuentau commented Aug 2, 2024 • edited by opencv-alalek