FIX: Keep the intermediate values in float64 #414

rdbisme · 2022-07-02T20:23:58Z

No description provided.

rdbisme · 2022-07-02T20:33:42Z

This has a some impact on performance, I'd like to have the opinion on if we should merge it or not. @stroxler?

       before           after         ratio
     [4d6b551c]       [54cea238]
     <v1.3.5^0>       <float64-intermediate>
            22.2M            22.4M     1.01  memory.Memory.peakmem_nanmedian
+     4.16±0.02μs      4.84±0.01μs     1.16  move.Time1DMove.time_move_max('float64', (1000,), 10)
+        2.12±0μs      2.83±0.03μs     1.33  move.Time1DMove.time_move_mean('float32', (1000,), 10)
+       160±0.2μs          232±2μs     1.45  move.Time1DMove.time_move_mean('float32', (100000,), 10)
+     19.6±0.01ms      25.4±0.09ms     1.30  move.Time1DMove.time_move_mean('float32', (10000000,), 10)
-     4.85±0.01μs      4.38±0.01μs     0.90  move.Time1DMove.time_move_min('float32', (1000,), 10)
+     4.42±0.01μs      5.42±0.02μs     1.23  move.Time1DMove.time_move_std('float32', (1000,), 10)
+       381±0.4μs        491±0.6μs     1.29  move.Time1DMove.time_move_std('float32', (100000,), 10)
+     41.7±0.03ms      52.7±0.03ms     1.26  move.Time1DMove.time_move_std('float32', (10000000,), 10)
+        1.88±0μs      2.72±0.02μs     1.45  move.Time1DMove.time_move_sum('float32', (1000,), 10)
+       134±0.6μs          216±2μs     1.61  move.Time1DMove.time_move_sum('float32', (100000,), 10)
+     17.0±0.01ms      23.8±0.01ms     1.40  move.Time1DMove.time_move_sum('float32', (10000000,), 10)
+     3.62±0.01μs      4.26±0.03μs     1.18  move.Time1DMove.time_move_var('float32', (1000,), 10)
+       306±0.3μs        378±0.3μs     1.23  move.Time1DMove.time_move_var('float32', (100000,), 10)
+     34.2±0.02ms      41.4±0.02ms     1.21  move.Time1DMove.time_move_var('float32', (10000000,), 10)
+     8.21±0.02ms       9.74±0.3ms     1.19  move.Time2DMove.time_move_argmax('int64', (1000, 1000), 'C', 0, 10)
-      9.22±0.4ms       7.74±0.3ms     0.84  move.Time2DMove.time_move_argmax('int64', (1000, 1000), 'F', 1, 10)
+     11.4±0.03ms       12.9±0.3ms     1.13  move.Time2DMove.time_move_argmin('float64', (1000, 1000), 'C', 0, 10)
+     8.33±0.02ms       9.79±0.3ms     1.18  move.Time2DMove.time_move_argmin('int64', (1000, 1000), 'C', 0, 10)
+     10.8±0.04ms       12.5±0.3ms     1.16  move.Time2DMove.time_move_max('float64', (1000, 1000), 'C', 0, 10)
+      10.6±0.1ms       11.9±0.5ms     1.13  move.Time2DMove.time_move_max('float64', (1000, 1000), 'F', 1, 10)
+     7.87±0.01ms       9.44±0.3ms     1.20  move.Time2DMove.time_move_max('int64', (1000, 1000), 'C', 0, 10)
+        1.60±0ms         2.18±0ms     1.36  move.Time2DMove.time_move_mean('float32', (1000, 1000), 'C', 1, 10)
+     4.05±0.04ms      6.68±0.05ms     1.65  move.Time2DMove.time_move_mean('float64', (1000, 1000), 'C', 0, 10)
+     3.87±0.01ms      6.49±0.06ms     1.68  move.Time2DMove.time_move_mean('int64', (1000, 1000), 'C', 0, 10)
+     11.1±0.04ms       12.4±0.3ms     1.12  move.Time2DMove.time_move_min('float64', (1000, 1000), 'C', 0, 10)
+     8.00±0.01ms       9.55±0.3ms     1.19  move.Time2DMove.time_move_min('int64', (1000, 1000), 'C', 0, 10)
-      8.08±0.4ms      7.26±0.03ms     0.90  move.Time2DMove.time_move_min('int64', (1000, 1000), 'F', 1, 10)
+        3.84±0ms         4.93±0ms     1.29  move.Time2DMove.time_move_std('float32', (1000, 1000), 'C', 1, 10)
+     7.41±0.03ms       9.04±0.1ms     1.22  move.Time2DMove.time_move_std('float64', (1000, 1000), 'C', 0, 10)
+      7.43±0.2ms       9.01±0.2ms     1.21  move.Time2DMove.time_move_std('int64', (1000, 1000), 'C', 0, 10)
+        1.35±0ms         2.02±0ms     1.50  move.Time2DMove.time_move_sum('float32', (1000, 1000), 'C', 1, 10)
+     3.94±0.02ms      6.57±0.06ms     1.67  move.Time2DMove.time_move_sum('float64', (1000, 1000), 'C', 0, 10)
+     3.79±0.01ms      6.54±0.04ms     1.73  move.Time2DMove.time_move_sum('int64', (1000, 1000), 'C', 0, 10)
-      3.64±0.3ms      2.88±0.03ms     0.79  move.Time2DMove.time_move_sum('int64', (1000, 1000), 'F', 1, 10)
+        3.10±0ms         3.82±0ms     1.23  move.Time2DMove.time_move_var('float32', (1000, 1000), 'C', 1, 10)
+     7.24±0.05ms       8.00±0.2ms     1.10  move.Time2DMove.time_move_var('float32', (1000, 1000), 'F', 1, 10)
+     6.22±0.07ms       8.17±0.1ms     1.31  move.Time2DMove.time_move_var('float64', (1000, 1000), 'C', 0, 10)
+     5.91±0.05ms      7.95±0.08ms     1.34  move.Time2DMove.time_move_var('int64', (1000, 1000), 'C', 0, 10)
+      5.47±0.1ms      6.21±0.06ms     1.14  move.Time2DMove.time_move_var('int64', (1000, 1000), 'F', 1, 10)
-         757±5ns          634±3ns     0.84  nonreduce_axis.Time1DNonreduceAxis.time_push('int32', (1000,))
-        881±10ns          742±4ns     0.84  nonreduce_axis.Time1DNonreduceAxis.time_push('int64', (1000,))
+     1.35±0.01ms      1.54±0.07ms     1.14  reduce.Time2DReductions.time_nanargmax('int64', (1000, 1000), 'C', 0)
-         391±3ns          354±2ns     0.91  reduce.TimeAllNan2D.time_allnan('float32', (1000, 1000), 'C', None, 'fast')
-         398±2ns          351±3ns     0.88  reduce.TimeAllNan2D.time_allnan('float64', (1000, 1000), 'C', None, 'fast')
-         598±4ns          524±2ns     0.88  reduce.TimeAllNan2D.time_allnan('int32', (1000, 1000), 'C', 0, 'fast')
-         595±4ns          524±2ns     0.88  reduce.TimeAllNan2D.time_allnan('int32', (1000, 1000), 'C', 0, 'slow')
-         598±2ns          525±3ns     0.88  reduce.TimeAllNan2D.time_allnan('int32', (1000, 1000), 'C', 1, 'fast')
-         597±4ns          522±2ns     0.88  reduce.TimeAllNan2D.time_allnan('int32', (1000, 1000), 'C', 1, 'slow')
-         599±2ns          527±3ns     0.88  reduce.TimeAllNan2D.time_allnan('int32', (1000, 1000), 'F', 0, 'fast')
-         597±2ns          524±2ns     0.88  reduce.TimeAllNan2D.time_allnan('int32', (1000, 1000), 'F', 0, 'slow')
-         598±2ns          526±2ns     0.88  reduce.TimeAllNan2D.time_allnan('int32', (1000, 1000), 'F', 1, 'fast')
-         600±4ns          525±2ns     0.87  reduce.TimeAllNan2D.time_allnan('int32', (1000, 1000), 'F', 1, 'slow')
-         599±3ns          524±4ns     0.87  reduce.TimeAllNan2D.time_allnan('int64', (1000, 1000), 'C', 0, 'fast')
-         596±2ns          525±3ns     0.88  reduce.TimeAllNan2D.time_allnan('int64', (1000, 1000), 'C', 0, 'slow')
-         600±1ns          524±3ns     0.87  reduce.TimeAllNan2D.time_allnan('int64', (1000, 1000), 'C', 1, 'fast')
-         599±5ns          524±2ns     0.87  reduce.TimeAllNan2D.time_allnan('int64', (1000, 1000), 'C', 1, 'slow')
-         601±4ns          526±2ns     0.88  reduce.TimeAllNan2D.time_allnan('int64', (1000, 1000), 'F', 0, 'fast')
-         595±4ns          525±3ns     0.88  reduce.TimeAllNan2D.time_allnan('int64', (1000, 1000), 'F', 0, 'slow')
-         598±2ns          525±4ns     0.88  reduce.TimeAllNan2D.time_allnan('int64', (1000, 1000), 'F', 1, 'fast')
-         597±6ns          524±5ns     0.88  reduce.TimeAllNan2D.time_allnan('int64', (1000, 1000), 'F', 1, 'slow')
-         388±1ns        349±0.7ns     0.90  reduce.TimeAnyNan2D.time_anynan('float32', (1000, 1000), 'C', None, 'fast')

jkadbear · 2022-07-06T07:30:57Z

This commit has a little overhead for float32 but does affect float64/int64/int32 calculations. However, it seems the performance also varies for float64/int64/int32 from the results?

jkadbear · 2022-07-06T08:15:41Z

Besides, this commit only changes float32 version of move_sum, move_mean, move_std and move_var. These operations contain intermediate moving window summations which are not accurate enough under float32. Other functions like move_max need not to be modified because it has no summation.

The performace test uses sequence length of 10000000. The following example compares the current move_std accuracy for such long sequence:

import numpy as np
import bottleneck as bn
x = np.random.random(10000000)
s64 = bn.move_std(x, 10)
s32 = bn.move_std(x.astype('float32'), 10)

print(s64)
print(s32)
print(np.nan_to_num(s64-s32).max())

The output on my machine is

[       nan        nan        nan ... 0.33617391 0.31365248 0.31293675]
[       nan        nan        nan ... 0.3361321  0.31345436 0.31280586]
0.003174458068280983

The max difference between float64 and float32 is about 3e-3 when data is ideal (sampled from a uniform distribution). I do not think it is a reliable result because the real world time series data would be stranger (leading to larger float errors).

rdbisme · 2022-11-11T09:37:47Z

@qwhelan, maybe you have time to provide an opinion on this? :)

FIX: Keep the intermediate values in float64

5ffa5bf

rdbisme mentioned this pull request Jul 2, 2022

Keep the intermediate values in float64. #407

Closed

Merge branch 'master' into float64-intermediate

3048df4

rdbisme mentioned this pull request Feb 23, 2024

[BUG] bottleneck gives erroneous standard deviation in Pandas with float32 array. #443

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FIX: Keep the intermediate values in float64 #414

FIX: Keep the intermediate values in float64 #414

rdbisme commented Jul 2, 2022

rdbisme commented Jul 2, 2022

jkadbear commented Jul 6, 2022

jkadbear commented Jul 6, 2022

rdbisme commented Nov 11, 2022

FIX: Keep the intermediate values in float64 #414

Are you sure you want to change the base?

FIX: Keep the intermediate values in float64 #414

Conversation

rdbisme commented Jul 2, 2022

rdbisme commented Jul 2, 2022

jkadbear commented Jul 6, 2022

jkadbear commented Jul 6, 2022

rdbisme commented Nov 11, 2022