Running time exponential in number of operators on stream #127

tfwnicholson · 2018-01-23T19:16:59Z

I am currently evaluating Spreads (and therefore implicitly F#) for potential use in our core business logic.

In trying to see how runtime scales with the number of operations on a stream, I am simple applying the lag operator to a series. My code is roughly as follows:

let quotes : Series<System.DateTime, float> = // loaded from file       

let rec lagByN n (series: Series<_,_>) =
    if n=0u then (series) else lagByN (n-1u) (series.Lag(1u))

let lagged = lagByN num_lag quotes
 
lagged_quotes.Fold(0.0, (+))

For different values of num_lag I notice an exponential scaling of run time.

Am I doing something stupid? Or is this an inherent problem with Spreads?

Thanks

The text was updated successfully, but these errors were encountered:

buybackoff · 2018-01-23T19:31:28Z

Why not just series.Lag(n)? What are you trying to achieve? A sum of all values or a running/moving sum?

You are calculating series.Lag(1).Lag(1).Lag(1).Lag(1)... n times.
Lag is very optimized for allocations and all possible method calls inlining, you are just facing algorithmic complexity by applying lag N times. Each new lag has to deal with a cursor from the inner series and this cursor becomes heavier and heavier recursively, that is why performance should drop faster than linear in this particular case with recursive lag application.

There is also moving window and moving sum (plus other simple stats) if you need this and not just lag. #108

tfwnicholson · 2018-01-23T19:45:49Z

It is a rather contrived example. We just want to see how speed scales with applying operators repeatedly on a sequence. Effectively building a large but simple graph of operations that resembles, in number of operations, a more complex real-world strategy.

buybackoff · 2018-01-23T19:58:47Z

Even chained series.Lag(1).Lag(1).Lag(1).Lag(1)... should have O(1) MoveNext() and the goal of the lib is to optimize for it (this is the real-time case when new data points arrive). But lag(100) should be as fast as lag(1). It should be possible to identify chained lags and fuse them, but it's not a common pattern to make it a priority. Have you compared the absolute performance numbers with Linq or any other library?

It is natural that the system takes more time to performs more operations. I would be surprise if performance drop is really exponential, could you share your numbers please? Or maybe the simple case n = 1 is so fast and the calculations are so trivial that virtual method calls and similar things start to play important role.

buybackoff · 2018-01-23T20:13:37Z

Actually lagged cursor is quite complex now and keeps track of current and lagged position, instead of just an offset by N, so it require two cursor moves. Nesting such lags will lead to exponential time growth by current construction.

CurrentValue

MoveNext implementation

It should be simple to create a Lag projection that just uses a single cursor with offset by N, I believe I had such implementation but refactored and optimized most of the cursor code logic and reimplemented projections in generic way at first. There are many optimization opportunities in terms of cursor moves design. I was focusing on allocations and method inlining first.

tfwnicholson · 2018-01-24T18:11:17Z

It seems to no just be lagged, getting the same exponential slowdown when implementing lag in terms of scan thus:
replacing series.lag(1u) above with lagged series where:

let lagged (series: Series<_, float>) =
    let somed_lagged = series.Map(Some).Scan((None, None), (fun (old, current) _ recent -> (current, recent))).Map(snd)
    somed_lagged.FilterMap((fun _ (x: float option) -> x.IsSome), (fun x -> x.Value))

Again, I know this is a really silly way of doing this, but just want to see how things scale with graph size.

We have currently compared this against building on top of Reactive.jl, which seems to behave non-exponentially.

buybackoff · 2018-01-24T18:17:16Z

Do you have absolute numbers for your benchmarks?

buybackoff · 2018-01-24T18:23:11Z

I really do not understand from your code what are you trying to achieve? How comes that Scan replaces Lag?

Also, since you are using the built-in F# Option type then performance is probably your last concern... Option is reference type, if you using it for numerous data points its overhead and allocations will be more visible than Spreads' ones.

tfwnicholson · 2018-01-24T18:53:27Z

Times are:
lag_size,time_in_seconds
1,0.065
2,0.064
3,0.064
4,0.064
5,0.064
6,0.065
7,0.066
8,0.066
9,0.070
10,0.077
11,0.087
12,0.124
13,0.193
14,0.377
15,0.752
16,1.706
17,3.488
18,7.744
19,17.289
20,36.141
21,90.253

As mentioned above we are "stupidly" implementing simple strategies that have the same graph size as eventual more complex strategies to see how Spreads performs and scale. Initially we wanted to test using the Lag operator of Series, but encountering the exponential nature of it, we decided to replace it with a Scan based version just for benchmarking.

buybackoff · 2018-01-24T19:22:44Z

Window is by far the most useful operator, have you tried it? I optimized it carefully first. Also Spreads is not only Rx, it has unique Zip feature I haven't seen in other places yet. Zip is the most useful thing for real irregular data with different time frames.

…

On Jan 24, 2018 9:53 PM, "TomNicholson" ***@***.***> wrote: Times are: lag_size,time_in_seconds 1,0.065s 2,0.064s 3,0.064s 4,0.064s 5,0.064s 6,0.065s 7,0.066s 8,0.066s 9,0.070s 10,0.077s 11,0.087s 12,0.124s 13,0.193s 14,0.377s 15,0.752s 16,1.706s 17,3.488s 18,7.744s 19,17.289s 20,36.141s 21,90.253s As mentioned above we are "stupidly" implementing simple strategies that have the same graph size as eventual more complex strategies to see how Spreads performs and scale.s Initially we wanted to test the Lag operator of Series, but encountering the exponential nature of it, we decided to replace it with a Scan based version just for benchmarking. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#127 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAd5fK92LPIlWxmmFUHbmEyJcrNqdD7oks5tN3wngaJpZM4RqK1d> .

buybackoff mentioned this issue Feb 5, 2018

CursorSeries implementation #108

Closed

42 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Running time exponential in number of operators on stream #127

Running time exponential in number of operators on stream #127

tfwnicholson commented Jan 23, 2018

buybackoff commented Jan 23, 2018 •

edited

tfwnicholson commented Jan 23, 2018

buybackoff commented Jan 23, 2018 •

edited

buybackoff commented Jan 23, 2018

tfwnicholson commented Jan 24, 2018

buybackoff commented Jan 24, 2018

buybackoff commented Jan 24, 2018

tfwnicholson commented Jan 24, 2018 •

edited

buybackoff commented Jan 24, 2018 via email

Running time exponential in number of operators on stream #127

Running time exponential in number of operators on stream #127

Comments

tfwnicholson commented Jan 23, 2018

buybackoff commented Jan 23, 2018 • edited

tfwnicholson commented Jan 23, 2018

buybackoff commented Jan 23, 2018 • edited

buybackoff commented Jan 23, 2018

tfwnicholson commented Jan 24, 2018

buybackoff commented Jan 24, 2018

buybackoff commented Jan 24, 2018

tfwnicholson commented Jan 24, 2018 • edited

buybackoff commented Jan 24, 2018 via email

buybackoff commented Jan 23, 2018 •

edited

buybackoff commented Jan 23, 2018 •

edited

tfwnicholson commented Jan 24, 2018 •

edited