Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Running time exponential in number of operators on stream #127

Open
tfwnicholson opened this issue Jan 23, 2018 · 9 comments
Open

Running time exponential in number of operators on stream #127

tfwnicholson opened this issue Jan 23, 2018 · 9 comments

Comments

@tfwnicholson
Copy link

I am currently evaluating Spreads (and therefore implicitly F#) for potential use in our core business logic.

In trying to see how runtime scales with the number of operations on a stream, I am simple applying the lag operator to a series. My code is roughly as follows:

let quotes : Series<System.DateTime, float> = // loaded from file       

let rec lagByN n (series: Series<_,_>) =
    if n=0u then (series) else lagByN (n-1u) (series.Lag(1u))

let lagged = lagByN num_lag quotes
 
lagged_quotes.Fold(0.0, (+))

For different values of num_lag I notice an exponential scaling of run time.

Am I doing something stupid? Or is this an inherent problem with Spreads?

Thanks

@buybackoff
Copy link
Member

buybackoff commented Jan 23, 2018

Why not just series.Lag(n)? What are you trying to achieve? A sum of all values or a running/moving sum?

You are calculating series.Lag(1).Lag(1).Lag(1).Lag(1)... n times.
Lag is very optimized for allocations and all possible method calls inlining, you are just facing algorithmic complexity by applying lag N times. Each new lag has to deal with a cursor from the inner series and this cursor becomes heavier and heavier recursively, that is why performance should drop faster than linear in this particular case with recursive lag application.

There is also moving window and moving sum (plus other simple stats) if you need this and not just lag. #108

@tfwnicholson
Copy link
Author

It is a rather contrived example. We just want to see how speed scales with applying operators repeatedly on a sequence. Effectively building a large but simple graph of operations that resembles, in number of operations, a more complex real-world strategy.

@buybackoff
Copy link
Member

buybackoff commented Jan 23, 2018

Even chained series.Lag(1).Lag(1).Lag(1).Lag(1)... should have O(1) MoveNext() and the goal of the lib is to optimize for it (this is the real-time case when new data points arrive). But lag(100) should be as fast as lag(1). It should be possible to identify chained lags and fuse them, but it's not a common pattern to make it a priority. Have you compared the absolute performance numbers with Linq or any other library?

It is natural that the system takes more time to performs more operations. I would be surprise if performance drop is really exponential, could you share your numbers please? Or maybe the simple case n = 1 is so fast and the calculations are so trivial that virtual method calls and similar things start to play important role.

@buybackoff
Copy link
Member

Actually lagged cursor is quite complex now and keeps track of current and lagged position, instead of just an offset by N, so it require two cursor moves. Nesting such lags will lead to exponential time growth by current construction.

CurrentValue

MoveNext implementation

It should be simple to create a Lag projection that just uses a single cursor with offset by N, I believe I had such implementation but refactored and optimized most of the cursor code logic and reimplemented projections in generic way at first. There are many optimization opportunities in terms of cursor moves design. I was focusing on allocations and method inlining first.

@tfwnicholson
Copy link
Author

It seems to no just be lagged, getting the same exponential slowdown when implementing lag in terms of scan thus:
replacing series.lag(1u) above with lagged series where:

let lagged (series: Series<_, float>) =
    let somed_lagged = series.Map(Some).Scan((None, None), (fun (old, current) _ recent -> (current, recent))).Map(snd)
    somed_lagged.FilterMap((fun _ (x: float option) -> x.IsSome), (fun x -> x.Value))

Again, I know this is a really silly way of doing this, but just want to see how things scale with graph size.

We have currently compared this against building on top of Reactive.jl, which seems to behave non-exponentially.

@buybackoff
Copy link
Member

Do you have absolute numbers for your benchmarks?

@buybackoff
Copy link
Member

I really do not understand from your code what are you trying to achieve? How comes that Scan replaces Lag?

Also, since you are using the built-in F# Option type then performance is probably your last concern... Option is reference type, if you using it for numerous data points its overhead and allocations will be more visible than Spreads' ones.

@tfwnicholson
Copy link
Author

tfwnicholson commented Jan 24, 2018

Times are:
lag_size,time_in_seconds
1,0.065
2,0.064
3,0.064
4,0.064
5,0.064
6,0.065
7,0.066
8,0.066
9,0.070
10,0.077
11,0.087
12,0.124
13,0.193
14,0.377
15,0.752
16,1.706
17,3.488
18,7.744
19,17.289
20,36.141
21,90.253

As mentioned above we are "stupidly" implementing simple strategies that have the same graph size as eventual more complex strategies to see how Spreads performs and scale. Initially we wanted to test using the Lag operator of Series, but encountering the exponential nature of it, we decided to replace it with a Scan based version just for benchmarking.

@buybackoff
Copy link
Member

buybackoff commented Jan 24, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants