Scalability to run large number of time-series without crashing #152

ekurniawan-ispt · 2022-04-05T04:55:40Z

Description

It frustrates me that CausalNex crashes or runs for days when given a large number of time series, for example 2500.
I am referring to the from_pandas_dynamic function (the Dynamic NOTEARS implementation in CausalNex).
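
To give a rough sense of why this size is hard, here is a back-of-the-envelope sketch of the problem dimensions. It assumes the DYNOTEARS formulation with a d x d intra-slice matrix W and a (p*d) x d inter-slice matrix A, and it assumes the optimiser works on one flat vector holding separate positive and negative parts of both matrices. The lag order p = 1 is an illustrative value, not something stated in this issue, and the helper names are hypothetical.

```python
def dynotears_variable_count(d_vars: int, p_orders: int) -> int:
    """Number of optimisation variables under the assumptions above.

    W has d*d entries and A has p*d*d entries; the factor 2 is the assumed
    split of every entry into a positive and a negative part.
    """
    return 2 * (p_orders + 1) * d_vars ** 2


def float64_megabytes(n_values: int) -> float:
    """Memory needed to hold n_values float64 numbers, in megabytes."""
    return n_values * 8 / 1e6


d_vars = 2500   # number of time series mentioned in this issue
p_orders = 1    # assumed lag order, for illustration only

n_variables = dynotears_variable_count(d_vars, p_orders)
print(n_variables)                     # 25000000
print(float64_megabytes(n_variables))  # 200.0 (MB for a single copy of the vector)
print(float64_megabytes(d_vars ** 2))  # 50.0 (MB for one dense d x d matrix)
```

The variable count grows with the square of the number of series, and each optimisation step also needs gradients and dense d x d matrix operations on top of this, which is consistent with the long runtimes described above.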

Context

I love this package and use it on small numbers of time series without issues.
However, I have recently become involved in a project with a large number of time series (around 2500), where the goal is to discover the causal relationships between them.

Possible Implementation

I suggest the team look into using PySpark MLlib, with Spark matrices and Spark DataFrames.

Possible Alternatives

I have started implementing a Granger causality test using PySpark DataFrames, but I would still like to see a Bayesian Network solution from the CausalNex package.

GabrielAzevedoFerreiraQB pushed a commit to GabrielAzevedoFerreiraQB/causalnex that referenced this issue Jun 7, 2022

Fix cyclical import of plots (mckinsey#152)

88f23cd

Co-authored-by: philip_pilgerstorfer <philip.pilgerstorfer!@quantumblack.com>

oentaryorj added the enhancement (New feature or request) label Aug 25, 2022

oentaryorj assigned GabrielAzevedoFerreiraQB Aug 25, 2022
