Are fit_detectors thread-safe as seeing problem with returned anomalies #127

Open
joriws opened this issue Apr 22, 2021 · 3 comments

joriws commented Apr 22, 2021

I fetch multiple time series into pandas DataFrames, run validate_data on them and feed them to PcaAD. Single-threaded serial execution worked fine, but after converting to parallel execution on 3 threads I get random results in the returned anomalies, and iterating over to_events raises a TypeError. Plotting the data shows a normal graph pattern and anomaly=anomalies plots normally, but to_events does not "complete". The failing to_events call differs between runs: in the first run dataset 3 fails, in the next run maybe dataset 2 fails and 3 is OK, and in a third run dataset 2 works but 1 and 3 do not.

I've also tried threading.local(), but it does not change anything. Without threading I did not observe this behaviour.

The type is the same for all:
<class 'pandas.core.series.Series'>

import logging

from adtk.data import to_events
from adtk.detector import PcaAD
from adtk.visualization import plot

pca_ad = PcaAD(k=k, c=c)
anomalies = pca_ad.fit_detect(pdata)
plot(pdata, anomaly=anomalies, ts_linewidth=1, ts_markersize=2, anomaly_color='red', anomaly_alpha=0.3, curve_group='all', axes=axis)
try:
    for startano, endano in to_events(anomalies):
        ...
except TypeError:
    # log the raw to_events output for inspection
    logging.error("cannot expand to_events\n{}".format(to_events(anomalies)))

When checking the logging.error output, the failing results for some reason have no "freq" parameter, while anomaly data that has freq works well. The failing results also contain single timestamps rather than time ranges.

Non-working

[Timestamp('2021-04-18 08:35:00+0000', tz='UTC'), Timestamp('2021-04-18 10:30:00+0000', tz='UTC'), Timestamp('2021-04-18 10:35:00+0000', tz='UTC'), Timestamp('2021-04-18 13:25:00+0000', tz='UTC'), 

Working structure.

[(Timestamp('2021-04-18 22:00:00+0000', tz='UTC', freq='5T'), Timestamp('2021-04-18 22:04:59.999999999+0000', tz='UTC', freq='5T')), (Timestamp('2021-04-18 22:25:00+0000', tz='UTC', freq='5T'), Timestamp('2021-04-18 22:34:59.999999999+0000', tz='UTC', freq='5T')), (Timestamp('2021-04-18 22:40:00+0000', tz='UTC', freq='5T'), Timestamp('2021-04-18 22:49:59.999999999+0000', tz='UTC', freq='5T')
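
The two outputs above differ in shape: the failing list holds bare timestamps, while the working list holds (start, end) tuples. A loop written as `for startano, endano in ...` cannot unpack a bare timestamp, which would explain the TypeError. Below is a minimal sketch of a loop that tolerates both shapes. `iter_event_ranges` is a hypothetical helper and not part of adtk, and plain `datetime` values stand in for pandas Timestamps. It only works around the unpacking error and says nothing about why freq is missing.

```python
from datetime import datetime, timedelta


def iter_event_ranges(events):
    """Yield (start, end) for every event, whether it is a point or a pair.

    Hypothetical helper: a bare timestamp is treated as a zero-length range.
    """
    for event in events:
        if isinstance(event, tuple):
            start, end = event      # already a (start, end) range
        else:
            start = end = event     # single instant
        yield start, end


# Stand-in data mimicking the two shapes seen in the log output.
t0 = datetime(2021, 4, 18, 8, 35)
mixed = [t0, (t0 + timedelta(hours=2), t0 + timedelta(hours=2, minutes=5))]
ranges = list(iter_event_ranges(mixed))
```

In the original loop this would be used as `for startano, endano in iter_event_ranges(to_events(anomalies)):`.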

joriws commented Apr 22, 2021

pip show adtk
Name: adtk
Version: 0.6.2
Summary: A package for unsupervised time series anomaly detection
Home-page: https://github.com/arundo/adtk
Author: Arundo Analytics, Inc.
Author-email: None
License: Mozilla Public License 2.0 (MPL 2.0)
Location: c:\users\guest\appdata\local\packages\pythonsoftwarefoundation.python.3.9_qbz5n2kfra8p0\localcache\local-packages\python39\site-packages
Requires: numpy, scikit-learn, packaging, pandas, tabulate, matplotlib, statsmodels
Required-by:

joriws commented Apr 22, 2021

Also tested with the following, with no change in outcome:

for startano,endano in to_events(anomalies, freq_as_period=True, merge_consecutive=True):

earthgecko commented

Hi @joriws

From one user to another: this is probably not an adtk issue. The underlying pandas library itself is not thread-safe.
https://pandas.pydata.org/pandas-docs/stable/user_guide/gotchas.html#thread-safety

I hope this helps in the future.
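
If the pandas work is indeed what misbehaves under threads, one possible workaround is to keep the threads for fetching but serialize the detection step behind a lock. The sketch below is only an illustration under that assumption: `detect` is a hypothetical stand-in for the real `fit_detect` and `to_events` calls, the datasets are made-up lists, and whether this resolves the behaviour reported above is untested.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One lock shared by all worker threads.
_detect_lock = threading.Lock()


def detect(dataset):
    """Hypothetical stand-in for the pandas/adtk work (fit_detect, to_events)."""
    return [x for x in dataset if x > 10]


def fetch_and_detect(name, dataset):
    # Fetching (network or disk I/O) could still run in parallel here.
    with _detect_lock:
        # Only one thread at a time runs the section assumed not thread-safe.
        return name, detect(dataset)


datasets = {"a": [1, 20, 3], "b": [11, 12], "c": [0]}
with ThreadPoolExecutor(max_workers=3) as pool:
    results = dict(pool.map(lambda item: fetch_and_detect(*item), datasets.items()))
```

The lock removes the parallelism from the detection step itself, so this only pays off when fetching dominates the run time.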
