TRCF's (ThresholdedRandomCutForest) process method always updates the internal RandomCutForest (see ThresholdedRandomCutForest and Preprocessor). As I understand it, TRCF is built for time series data, so always updating the current model with the latest data looks reasonable for adapting to the latest data pattern. For example, with steadily increasing sales, the high daily sales flagged as anomalies last season may no longer count as anomalies once the RCF model has been updated with this season's (on average higher) daily sales. But for other use cases, such as a stable error-rate metric with no trend, it seems updating RCF with anomalies will introduce noise. For example, a user trains an RCF model well on normal data and then wants to keep the model immutable: detect anomalies without updating the internal RCF model. Based on this understanding, should we support not updating the RCF model in the TRCF process method?
Note: I'm not an ML expert and don't understand RCF in depth. Sorry if this is too naive and wastes your time.
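The score-without-update mode the issue asks for can be sketched with a toy detector. This is a deliberately simple running mean/variance (z-score) model standing in for RCF, and the process(point, updateModel) flag is hypothetical — it mirrors the shape of TRCF's process call, not its actual API.

```java
// Minimal stand-in for a continuously learning detector, used only to
// illustrate "score without updating". This is NOT the TRCF API.
class ToyDetector {
    private double mean = 0.0;
    private double m2 = 0.0;   // sum of squared deviations (Welford)
    private long count = 0;

    // Score a point as its absolute z-score against the current model.
    double score(double x) {
        if (count < 2) return 0.0;
        double std = Math.sqrt(m2 / (count - 1));
        return std == 0.0 ? 0.0 : Math.abs(x - mean) / std;
    }

    // Fold the point into the model (the "continuous learning" step).
    void update(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    // Hypothetical combined call: score first, then optionally update.
    // The boolean flag is the feature this issue is asking about.
    double process(double x, boolean updateModel) {
        double s = score(x);
        if (updateModel) update(x);
        return s;
    }
}

public class FrozenModelDemo {
    public static void main(String[] args) {
        ToyDetector d = new ToyDetector();
        // Train on "normal" data clustered around 10.
        for (int i = 0; i < 100; i++) d.process(10 + (i % 5) * 0.1, true);
        // Freeze the model: score new points without updating it.
        double anomalous = d.process(50.0, false);
        double normal = d.process(10.2, false);
        System.out.println(anomalous > normal);  // prints true
    }
}
```

With updateModel always true, this behaves like the current TRCF pipeline (score, then learn); with it false, the model stays exactly as trained, which is the immutable-model use case described above.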
"Note: I'm not an ML expert and don't understand RCF in depth. Sorry if this is too naive and wastes your time."
No worries. It is a legitimate question. There are two viewpoints of any software here:
1. As advertised value: in this case continuous ML, where models are updated continually. Such a pipeline is not easy to implement correctly, and that is the value.
2. As desired by a specific user: in this case, stopping the continuously learning model on command. This action may go against the intended development of the software, and ingrained assumptions may be violated.
In the case of Apache 2.0 open source, there is an easy choice -- the repository should conform to (1), so that the collective does not have to worry about unintended artifacts or incorrect use. Individual consuming projects, which know what they are modifying, can and should modify the code to their liking.
In this case of RCFs (and any method that can be used on time series) the issue is not just stopping updates -- but restarting the updates some later time. One follows the other as night follows day. However such start/stop flexibility can break the premise behind RCF and iii can be confusing. On the other hand, even worse, would be that such start/stop works, but the real reason for that would be masked behind RCF.
I would recommend revisiting this later -- once the benefit (if any) of TRCF is established, we can discuss more nuanced options.