Revise interface for dynamic-score and dynamic-simulated-score #290

Open
jotok opened this issue Jan 20, 2022 · 0 comments

@jotok
Collaborator

jotok commented Jan 20, 2022

In RCF3, we plan to drop the DynamicScoringRandomCutForest class. This class added API methods to the base RandomCutForest class that let users customize the anomaly scoring algorithm by passing lambdas directly to those methods. The methods were intended to encourage experimentation with the anomaly score algorithm, but I believe they are too specialized, and used too infrequently, to belong in the RandomCutForest API. Users have always been able to create specialized scoring routines through the traverseForest method, but that method's signature may be a little daunting to new users. I propose splitting the difference: a new API method that takes a VisitorFactory<Double> or Supplier<Visitor<Double>> and averages the results of the visitors across trees.

Current methods in DynamicScoringRandomCutForest and example usage:

public double getDynamicScore(double[] point, int ignoreLeafMassThreshold, BiFunction<Double, Double, Double> seen,
            BiFunction<Double, Double, Double> unseen, BiFunction<Double, Double, Double> damp);

public double getDynamicSimulatedScore(double[] point, BiFunction<Double, Double, Double> seen,
            BiFunction<Double, Double, Double> unseen, BiFunction<Double, Double, Double> damp,
            Function<IBoundingBoxView, double[]> vecSep);

double result = forest.getDynamicScore(point, 0, (x, y) -> 1.0 * (x + Math.log(y)), (x, y) -> 1.0 * x, (x, y) -> 1.0);

Proposed methods and example usage:

public double getAverageScalarResult(double[] point, VisitorFactory<Double> visitorFactory);

public double getAverageScalarResult(double[] point, Supplier<Visitor<Double>> supplier);

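// The outer lambda parameter is named queryPoint because reusing y would clash with the inner (x, y) lambdas.
double result = forest.getAverageScalarResult(point, (tree, queryPoint) -> new DynamicScoreVisitor(
                tree.projectToTree(queryPoint), tree.getMass(), forest.getIgnoreLeafMassThreshold(),
                (x, y) -> 1.0 * (x + Math.log(y)), (x, y) -> 1.0 * x, (x, y) -> 1.0));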
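To make the proposal concrete, here is a minimal, self-contained sketch of the averaging a getAverageScalarResult method could perform. It assumes a heavily simplified model: the Tree, Visitor, and VisitorFactory types below are hypothetical stand-ins written for this sketch, not the RCF interfaces, and traversal is reduced to calling accept once per node on the path. The Supplier overload just adapts to the factory overload by ignoring the tree and point.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Supplier;

// Hypothetical, simplified model of a forest; none of these types are the RCF classes.
final class AverageScalarResultSketch {

    // Minimal stand-in for a tree: its mass and a way to walk the path for a point.
    interface Tree {
        int getMass();

        // Calls visitor.accept(depth) once for each node on the root-to-leaf path of point.
        void traverse(double[] point, Visitor<?> visitor);
    }

    // Minimal stand-in for a visitor that accumulates a value along that path.
    interface Visitor<R> {
        void accept(int depth);

        R getResult();
    }

    // Builds a fresh visitor for one tree and one query point.
    interface VisitorFactory<R> extends BiFunction<Tree, double[], Visitor<R>> {
    }

    private final List<Tree> trees;

    AverageScalarResultSketch(List<Tree> trees) {
        if (trees.isEmpty()) {
            throw new IllegalArgumentException("forest must contain at least one tree");
        }
        this.trees = List.copyOf(trees);
    }

    // Creates one visitor per tree, runs it over that tree, and returns the mean result.
    double getAverageScalarResult(double[] point, VisitorFactory<Double> visitorFactory) {
        double sum = 0.0;
        for (Tree tree : trees) {
            Visitor<Double> visitor = visitorFactory.apply(tree, point);
            tree.traverse(point, visitor);
            sum += visitor.getResult();
        }
        return sum / trees.size();
    }

    // Convenience overload for visitors that need neither the tree nor the point to be built.
    double getAverageScalarResult(double[] point, Supplier<Visitor<Double>> supplier) {
        return getAverageScalarResult(point, (tree, p) -> supplier.get());
    }
}
```

Under these assumptions, the DynamicScoreVisitor example above would be passed as the VisitorFactory argument, while the Supplier overload suits visitors whose construction does not depend on the tree. Because the two functional types differ in arity, a two-argument lambda resolves to the factory overload without ambiguity.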

One issue, visible in the example, is that the current implementation has direct access to forest configuration fields such as ignoreLeafMassThreshold. In the proposed implementation, these values have to be captured by the closure, which feels a little awkward. We may also want to rethink the VisitorFactory interface.
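One possible direction for rethinking VisitorFactory, sketched here purely as an assumption, is for the forest to hand its own configuration to the factory so the caller's lambda no longer has to capture forest. ForestConfig, ConfiguredVisitorFactory, and the simplified Tree and Visitor types are hypothetical names invented for this sketch, and tree traversal is omitted to keep the focus on the factory signature.

```java
import java.util.List;

// Hypothetical sketch of one way to rethink VisitorFactory; not an RCF API.
final class ConfiguredFactorySketch {

    // Read-only view of forest-level settings that visitors may need.
    static final class ForestConfig {
        private final int ignoreLeafMassThreshold;

        ForestConfig(int ignoreLeafMassThreshold) {
            this.ignoreLeafMassThreshold = ignoreLeafMassThreshold;
        }

        int ignoreLeafMassThreshold() {
            return ignoreLeafMassThreshold;
        }
    }

    interface Tree {
        int getMass();
    }

    interface Visitor<R> {
        R getResult();
    }

    // The forest passes its own configuration to the factory, so callers do not have to
    // capture the forest (or copy its settings) inside the lambda.
    interface ConfiguredVisitorFactory<R> {
        Visitor<R> newVisitor(Tree tree, double[] point, ForestConfig config);
    }

    private final List<Tree> trees;
    private final ForestConfig config;

    ConfiguredFactorySketch(List<Tree> trees, ForestConfig config) {
        if (trees.isEmpty()) {
            throw new IllegalArgumentException("forest must contain at least one tree");
        }
        this.trees = List.copyOf(trees);
        this.config = config;
    }

    // Builds one visitor per tree with the forest's config and averages the results.
    double getAverageScalarResult(double[] point, ConfiguredVisitorFactory<Double> factory) {
        double sum = 0.0;
        for (Tree tree : trees) {
            sum += factory.newVisitor(tree, point, config).getResult();
        }
        return sum / trees.size();
    }
}
```

With this shape, the example above could read ignoreLeafMassThreshold from the config argument instead of calling forest.getIgnoreLeafMassThreshold() inside the closure; the trade-off is a wider factory signature that every visitor factory must accept.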
