
Kernel function extension #558

Open · wants to merge 38 commits into master

Conversation


@Srceh Srceh commented Jul 18, 2022

This PR aims to increase the number of supported kernels and associated operations.

The objectives of the PR can be summarised as follows:

  • Allow different kernels to be built upon a base kernel class (e.g. a periodic kernel).
  • Allow each kernel to select a subset of the feature dimensions, so that different feature dimensions can be treated differently (e.g. a periodic kernel on time-related features and an RBF kernel on the other numeric features).
  • Implement combined kernels by summing or multiplying two or more kernels (see the usage sketch after this list).
  • Provide generic methods to initialise and train the parameters of different kernels as a single set.
  • Allow the base and composite kernels to be serialised.
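
A minimal usage sketch of what the above is intended to enable (PyTorch backend). The import path for Periodic, the active_dims argument and the +/* composition syntax illustrate the design described in this PR; the exact names and signatures are not final:

import torch
# `GaussianRBF` is the existing RBF kernel; `Periodic` and `active_dims`
# are introduced by this PR (the import path here is an assumption).
from alibi_detect.utils.pytorch import GaussianRBF, Periodic

k_time = Periodic(tau=torch.tensor(24.0), active_dims=[0])        # time feature
k_num = GaussianRBF(sigma=torch.tensor(1.0), active_dims=[1, 2])  # other features
k_sum = k_time + k_num    # composite kernel via summation
k_prod = k_time * k_num   # composite kernel via product
K = k_sum(torch.randn(5, 3), torch.randn(7, 3))  # (5, 7) kernel matrix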

The work can therefore be divided into the following stages:

  • Redesign the current kernel implementation, adding base classes for kernels and combined kernels.
  • Update the parameter initialisation and training behaviour of the existing kernel-based detectors.
  • Add docs and tutorial notebooks covering the new kernels and their usage.
  • Add tests for the introduced functions and procedures.
  • Ensure the serialisation of all introduced kernels is compatible with the existing framework.

Design notes

As in multiple Gaussian process libraries (e.g. GPy, GPyTorch), a base kernel class is the typical mechanism for supporting different built-in and user-defined kernel functions.

For alibi-detect, we aim to support the following functionality:
(1) provide a generalised template for all kernels (initialisation, computing kernel values);
(2) provide unified management of the relevant kernel parameters (initialisation, inference, training, viewing and modification);
(3) allow the kernel value to be computed on specific feature dimensions.

While ideally the above functions would be integrated into a single base class, we have decided to distribute them across separate components, given the current implementation of alibi-detect.

On (1): at the moment, the base kernel class is kept minimal. It only inherits from the backend nn module and provides a holder for kernel parameters, parameter_dict. This dictionary is helpful when a kernel has multiple parameters, as we can loop over the dictionary keys to operate on every parameter without referring to each one explicitly by name (e.g. sigma).
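
For concreteness, a minimal sketch of such a base class and the generic loop over parameter_dict (PyTorch backend; hypothetical code based on the description above, not the PR's exact implementation):

import torch
import torch.nn as nn

# Minimal base class: inherits the backend nn module and holds a
# name -> KernelParameter mapping; concrete kernels fill the dict.
class BaseKernel(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.parameter_dict: dict = {}

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        raise NotImplementedError

# Generic code can then operate on every parameter without knowing its
# name (sigma, tau, ...), e.g. for data-dependent initialisation:
# for name, param in kernel.parameter_dict.items():
#     if param.requires_init:
#         param.value.data = param.init_fn(x, y)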

On (2): previously, the initialisation, inference and training of the sigma parameter were hard-coded within the RBF kernel (and the detectors). This behaviour is now lifted into a separate class, KernelParameter, implemented as a wrapper over the corresponding backend variable (tf.Variable, torch.nn.Parameter). The class includes init_fn, requires_grad and requires_init as attributes, so each parameter can be inspected individually via these attributes and the corresponding procedures invoked. As a result, any manipulation of kernel parameters can now be moved outside the kernel implementation, and writing a kernel only requires declaring an instance of KernelParameter per parameter. At the same time, this implementation preserves the arguments and logic of the previous RBF kernel and therefore requires only limited modification to the existing detectors.
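
A condensed sketch of such a wrapper for the PyTorch backend (the attribute names come from the description above; everything else is an illustrative assumption):

from typing import Callable, Optional
import torch
import torch.nn as nn

class KernelParameter:
    # Wraps a backend variable (torch.nn.Parameter here, tf.Variable for
    # the TensorFlow backend) together with its initialisation metadata.
    def __init__(
        self,
        value: torch.Tensor,
        init_fn: Optional[Callable[..., torch.Tensor]] = None,
        requires_grad: bool = False,
        requires_init: bool = False,
    ) -> None:
        self.value = nn.Parameter(value, requires_grad=requires_grad)
        self.init_fn = init_fn              # how to infer a value from data
        self.requires_grad = requires_grad  # train during detector fitting?
        self.requires_init = requires_init  # infer from data before use?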

On (3): following the review comments from @ojcobb, coding the selection within the base class (and hence within each kernel) would duplicate code and increase the complexity for users implementing customised kernels. The solution now is a wrapper kernel, DimensionSelectKernel, that performs the selection before passing the inputs to a given kernel. Following the convention of GPy, we refer to the selection argument as active_dims.
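
Illustratively (again a hypothetical sketch rather than the PR's exact code), the wrapper only needs to slice the inputs before delegating to the wrapped kernel:

from typing import Callable, List
import torch

class DimensionSelectKernel:
    # Select the active feature dimensions, then evaluate the wrapped kernel.
    def __init__(self, kernel: Callable, active_dims: List[int],
                 feature_axis: int = -1) -> None:
        self.kernel = kernel
        self.active_dims = torch.as_tensor(active_dims)
        self.feature_axis = feature_axis

    def __call__(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        x = torch.index_select(x, self.feature_axis, self.active_dims)
        y = torch.index_select(y, self.feature_axis, self.active_dims)
        return self.kernel(x, y)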

@Srceh Srceh marked this pull request as ready for review August 8, 2022 09:55
class Periodic(BaseKernel):
    def __init__(
        self,
        tau: torch.Tensor = None,
Contributor

Would advocate for just having period as a more interpretable argument name here.

Author

Given that we call the 'bandwidth' 'sigma', I was wondering if it is better to keep all parameters as Greek letters for consistency.

Member

I would rather be in favour of renaming sigma to bandwidth (is this PR backwards incompatible without renaming?) and tau to period.

Contributor

Those suggestions sound good to me.

if tau is None:
    self.log_tau = nn.Parameter(torch.empty(1), requires_grad=trainable)
    self.init_required = True
else:
    self.log_tau = nn.Parameter(tau.log(), requires_grad=trainable)
Contributor

If we are going to the effort of making parameters trainable, we should probably make it possible to train one and not the other, particularly given that the period will (almost always?) be fixed according to domain-specific knowledge, whereas one may wish to train the bandwidth.

Author

The new parameter implementation should allow such behaviour, but at the moment, for the existing kernels, trainability is still specified by the global trainable argument. I guess the better solution here is to add a trainable option for each parameter.
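
For illustration only, a per-parameter option might look like the following (a purely hypothetical signature, not part of the PR):

import torch

# Hypothetical: a per-parameter mapping instead of one global `trainable`
# flag, so the period stays fixed while the bandwidth is trained.
kernel = Periodic(
    tau=torch.tensor(24.0),
    trainable={'tau': False, 'sigma': True},
)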


ojcobb commented Aug 8, 2022

Nice work! Indeed, the previous coupling to GaussianRBF got a bit nasty in places, and it is nice to have got rid of all that and extended things with some new possibilities!

A few things that may be worth further consideration:

  1. At the moment, specifying parameters, their initialisation functions and their trainability is a bit tricky. It seems they currently must all be trainable or all be fixed. Moreover, if some are specified and some aren't, they all get the initialisation function applied regardless. Is there something we can do to make this a bit cleaner?
  2. Following on from the brief discussion we had on call -- I realise that using dunder methods to define the addition and multiplication of kernels syntactically raises the issue of making the associated weights trainable. However, given that the vast majority of use cases won't require trainable combinations, it seems that perhaps a favourable approach is to proceed with the syntactic approach and then deal with trainability separately when desired. For example, the deep kernel would be a lightweight class that defines a composite_kernel = w_a * kernel_a + w_b * kernel_b, where w_a and w_b have been defined as trainable parameters (see the sketch after this list).
  3. I know we mentioned this before (and I commented above), but unless there's a compelling reason I'm not aware of, we should try to remove the need to duplicate the active_dims and feature_axis logic within each kernel.
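
A self-contained sketch of the approach in point 2 (the class names here are illustrative stand-ins, not proposed names): the composition stays syntactic, and trainability of the weights is opted into per combination:

import torch
import torch.nn as nn

class WeightedKernel(nn.Module):
    # Scales a kernel by a weight that is trainable only on request.
    def __init__(self, kernel, weight: float, trainable: bool = False) -> None:
        super().__init__()
        self.kernel = kernel
        self.w = nn.Parameter(torch.tensor(weight), requires_grad=trainable)

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return self.w * self.kernel(x, y)

class SumKernel(nn.Module):
    # composite_kernel = w_a * kernel_a + w_b * kernel_b
    def __init__(self, k_a: nn.Module, k_b: nn.Module) -> None:
        super().__init__()
        self.k_a, self.k_b = k_a, k_b

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return self.k_a(x, y) + self.k_b(x, y)

# e.g. a deep-kernel-style combination with trainable weights:
# composite = SumKernel(WeightedKernel(kernel_a, 0.5, trainable=True),
#                       WeightedKernel(kernel_b, 0.5, trainable=True))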

@jklaise jklaise self-requested a review August 8, 2022 14:02
@ascillitoe

Self-requesting a review for this since I expect this PR will require moderate changes to the save/load functionality.

@ascillitoe ascillitoe self-requested a review August 8, 2022 16:00
@ascillitoe ascillitoe added the WIP PR is a Work in Progress label Aug 10, 2022
@ascillitoe ascillitoe changed the title [WIP] Kernel function extension Kernel function extension Aug 10, 2022
…parameter implementation for the general kernel class. (3) added an initial example notebook.
@review-notebook-app
Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter notebooks.

alibi_detect/cd/lsdd.py (outdated review thread, resolved)
…1) modify the type of composite kernels as BaseKernel, and change the type signatures accordingly. (2) remove the feature dimension option in BaseKernel. (3) add specific tests on parameter inference. (4) remove numpy inputs from kernels with pytorch backend. (5) misc minor fixes following previous comments.
…ow allows: (1) any sum and product with the direct add and multiply equation. (2) the dimension selection is built-in with the main class. (3) the deep kernel is also implemented with the new base class and the user can access it as a single composite kernel.
…r messages on unsupported operations. Also added new notebook on creating user-defined kernels for drift detectors.
…pe hint for various methods and attributes for better consistency.
@Srceh Srceh changed the base branch from master to release/v0.11.0 February 3, 2023 14:57
@Srceh Srceh changed the base branch from release/v0.11.0 to master February 3, 2023 14:57
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Labels: WIP (PR is a Work in Progress)
Projects: none yet
Linked issues: none yet
7 participants