
Kernel function extension #558

Open · wants to merge 38 commits into master

Conversation


@Srceh Srceh commented Jul 18, 2022

This PR aims to increase the number of supported kernels and associated operations.

The objectives of the PR can be summarised as follows:

  • Allow different kernels to be built upon a base kernel class (e.g. a periodic kernel).
  • Allow each kernel to select a subset of the feature dimensions, so that different feature dimensions can be treated differently (e.g. a periodic kernel on time-related features and an RBF kernel on the other numeric features).
  • Implement combined kernels by summing or multiplying two or more kernels (see the usage sketch after this list).
  • Provide generic methods to initialise and train the parameters of different kernels as a single set.
  • Allow the base and composite kernels to be serialised.
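
A minimal usage sketch of what the above is intended to enable (PyTorch backend). The import path for Periodic, the active_dims argument and the +/* composition syntax illustrate the design described in this PR; the exact names and signatures are not final:

import torch
# `GaussianRBF` is the existing RBF kernel; `Periodic` and `active_dims`
# are introduced by this PR (the import path here is an assumption).
from alibi_detect.utils.pytorch import GaussianRBF, Periodic

k_time = Periodic(tau=torch.tensor(24.0), active_dims=[0])        # time feature
k_num = GaussianRBF(sigma=torch.tensor(1.0), active_dims=[1, 2])  # other features
k_sum = k_time + k_num    # composite kernel via summation
k_prod = k_time * k_num   # composite kernel via product
K = k_sum(torch.randn(5, 3), torch.randn(7, 3))  # (5, 7) kernel matrix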

The work can therefore be divided into the following stages:

  • Redesign the current kernel implementation, adding base classes for kernels and combined kernels.
  • Update the parameter initialisation and training behaviour of the existing kernel-based detectors.
  • Add docs and tutorial notebooks covering the new kernels and their usage.
  • Add tests for the introduced functions and procedures.
  • Ensure the serialisation of all introduced kernels is compatible with the existing framework.

Design notes

As in multiple Gaussian process libraries (e.g. GPy, GPyTorch), a base kernel class is the typical mechanism for supporting different built-in and user-defined kernel functions.

For alibi-detect, we aim to support the following functionality:
(1) provide a generalised template for all kernels (initialisation, computing kernel values);
(2) provide unified management of the relevant kernel parameters (initialisation, inference, training, viewing and modification);
(3) allow the kernel value to be computed on specific feature dimensions.

While ideally the above functions would be integrated into a single base class, we have decided to distribute them across separate components, given the current implementation of alibi-detect.

On (1): at the moment, the base kernel class is kept minimal. It only inherits from the backend nn module and provides a holder for kernel parameters, parameter_dict. This dictionary is helpful when a kernel has multiple parameters, as we can loop over the dictionary keys to operate on every parameter without referring to each one explicitly by name (e.g. sigma).
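
For concreteness, a minimal sketch of such a base class and the generic loop over parameter_dict (PyTorch backend; hypothetical code based on the description above, not the PR's exact implementation):

import torch
import torch.nn as nn

# Minimal base class: inherits the backend nn module and holds a
# name -> KernelParameter mapping; concrete kernels fill the dict.
class BaseKernel(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.parameter_dict: dict = {}

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        raise NotImplementedError

# Generic code can then operate on every parameter without knowing its
# name (sigma, tau, ...), e.g. for data-dependent initialisation:
# for name, param in kernel.parameter_dict.items():
#     if param.requires_init:
#         param.value.data = param.init_fn(x, y)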

On (2): previously, the initialisation, inference and training of the sigma parameter were hard-coded within the RBF kernel (and the detectors). This behaviour is now lifted into a separate class, KernelParameter, implemented as a wrapper over the corresponding backend variable (tf.Variable, torch.nn.Parameter). The class includes init_fn, requires_grad and requires_init as attributes, so each parameter can be inspected individually via these attributes and the corresponding procedures invoked. As a result, any manipulation of kernel parameters can now be moved outside the kernel implementation, and writing a kernel only requires declaring an instance of KernelParameter per parameter. At the same time, this implementation preserves the arguments and logic of the previous RBF kernel and therefore requires only limited modification to the existing detectors.
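
A condensed sketch of such a wrapper for the PyTorch backend (the attribute names come from the description above; everything else is an illustrative assumption):

from typing import Callable, Optional
import torch
import torch.nn as nn

class KernelParameter:
    # Wraps a backend variable (torch.nn.Parameter here, tf.Variable for
    # the TensorFlow backend) together with its initialisation metadata.
    def __init__(
        self,
        value: torch.Tensor,
        init_fn: Optional[Callable[..., torch.Tensor]] = None,
        requires_grad: bool = False,
        requires_init: bool = False,
    ) -> None:
        self.value = nn.Parameter(value, requires_grad=requires_grad)
        self.init_fn = init_fn              # how to infer a value from data
        self.requires_grad = requires_grad  # train during detector fitting?
        self.requires_init = requires_init  # infer from data before use?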

On (3): following the review comments from @ojcobb, coding the selection within the base class (and hence within each kernel) would duplicate code and increase the complexity for users implementing customised kernels. The solution now is a wrapper kernel, DimensionSelectKernel, that performs the selection before passing the inputs to a given kernel. Following the convention of GPy, we refer to the selection argument as active_dims.
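
Illustratively (again a hypothetical sketch rather than the PR's exact code), the wrapper only needs to slice the inputs before delegating to the wrapped kernel:

from typing import Callable, List
import torch

class DimensionSelectKernel:
    # Select the active feature dimensions, then evaluate the wrapped kernel.
    def __init__(self, kernel: Callable, active_dims: List[int],
                 feature_axis: int = -1) -> None:
        self.kernel = kernel
        self.active_dims = torch.as_tensor(active_dims)
        self.feature_axis = feature_axis

    def __call__(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        x = torch.index_select(x, self.feature_axis, self.active_dims)
        y = torch.index_select(y, self.feature_axis, self.active_dims)
        return self.kernel(x, y)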

@Srceh Srceh marked this pull request as ready for review August 8, 2022 09:55
class Periodic(BaseKernel):
    def __init__(
        self,
        tau: torch.Tensor = None,
Contributor

Would advocate for just having period as a more interpretable argument name here.

Author

Given that we call the 'bandwidth' 'sigma', I was wondering if it is better to keep all parameters as Greek letters for consistency.

Member

I would rather be in favour of renaming sigma to bandwidth (is this PR backwards incompatible without renaming?) and tau to period.

Contributor

Those suggestions sound good to me.

if tau is None:
    self.log_tau = nn.Parameter(torch.empty(1), requires_grad=trainable)
    self.init_required = True
else:
    self.log_tau = nn.Parameter(tau.log(), requires_grad=trainable)
Contributor

If we are going to the effort of making parameters trainable, we should probably make it possible to train one and not the other, particularly given that the period will (almost always?) be fixed according to domain-specific knowledge, whereas one may wish to train the bandwidth.

Author

The new parameter implementation should allow such behaviour, but at the moment, for the existing kernels, trainability is still specified by the global trainable argument. I guess the better solution here is to add a trainable option for each parameter.
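
For illustration only, a per-parameter option might look like the following (a purely hypothetical signature, not part of the PR):

import torch

# Hypothetical: a per-parameter mapping instead of one global `trainable`
# flag, so the period stays fixed while the bandwidth is trained.
kernel = Periodic(
    tau=torch.tensor(24.0),
    trainable={'tau': False, 'sigma': True},
)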


ojcobb commented Aug 8, 2022

Nice work! Indeed, the previous coupling to GaussianRBF got a bit nasty in places, and it is nice to have got rid of all that and extended things with some new possibilities!

A few things that may be worth further consideration:

  1. At the moment, specifying parameters, their initialisation functions and their trainability is a bit tricky. It seems they currently must all be trainable or all be fixed. Moreover, if some are specified and some aren't, they all get the initialisation function applied regardless. Is there something we can do to make this a bit cleaner?
  2. Following on from the brief discussion we had on call -- I realise that using dunder methods to define the addition and multiplication of kernels syntactically raises the issue of making the associated weights trainable. However, given that the vast majority of use cases won't require trainable combinations, it seems that perhaps a favourable approach is to proceed with the syntactic approach and then deal with trainability separately when desired. For example, the deep kernel would be a lightweight class that defines a composite_kernel = w_a * kernel_a + w_b * kernel_b, where w_a and w_b have been defined as trainable parameters (see the sketch after this list).
  3. I know we mentioned this before (and I commented above), but unless there's a compelling reason I'm not aware of, we should try to remove the need to duplicate the active_dims and feature_axis logic within each kernel.
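
A self-contained sketch of the approach in point 2 (the class names here are illustrative stand-ins, not proposed names): the composition stays syntactic, and trainability of the weights is opted into per combination:

import torch
import torch.nn as nn

class WeightedKernel(nn.Module):
    # Scales a kernel by a weight that is trainable only on request.
    def __init__(self, kernel, weight: float, trainable: bool = False) -> None:
        super().__init__()
        self.kernel = kernel
        self.w = nn.Parameter(torch.tensor(weight), requires_grad=trainable)

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return self.w * self.kernel(x, y)

class SumKernel(nn.Module):
    # composite_kernel = w_a * kernel_a + w_b * kernel_b
    def __init__(self, k_a: nn.Module, k_b: nn.Module) -> None:
        super().__init__()
        self.k_a, self.k_b = k_a, k_b

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return self.k_a(x, y) + self.k_b(x, y)

# e.g. a deep-kernel-style combination with trainable weights:
# composite = SumKernel(WeightedKernel(kernel_a, 0.5, trainable=True),
#                       WeightedKernel(kernel_b, 0.5, trainable=True))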

@jklaise jklaise self-requested a review August 8, 2022 14:02
@ascillitoe

Self-requesting a review for this since I expect this PR will require moderate changes to the save/load functionality.

@ascillitoe ascillitoe self-requested a review August 8, 2022 16:00
@ascillitoe ascillitoe added the WIP PR is a Work in Progress label Aug 10, 2022
@ascillitoe ascillitoe changed the title [WIP] Kernel function extension Kernel function extension Aug 10, 2022
…parameter implementation for the general kernel class. (3) added an initial example notebook.
@review-notebook-app
Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter notebooks.

alibi_detect/cd/lsdd.py (outdated review thread, resolved)
…1) modify the type of composite kernels as BaseKernel, and change the type signatures accordingly. (2) remove the feature dimension option in BaseKernel. (3) add specific tests on parameter inference. (4) remove numpy inputs from kernels with pytorch backend. (5) misc minor fixes following previous comments.
…ow allows: (1) any sum and product with the direct add and multiply equation. (2) the dimension selection is built-in with the main class. (3) the deep kernel is also implemented with the new base class and the user can access it as a single composite kernel.
…r messages on unsupported operations. Also added new notebook on creating user-defined kernels for drift detectors.
…pe hint for various methods and attributes for better consistency.
@Srceh Srceh changed the base branch from master to release/v0.11.0 February 3, 2023 14:57
@Srceh Srceh changed the base branch from release/v0.11.0 to master February 3, 2023 14:57
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Labels: WIP (PR is a Work in Progress)
Projects: none yet
Linked issues: none yet
7 participants