
Serializability of SVMs with user-defined/callable kernels #91

Open · till-m opened this issue Mar 28, 2022 · 7 comments

till-m (Member) commented Mar 28, 2022

As has been noted by @barucden in #88, SVMs with user-defined/callable kernels are generally not (de-)serializable. Since the issue has recently been brought up again in conjunction with downstream changes in JuliaAI/MLJLIBSVMInterface.jl#13 it would probably be worth having an issue one can reference to track the problem and collate discussion.

Current situation:

An SVM with a user-defined/callable kernel can be serialized and deserialized without problems as long as the kernel function is available:

using LIBSVM
using Serialization

X = [-2 -1 -1 1 1 2;
     -1 -1 -2 1 2 1]
y = [1, 1, 1, 2, 2, 2]

kernel(x1, x2) = x1' * x2

model = svmtrain(X, y, kernel=kernel)

ỹ, _ = svmpredict(model, X)
print(y == ỹ) #true

serialize("serialized_svm.jls", model)

model = deserialize("serialized_svm.jls")
T = [-1 2 3;
     -1 2 2]

ŷ, _ = svmpredict(model, T)
print([1, 2, 2] == ŷ) #true

After exiting and re-entering the REPL, kernel is undefined:

using LIBSVM
using Serialization

model = deserialize("serialized_svm.jls") #error

execution fails with

model = deserialize("serialized_svm.jls")
ERROR: UndefVarError: #kernel not defined
Stacktrace:
 [1] open(f::typeof(deserialize), args::String; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ Base ./io.jl:330
 [2] open
   @ ./io.jl:328 [inlined]
 [3] deserialize(filename::String)
   @ Serialization /usr/share/julia/stdlib/v1.7/Serialization/src/Serialization.jl:798
 [4] top-level scope
   @ REPL[3]:1

If kernel is defined at the time deserialize is called, the code works:

using LIBSVM
using Serialization

kernel(x1, x2) = x1' * x2

model = deserialize("serialized_svm.jls")
T = [-1 2 3;
     -1 2 2]

ỹ, _ = svmpredict(model, T)

print([1, 2, 2] == ỹ) #true

In contrast, serialization using built-in kernels works without a problem:

using LIBSVM
using Serialization

X = [-2 -1 -1 1 1 2;
     -1 -1 -2 1 2 1]
y = [1, 1, 1, 2, 2, 2]

model = svmtrain(X, y, kernel=Kernel.Linear)

ỹ, _ = svmpredict(model, X)

print(y == ỹ) #true

serialize("serialized_svm.jls", model)

After exiting and re-entering the REPL:

using LIBSVM
using Serialization

model = deserialize("serialized_svm.jls")
T = [-1 2 3;
     -1 2 2]

ỹ, _ = svmpredict(model, T)

print([1, 2, 2] == ỹ) #true

Possible courses of action

I don't have too much experience with Julia and Serialization.jl in particular, but I see a few ways of tackling this issue:

  • Leaving the current state, since there is no "misleading" behaviour. The error message seems pretty clear, at least to me.
  • Additionally, adding a note in the README mentioning that serialization doesn't work for user-defined/callable kernels (I haven't gotten around to Doc: add callable kernel example in README #89; I will try to work on that over Easter or slightly after).
  • Maybe it is possible to provide custom serialization strategies that allow us to properly serialize trained models with custom/user-defined kernels (see the sketch after this list). This is probably not possible using Serialization.jl, since its functionality seems to be rather restricted, but JLD.jl can do it, I think?
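To make the last option concrete, here is a minimal, hedged sketch of one such strategy, assuming (as the callable-kernel feature suggests) that svmtrain accepts any callable object. NamedKernel is hypothetical; it stores only the kernel's name and looks the function up at call time, so deserialization itself never fails. It would only really help if such a type lived in LIBSVM.jl itself (otherwise its definition has the same problem as the kernel's), and the user would still have to redefine kernel before predicting:

struct NamedKernel
    name::Symbol
end

# resolve the kernel function from Main when called, not when deserialized
(k::NamedKernel)(x1, x2) = getfield(Main, k.name)(x1, x2)

kernel(x1, x2) = x1' * x2
model = svmtrain(X, y, kernel=NamedKernel(:kernel))
serialize("serialized_svm.jls", model)
# in a fresh session, deserialize succeeds even before `kernel` is redefined;
# only calling svmpredict would then error until `kernel` exists again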
barucden (Member) commented

Thanks for this great summary!

I am not sure what the correct approach is.

Scikit-learn appears to just pass the model to a general serialization library. If that library is pickle, then the serialization fails.

JLD.jl has a way to ensure that modules containing the necessary definitions get imported before loading the serialized data. But... the user still needs to specify what modules should be loaded, and I don't think this is the way to go anyway. I think that users would tend to believe that everything was taken care of by LIBSVM, and that they can just conveniently load their model anytime, which is not true -- they would still need to maintain the same (or equivalent) kernel definitions in the specific module. So I feel that we should leave the responsibility completely to users (or downstream libraries). After all, LIBSVM.jl is only a thin wrapper for libsvm.
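For reference, a hedged sketch of that JLD.jl mechanism: addrequire is JLD.jl's documented way to record a module dependency in the file, while the module name MyKernels (which would hold the kernel definition) is hypothetical.

using JLD

jldopen("serialized_svm.jld", "w") do file
    addrequire(file, MyKernels)  # recorded in the file; JLD re-imports it on load
    write(file, "model", model)
end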

...which makes me wonder -- did we design the feature of callable kernels carefully enough in the first place? I mean -- it's just for convenience; the full, dense Gram matrix is computed anyway. What if we just provided a function that produces the Gram matrix given a kernel function and kept only the precomputed kernel? This would, of course, be a step aside, and I am not sure how widely the callable kernel feature has been adopted by other packages already (for example, MLJ seems to address this only now?).
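For concreteness, a minimal hedged sketch of what such a helper could look like (not a worked-out API proposal; columns of X are observations, as elsewhere in LIBSVM.jl):

# full l×l Gram matrix from a callable kernel k
gram(k, X::AbstractMatrix) = [k(X[:, i], X[:, j]) for i in axes(X, 2), j in axes(X, 2)]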

ablaom (Contributor) commented Mar 30, 2022

> What if we just provided a function that produces the Gram matrix given a kernel function and kept only the precomputed kernel?

Perhaps I'm missing something but wouldn't this restrict prediction? Don't we need to evaluate the kernel function for any new data on which we want to predict: k(x, x_i) for each new pattern x and each training pattern x_i?
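For reference, the dependence is explicit in the standard SVM decision function, which sums over the support vectors:

f(x) = \operatorname{sign}\Big( \sum_{i \in \mathrm{SV}} \alpha_i \, y_i \, k(x, x_i) + b \Big)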

For what it's worth, I doubt MLJ will move to support pre-computed kernels; it's difficult to do without introducing API changes that would be very specific to SVM models. The pending PR referenced above will add callable kernel support, which I think will be much appreciated by our users.

till-m (Member, Author) commented Mar 31, 2022

> the user still needs to specify what modules should be loaded

I see. I thought it might be possible to define serialization strategies for specific types/structs.

> the full, dense Gram matrix is computed anyway.

This is actually not the case. For training it's done this way (because it is necessary), but for predicting on new data it's sparse, see here.

> What if we just provided a function that produces the Gram matrix given a kernel function and kept only the precomputed kernel?

That's certainly doable -- you could export the functions that construct the matrices, even the function that constructs the sparse kernel matrix. However, if you want to make use of a serialized model, that would mean that you would also need to provide the training data (I think this is what @ablaom was referring to). Thus, instead of making sure the kernel function is available -- which I guess you'd still have to do -- one additionally would have to ensure the training data is available. Considering the current state, i.e. with the error message being rather descriptive, I think this would be more complicated.

> ...which makes me wonder -- did we design the feature of callable kernels carefully enough in the first place?

Looking back now, I somewhat agree with you here: considering that LIBSVM.jl is supposed to be a thin wrapper, it would probably have been more appropriate to provide this feature upstream. Ultimately, that option still exists. However, since both the functionality and the associated issue are quarantined away, i.e. they don't impact any other functionality of the package, I'm not sure the problem justifies such a measure.

barucden (Member) commented

@ablaom

> Perhaps I'm missing something but wouldn't this restrict prediction? Don't we need to evaluate the kernel function for any new data on which we want to predict: k(x, x_i) for each new pattern x and each training pattern x_i?

You are not missing anything; you are completely right. My point is that, because I don't see an apparent, universal, totally satisfactory solution to this, maybe this is something that higher-level libraries (or just the users of LIBSVM.jl) should take care of, and not LIBSVM.jl. Libraries like MLJ could build the callable kernel feature upon LIBSVM's precomputed kernel feature, picking the most suitable solution for them.

@till-m

> I thought it might be possible to define serialization strategies for specific types/structs.

I am not an expert in any way here but I could not find a way to do that reliably.

> For training it's done this way (because it is necessary), but for predicting on new data it's sparse

You are right that for prediction we evaluate the kernel only for some points. In that sense the matrix is sparse. However, when preparing SVMNodes to pass them to libsvm, we always work with a full matrix (e.g., here). We save computation, not memory.

Our function could provide two methods:

# for training, produce l×l matrix
gram(k, X::AbstractMatrix)
# for prediction, produce sparse l×n matrix, evaluate only at SV indices
gram(k, X::AbstractMatrix, svm::SVM)
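A hedged sketch of what the prediction-time method might do internally; it takes the training features explicitly (so the signature is hypothetical and differs from the one above) because, as discussed below, the SVM structure may not store them:

using SparseArrays

# l×n matrix; evaluate k only at the support-vector rows given by sv_idx
function gram(k, Xtrain::AbstractMatrix, X::AbstractMatrix, sv_idx)
    G = spzeros(size(Xtrain, 2), size(X, 2))
    for j in axes(X, 2), i in sv_idx
        G[i, j] = k(Xtrain[:, i], X[:, j])
    end
    return G
end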

> Thus, instead of making sure the kernel function is available -- which I guess you'd still have to do -- one additionally would have to ensure the training data is available.

We would only need the support vectors, right? They are in the SVM structure, which can be serialized already (SVM is under our control, user-defined kernel functions are not).

till-m (Member, Author) commented Mar 31, 2022

> We save computation, not memory.

I see, fair point.

> We would only need the support vectors, right? They are in the SVM structure, which can be serialized already (SVM is under our control, user-defined kernel functions are not).

Yes and no, I think: currently, if one provides the pre-computed kernel matrix, the SVM never actually gets access to the features of the support vectors. The only thing that is stored in the pre-computed case are the indices of the SVs, i.e. their position in the training set, but to calculate the kernel matrix for prediction one does need the actual features. Maybe I am missing some obvious solution to this problem, but everything I can think of either massively clutters the call signatures or the process of constructing the SVM, neither of which seems desirable.

Of course, if the idea is just to provide these functions for a higher-level library such as MLJ, it's not an issue, since they would just store the actual data in their model. If that is the case, though, should we even provide functions for calculating the Gram matrix? What is the advantage over leaving that to the higher-level library as well?

barucden (Member) commented Mar 31, 2022

> The only thing that is stored in the pre-computed case are the indices of the SVs.

True. I did not realize.

> Of course, if the idea is just to provide these functions for a higher-level library such as MLJ, it's not an issue, since they would just store the actual data in their model.

IMHO this is our responsibility.

I can see a clear and transparent way to do this, but it would require a small refactor. We could model the kernels as structs, not an enum. Something like:

abstract type Kernel end
struct Linear <: Kernel end
struct Polynomial <: Kernel end
...
struct Precomputed <: Kernel 
   support_vectors #maybe keep the indices too
end

kernel_id(::Linear) = 0
kernel_id(::Polynomial) = 1
...
kernel_id(::Precomputed) = 4

Then we would update the conversion between SVM and SVMModel. Having kernels represented as structs would be beneficial anyway, since we could use dispatching instead of the ifs that we have now.
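As a hedged illustration of that dispatch benefit, building on the structs above (the trait function uses_gamma is hypothetical): each kernel type answers questions about itself instead of being compared in an if/elseif chain.

uses_gamma(::Kernel) = false     # default: kernel has no gamma parameter
uses_gamma(::Polynomial) = true  # overridden per kernel type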

What do you think?

EDIT: After some more thinking, it would be a bigger change than I originally figured. So don't mind the above.

till-m (Member, Author) commented Apr 1, 2022

Let me just sum up my thoughts on the matter -- take this with a grain of salt, as there is most likely some personal bias involved:
The original idea was based on convenience/simplicity, but also on syntactic consistency with the way users select specific built-in kernels -- this, I believe, has its place somewhere. Since the problem of serialization is strictly contained to callable kernels, I strongly believe the functionality should be either retained here or implemented in MLJ. I also believe that if we decide to remove it here, we should first make sure there exists a working implementation in MLJLIBSVMInterface.jl (or wherever appropriate).

@barucden
(I'm not sure whether your edit means you've abandoned the idea of modifying the way Kernel is implemented, so I'll just mention this as well): if we decide to make changes to the way kernels are modeled or the way Kernel.Precomputed works, we should make sure that the original way of using them stays available. In particular, we should ensure that people can still construct Gram matrices however they want. Constructing the Gram matrices is in all likelihood the most computationally expensive task when using SVMs, but depending on the kernel function there may be ways to construct the matrices more efficiently than the generic function can (see e.g. NormalizedKernel or ScaledKernel in KernelFunctions.jl; an example follows below).
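For example (a hedged sketch: kernelmatrix, ScaledKernel, LinearKernel and ColVecs are existing KernelFunctions.jl API, but the hand-off to a precomputed-kernel entry point is assumed), a user could build the Gram matrix with a kernel-aware method and pass it on:

using KernelFunctions

k = ScaledKernel(LinearKernel(), 2.0)  # kernel types can carry specialised kernelmatrix methods
K = kernelmatrix(k, ColVecs(X))        # l×l Gram matrix; columns of X are observations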

In either case I am happy to defer to more experienced maintainers on these issues though.
