Non-Serializable classes #44

Open
valera7979 opened this issue Mar 27, 2018 · 3 comments

Comments

@valera7979

It would be nice if the classes supported serialization, in particular so that cluster models can be saved.

@kno10
Member

kno10 commented Mar 27, 2018

Actually I do not think there is much use in serialization of cluster models. They are not predictive models that you would "deploy" to a "production pipeline", like a classifier.

But I agree that in general, it would be nice to have efficient serialization support.
However, this is a lot of very boring work, and we do not have volunteers to do it. So it is of very low priority and is unlikely to happen.

@valera7979
Author

Thanks.
About serialization of cluster models: I worked on a task where I had to train a model and then, in a separate task, compare new data against the model created earlier. Because there was no serialization, I had to save the points belonging to each cluster and then rebuild the model from them for outlier detection. So I think serialization of the cluster model is also useful.
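The workaround described above can be sketched roughly as follows. This is only an illustration in plain Java, not ELKI API: the class `ClusterPointsFile`, its method names, and the `clusterIndex,x1,x2,...` text format are all made up for the example. The thread does not say how the outliers were detected, so the distance to the nearest cluster mean is used here purely as a stand-in score.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ELKI: stores clusters as plain text,
// one point per line in the form "clusterIndex,x1,x2,...".
class ClusterPointsFile {
  // Write every point together with the index of the cluster it belongs to.
  static void save(Path file, List<List<double[]>> clusters) throws IOException {
    List<String> lines = new ArrayList<>();
    for (int c = 0; c < clusters.size(); c++) {
      for (double[] p : clusters.get(c)) {
        StringBuilder sb = new StringBuilder().append(c);
        for (double v : p) {
          sb.append(',').append(v);
        }
        lines.add(sb.toString());
      }
    }
    Files.write(file, lines);
  }

  // Read the file back into one list of points per cluster.
  static List<List<double[]>> load(Path file) throws IOException {
    List<List<double[]>> clusters = new ArrayList<>();
    for (String line : Files.readAllLines(file)) {
      if (line.isEmpty()) {
        continue;
      }
      String[] parts = line.split(",");
      int c = Integer.parseInt(parts[0]);
      while (clusters.size() <= c) {
        clusters.add(new ArrayList<>());
      }
      double[] p = new double[parts.length - 1];
      for (int i = 1; i < parts.length; i++) {
        p[i - 1] = Double.parseDouble(parts[i]);
      }
      clusters.get(c).add(p);
    }
    return clusters;
  }

  // Stand-in outlier score (an assumption, not the method used in the thread):
  // Euclidean distance from the query point to the closest restored cluster mean.
  static double distanceToNearestMean(List<List<double[]>> clusters, double[] q) {
    double best = Double.POSITIVE_INFINITY;
    for (List<double[]> cluster : clusters) {
      if (cluster.isEmpty()) {
        continue;
      }
      double[] mean = new double[q.length];
      for (double[] p : cluster) {
        for (int i = 0; i < q.length; i++) {
          mean[i] += p[i] / cluster.size();
        }
      }
      double sum = 0;
      for (int i = 0; i < q.length; i++) {
        sum += (q[i] - mean[i]) * (q[i] - mean[i]);
      }
      best = Math.min(best, Math.sqrt(sum));
    }
    return best;
  }
}
```

A later task would call `load` on the saved file and flag a new point when `distanceToNearestMean` exceeds a threshold chosen for the data.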

@kno10
Member

kno10 commented Apr 13, 2018

The difficulty with a general solution is that the clusterings do not contain the data. They only contain the object IDs, and these are not persistent.
So any serializer would likely have to "join" the clusters with the original data. At that point the result becomes a huge blob to serialize, and for many applications you are much better off writing your own serialization with exactly the format and data parts that you need (coordinates, labels, identifiers such as file names; there could be arbitrarily complex data associated with each object ID).
For many clustering algorithms, you do not have much more than the object IDs (k-means, where you also have cluster means, is an exception). This variability makes any generic serialization a real pain to design, and likely to break all the time.
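As a rough illustration of such a "join", here is a minimal sketch in plain Java. It does not use ELKI classes: `ClusterExport`, `Item`, the integer IDs, and the tab-separated output format are all hypothetical, and the caller is assumed to hold the persistent data (here a file name and coordinates) for each transient object ID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not ELKI API: cluster membership is given as transient
// integer IDs, and the caller supplies the persistent data for each ID.
class ClusterExport {
  // Persistent description of one object; the fields are illustrative only.
  static final class Item {
    final String fileName;
    final double[] coords;

    Item(String fileName, double[] coords) {
      this.fileName = fileName;
      this.coords = coords;
    }
  }

  // Join each cluster's transient IDs with the caller's data and emit one
  // line per object: "clusterIndex<TAB>fileName<TAB>x1,x2,...".
  static List<String> export(List<int[]> clusters, Map<Integer, Item> data) {
    List<String> lines = new ArrayList<>();
    for (int c = 0; c < clusters.size(); c++) {
      for (int id : clusters.get(c)) {
        Item item = data.get(id);
        if (item == null) {
          throw new IllegalArgumentException("no data for object ID " + id);
        }
        StringBuilder sb = new StringBuilder();
        sb.append(c).append('\t').append(item.fileName).append('\t');
        for (int i = 0; i < item.coords.length; i++) {
          if (i > 0) {
            sb.append(',');
          }
          sb.append(item.coords[i]);
        }
        lines.add(sb.toString());
      }
    }
    return lines;
  }
}
```

The output refers to objects only by the persistent identifier, so it stays meaningful after the transient IDs are gone. Which fields go into each line is exactly the application-specific choice that makes a generic format hard to pin down.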

@elki-project elki-project deleted a comment from jneelampalli Oct 10, 2018