
data depth methods #9

Open
topepo opened this issue Mar 4, 2017 · 7 comments

Comments

@topepo
Contributor

topepo commented Mar 4, 2017

You might consider adding some of Tukey's data depth methods. R has a few packages that you could wrap, including ddalpha (this paper gives a pretty good description of it).
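For a rough sense of what these depths look like, the Tukey (halfspace) depth of each observation relative to a data cloud can be computed with ddalpha along these lines (a sketch, not something from the original comment):

library(ddalpha)

# Tukey (halfspace) depth of every iris observation with respect to the
# whole data cloud; larger values mean more "central" points
x <- as.matrix(iris[, 1:4])
d <- depth.halfspace(x, data = x)
head(d)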

@gdkrmr
Owner

gdkrmr commented Mar 6, 2017

This is the first time I have heard of this; it sounds quite interesting! I only gave it a quick read, so please correct me if I am doing something wrong.

In the ddalpha package I need a training sample with already-known classes to train a classifier, so it is not unsupervised.

Were you thinking of something like this?

library(ddalpha)
library(rgl)

# example 1: iris comes sorted by species, so the true class cardinalities
# c(50, 50, 50) line up with the rows
ds <- depth.space.Mahalanobis(as.matrix(iris[1:4]), c(50, 50, 50))
plot3d(ds, col = as.numeric(iris[[5]]))

# example 2: same call on permuted rows -- the cardinalities no longer
# correspond to the actual classes
perm <- sample(150)
ds2 <- depth.space.Mahalanobis(as.matrix(iris[perm, 1:4]), c(50, 50, 50))
plot3d(ds2, col = as.numeric(iris[[5]][perm]))

# example 3: derive the groups from k-means instead of known labels, then
# order the rows by cluster so the cluster sizes can serve as cardinalities
clusters <- kmeans(scale(iris[1:4]), 3)
c.ord <- order(clusters$cluster)
ds3 <- depth.space.Mahalanobis(as.matrix(iris[c.ord, 1:4]), as.vector(table(clusters$cluster)))
plot3d(ds3, col = as.numeric(iris[[5]][c.ord]))

The first one is really cool, the second one not so much. One would have to supply a class vector as a parameter or derive one with some unsupervised method like k-means, as in the third example.

What do you think, @topepo?
Is there an entirely unsupervised version of this?

@topepo
Contributor Author

topepo commented Mar 13, 2017

caret has a function that computes the distances of a new sample to the class centroids. I was thinking of something along those lines, although you could certainly just have an interface that generates the depths for all the data.
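If that function is caret::classDist(), a minimal sketch of the idea (using iris as a stand-in) would be:

library(caret)

# the fit stores the per-class centroids and covariances from the training data ...
cd <- classDist(iris[, 1:4], iris$Species)

# ... so centroid distances can be computed for new samples
# (predict applies a log transform to the distances by default)
dists <- predict(cd, newdata = iris[, 1:4])
head(dists)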

dimRed has a nice interface to other dimension reduction methods, and (supervised or not) these metrics would be great to include. ddalpha is pretty good, but I find the API more complex than I think it should be.

@gdkrmr
Owner

gdkrmr commented Mar 14, 2017

I think something like

embed(data, "DataDepth", classes = cl, ...)

where classes can either be a vector of classes or a function that returns a vector of classes from the data, should be possible. It could also accept a character value like "knn" that takes the number of classes from ndim and does some standard clustering.
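Purely as a sketch of that proposed interface (neither a "DataDepth" method nor a classes argument exists in dimRed at this point), usage might look like:

# hypothetical -- illustrates the proposal, not an existing dimRed API
embed(iris[1:4], "DataDepth", classes = iris$Species)
embed(iris[1:4], "DataDepth",
      classes = function(x) kmeans(scale(x), 3)$cluster)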
I like the idea, but it will probably take me a while to get to it (after v0.1.0) because I am busy with other stuff at the moment. If you want it in soon, I would accept a pull request.
There should probably also be a predict function, but I am not sure what it should look like; it will probably have to accept some additional arguments.

@topepo
Contributor Author

topepo commented Mar 14, 2017

No problem on time.

For predict, you'll have to just save the original data (as you do in the other methods) and pass it as an argument to depth.X.
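A minimal sketch of that predict idea, using ddalpha's Mahalanobis depth and an arbitrary split of iris purely for illustration:

library(ddalpha)

train <- as.matrix(iris[1:100, 1:4])    # the "original" data saved at fit time
new_x <- as.matrix(iris[101:150, 1:4])  # new samples arriving at predict time

# depths of the new samples relative to the stored training data
depth.Mahalanobis(x = new_x, data = train)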

Also, I'll send you an invite to a repo that I'll be making public soon, in case you are interested in what I've been doing with regard to my previous requests. I have some of the depth parts worked out already, but your interface would be better than my do.call's.

@gdkrmr
Owner

gdkrmr commented Mar 15, 2017

The recipes are quite a nice idea. Why not simply make a dimRed recipe step? That would be interesting, because I did not really consider data preprocessing in my package.
One of the methods you might want to add is t-SNE; it is very good for visualizing complex data structures. Also, the R package Rtsne is based on a very efficient (Barnes-Hut) implementation that can be used for relatively large data, which is not the case for Isomap and kPCA.
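A quick sketch (Rtsne refuses duplicated rows by default, hence the filtering):

library(Rtsne)

dup <- duplicated(iris[, 1:4])   # drop duplicate rows, which Rtsne rejects by default
fit <- Rtsne(as.matrix(iris[!dup, 1:4]), dims = 2, perplexity = 30)
plot(fit$Y, col = as.numeric(iris$Species[!dup]))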

@topepo
Contributor Author

topepo commented Mar 15, 2017

I've used t-SNE a lot (back when I used to actually analyze data for a living) and like it. However, I'm constrained to using methods where the projection can be applied to new data sets (based on estimates from the old/training data).

I didn't think to make a general dimRed step but did something similar for the depth methods. I'll put that on the list.

@gdkrmr
Owner

gdkrmr commented Mar 15, 2017

t-SNE works by gradient descent, and in theory one could hold the old points fixed and apply the optimization only to the new points, but as far as I know no one has implemented that. Here is a cool package for different SNE variants: https://github.com/jlmelville/sneer (I think it is not on CRAN).
