
Help with the documentation #67

Open
ClaudMor opened this issue Jan 9, 2021 · 4 comments

Comments


ClaudMor commented Jan 9, 2021

Hello,

I am interested in using your package, but I am not a domain expert in kernel density estimation (KDE) or products of KDEs.
From the README it is not clear to me which methods I may call on a BallTreeDensity. For example, I noticed that calling rand on a BallTreeDensity like this:

using KernelDensityEstimate

extractions = randn(1000)
p = kde!(extractions)
rand(p)

actually works.

  1. Could you add to the README a list of the methods one may call on a BallTreeDensity?
  2. More specifically, what does the resample method do?
  3. When fitting a multivariate, does the kernel assume that the different dimensions are uncorrelated? If so, is there a way to relax this assumption?
  4. Is there a way to evaluate the pdf of a BallTreeDensity at a point, even when that point is not included in the dataset from which we fit the KDE? I mean something KernelDensity.jl-like (using the p from before):
pdf(p, 0.5) # evaluate the probability density of p at 0.5 (even though 0.5 was not included in `extractions`)

Regarding question 4, I saw this, but I didn't really understand it.

Great package!

Thanks in advance

@dehann dehann added this to the v0.5.5 milestone Jan 10, 2021

dehann commented Jan 10, 2021

Hi @claudio20497,

Thanks for posting these suggestions. I will add them as soon as I can, but in the meantime:

  • The list of calls on BallTreeDensity should not be too long; mostly, check the list of exported functions that take a ::BallTreeDensity: https://github.com/JuliaRobotics/KernelDensityEstimate.jl/blob/master/src/KernelDensityEstimate.jl
  • resample takes any KDE of M samples and builds a new KDE consisting of the N samples requested by the user; see the sketch after the code block below.
  • Kernel bandwidths are "diagonals only": there is an independent bandwidth along each dimension, but no covariance (off-diagonal) terms for individual kernel bandwidths. This is an improvement over the original library, which assumed a single isotropic bandwidth across all dimensions. Using only diagonals is not so bad, since the idea is to approximate any pdf: adding together many axis-aligned 'hyper-ellipses' can still approximate an overall belief function of any orientation and weight distribution arbitrarily well. Relaxing each kernel to a complete dense covariance matrix might become tricky (if I recall the internals correctly). I'd suggest experimenting with a minimal example and trying to 'break' the approximation: find a case where adding more samples to the KDE is not good enough, but where off-diagonal kernel bandwidths would have helped. There might be performance arguments either way. This package is currently more of a KDE and product-of-KDEs package than a drop-in replacement for a low-mixture-count Gaussian mixture model (GMM), i.e. if you have 5 modes in a belief, then use 100+ samples. The idea is to be computationally efficient with more samples, rather than shoehorning weird beliefs into low-count GMM models (at present, but not cast in stone).
  • Yes, it should be easy to evaluate the pdf anywhere:
using KernelDensityEstimate

# build a random kde object
X = kde!(randn(2,100))

# evaluate the object itself to get pdf density values
densities = X(5*rand(2,10))

# see plotting at KernelDensityEstimatePlotting.jl for examples
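
For reference, a minimal resample sketch. The two-argument resample(p, N) form below is my reading of the exported API rather than something confirmed in this thread, so treat the exact signature as an assumption:

using KernelDensityEstimate

# build a KDE from M = 100 samples, then request a new KDE
# backed by N = 500 samples drawn from the original density
p_small = kde!(randn(100))
p_big   = resample(p_small, 500)  # assumed signature: resample(::BallTreeDensity, N)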


ClaudMor commented Jan 10, 2021

Hello @dehann ,

Thank you very much for the detailed answer.

So concerning point 3, if I understood correctly: if I sample from, say, a bivariate distribution of two correlated variables, and then call kde! on that sample, the resulting BallTreeDensity won't exhibit the correlation again, right?

EDIT: I did some experimenting:

using Distributions, KernelDensityEstimate, Plots

# generate correlated data
x = rand(Uniform(-10, 10), 1000)
y = x .^ 2
data = Array(hcat(x, y)')

# fit a kde on them
p_corr = kde!(data)

# sample from the kde
sample_p_corr = rand(p_corr, 100)

# plot the data together with the sample
sorted_sample_p_corr = sample_p_corr[sortperm(sample_p_corr[:, 1]), :]
sorted_data = data[sortperm(data[:, 1]), :]

plot(sorted_data[:, 1], sorted_data[:, 2], lw = 3)
plot!(sorted_sample_p_corr[:, 1], sorted_sample_p_corr[:, 2], lw = 3)

# and note that they coincide very well

So I think the answer is: yes, correlations are conserved.
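
As a quick numerical cross-check (the two test points below are hypothetical values I picked; the evaluation syntax follows dehann's earlier example of calling the BallTreeDensity object directly, which I'm assuming applies to p_corr as well):

# evaluate the fitted KDE at a point on the curve y = x^2,
# and at a point well away from it
on_curve  = p_corr(reshape([2.0, 4.0], 2, 1))
off_curve = p_corr(reshape([2.0, -4.0], 2, 1))

# if the correlation structure is preserved, the on-curve density
# should come out much larger than the off-curve one
on_curve[1] > off_curve[1]  # expected: true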


dehann commented Jan 11, 2021

Yes, correlations are conserved.

That's correct: the correlations are conserved. This remains true even though the individual kernel bandwidths that make up the KDE use diagonal-only values.

Also note that the KernelDensityEstimatePlotting package exists, with the useful function plotKDE and a variety of keyword options.
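
A minimal plotting sketch, assuming plotKDE accepts a BallTreeDensity directly and (an assumption from memory) a dims keyword for selecting dimensions:

using KernelDensityEstimate, KernelDensityEstimatePlotting

p = kde!(randn(2, 100))
plotKDE(p)             # plot the full 2-D density
plotKDE(p, dims=[1])   # assumed keyword: marginal over the first dimension only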

@ExpandingMan

I just wanted to step in and mention that I discovered this package today. It looks quite nice, but I think the lack of documentation is going to make it quite difficult for me to use. Even a link to a review of the algorithms involved would be enormously helpful; coming in cold, it's very unclear what most of these methods are doing.

@dehann dehann modified the milestones: v0.5.5, shortlist Apr 18, 2022