
Help with the documentation #67

Open
ClaudMor opened this issue Jan 9, 2021 · 4 comments

Comments


ClaudMor commented Jan 9, 2021

Hello,

I am interested in using your package, but I am not a domain expert in kernel density estimation (KDE) or products of KDEs.
From the README it is not clear to me which methods I may call on a BallTreeDensity. For example, I noticed that calling rand on a BallTreeDensity like this:

using KernelDensityEstimate

extractions = randn(1000)
p = kde!(extractions)
rand(p)

actually works.

  1. Could you add to the README a list of the methods one may call on a BallTreeDensity?
  2. More specifically, what does the resample method do?
  3. When fitting a multivariate, does the kernel assume that the different dimensions are uncorrelated? If so, is there a way to relax this assumption?
  4. Is there a way to evaluate the pdf of a BallTreeDensity at a point, even when that point is not included in the dataset from which we fit the KDE? I mean something KernelDensity.jl-like (using the p from before):
pdf(p, 0.5) # evaluate the probability density of p at 0.5 (even though 0.5 was not included in `extractions`)

Regarding question 4, I saw this, but I didn't really understand it.

Great package!

Thanks in advance

@dehann dehann added this to the v0.5.5 milestone Jan 10, 2021

dehann commented Jan 10, 2021

Hi @claudio20497,

Thanks for posting these suggestions. I will add them as soon as I can, but in the meantime:

  • The list of calls on BallTreeDensity should not be too long; mostly, check the list of exported functions that take a ::BallTreeDensity: https://github.com/JuliaRobotics/KernelDensityEstimate.jl/blob/master/src/KernelDensityEstimate.jl
  • resample takes any KDE of M samples and builds a new KDE consisting of the N samples requested by the user; see the sketch after the code block below.
  • Kernel bandwidths are "diagonals only": there is an independent bandwidth along each dimension, but no covariance (off-diagonal) terms for individual kernel bandwidths. This is an improvement over the original library, which assumed a single isotropic bandwidth across all dimensions. Using only diagonals is not so bad, since the idea is to approximate any pdf: adding together many axis-aligned 'hyper-ellipses' can still approximate an overall belief function of any orientation and weight distribution arbitrarily well. Relaxing each kernel to a complete dense covariance matrix might become tricky (if I recall the internals correctly). I'd suggest experimenting with a minimal example and trying to 'break' the approximation: find a case where adding more samples to the KDE is not good enough, but where off-diagonal kernel bandwidths would have helped. There might be performance arguments either way. This package is currently more of a KDE and product-of-KDEs package than a drop-in replacement for a low-mixture-count Gaussian mixture model (GMM), i.e. if you have 5 modes in a belief, then use 100+ samples. The idea is to be computationally efficient with more samples, rather than shoehorning weird beliefs into low-count GMM models (at present, but not cast in stone).
  • Yes, it should be easy to evaluate the pdf anywhere:
using KernelDensityEstimate

# build a random kde object
X = kde!(randn(2,100))

# evaluate the object itself to get pdf density values
densities = X(5*rand(2,10))

# see plotting at KernelDensityEstimatePlotting.jl for examples
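
For reference, a minimal resample sketch. The two-argument resample(p, N) form below is my reading of the exported API rather than something confirmed in this thread, so treat the exact signature as an assumption:

using KernelDensityEstimate

# build a KDE from M = 100 samples, then request a new KDE
# backed by N = 500 samples drawn from the original density
p_small = kde!(randn(100))
p_big   = resample(p_small, 500)  # assumed signature: resample(::BallTreeDensity, N)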


ClaudMor commented Jan 10, 2021

Hello @dehann ,

Thank you very much for the detailed answer.

So concerning point 3, if I understood correctly: if I sample from, say, a bivariate distribution of two correlated variables, and then call kde! on that sample, the resulting BallTreeDensity won't exhibit the correlation again, right?

EDIT: I did some experimenting:

using Distributions, KernelDensityEstimate, Plots

# generate correlated data
x = rand(Uniform(-10, 10), 1000)
y = x .^ 2
data = Array(hcat(x, y)')

# fit a kde on them
p_corr = kde!(data)

# sample from the kde
sample_p_corr = rand(p_corr, 100)

# plot the data together with the sample
sorted_sample_p_corr = sample_p_corr[sortperm(sample_p_corr[:, 1]), :]
sorted_data = data[sortperm(data[:, 1]), :]

plot(sorted_data[:, 1], sorted_data[:, 2], lw = 3)
plot!(sorted_sample_p_corr[:, 1], sorted_sample_p_corr[:, 2], lw = 3)

# and note that they coincide very well

So I think the answer is: yes, correlations are conserved.
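
As a quick numerical cross-check (the two test points below are hypothetical values I picked; the evaluation syntax follows dehann's earlier example of calling the BallTreeDensity object directly, which I'm assuming applies to p_corr as well):

# evaluate the fitted KDE at a point on the curve y = x^2,
# and at a point well away from it
on_curve  = p_corr(reshape([2.0, 4.0], 2, 1))
off_curve = p_corr(reshape([2.0, -4.0], 2, 1))

# if the correlation structure is preserved, the on-curve density
# should come out much larger than the off-curve one
on_curve[1] > off_curve[1]  # expected: true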


dehann commented Jan 11, 2021

Yes, correlations are conserved.

That's correct: the correlations are conserved. This remains true even though the individual kernel bandwidths that make up the KDE use diagonal-only values.

Also note that the KernelDensityEstimatePlotting package exists, with the useful function plotKDE and a variety of keyword options.
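
A minimal plotting sketch, assuming plotKDE accepts a BallTreeDensity directly and (an assumption from memory) a dims keyword for selecting dimensions:

using KernelDensityEstimate, KernelDensityEstimatePlotting

p = kde!(randn(2, 100))
plotKDE(p)             # plot the full 2-D density
plotKDE(p, dims=[1])   # assumed keyword: marginal over the first dimension only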

@ExpandingMan

I just wanted to step in and mention that I discovered this package today. It looks quite nice, but I think the lack of documentation is going to make it quite difficult for me to use. Even a link to a review of the algorithms involved would be enormously helpful; coming in cold, it's very unclear what most of these methods are doing.

@dehann dehann modified the milestones: v0.5.5, shortlist Apr 18, 2022