-
Hello, I have a question about hi-dimension dataset on KDTree. I am using 'KDTree' from sklearn.neighbors. But, I am confused with the parameters.
This gives below.
In the 'https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/neighbors/_binary_tree.pxi', there is '_query_single_depthfirst'. This says that if the tree hits the leaf node, it does 'n_leaves += 1' and does the 'self.rdist'. And whenever the tree does the 'self.rdist', it does 'n_calls += 1'. So, i thought that each leaf node has leaf_size numbers of points(i.e, if leaf_size is 128, then each leaf node has 128 points). So, n_calls = leaf_size * n_leaves. But it is not. It seems that n_calls is 2 * leaf_size * n_leaves. This means that each leaf node has 2 * leaf_size numbers of points. This makes me confused. Do i mis-understand the meaning of leaf_size in kd_tree? |
Beta Was this translation helpful? Give feedback.
Replies: 1 comment
-
Although I am not familiar with kdtree, I will try to answer your question from the source code.
This for i in range(node_info.idx_start, node_info.idx_end):
dist_pt = self.rdist(pt,
&self.data[self.idx_array[i], 0],
self.data.shape[1]) In this case, each leaf node has two points ( you can see node info by following way: for i in range(tree.node_data.shape[0]):
info = tree.node_data[i]
if info["is_leaf"] == 1:
print(info)
break
> {'idx_start': 0, 'idx_end': 2, 'is_leaf': 1, 'radius': 2.432875397014594}
The # determine number of levels in the tree, and from this
# the number of nodes in the tree. This results in leaf nodes
# with numbers of points between leaf_size and 2 * leaf_size
self.n_levels = int(
np.log2(fmax(1, (n_samples - 1) / self.leaf_size)) + 1)
self.n_nodes = (2 ** self.n_levels) - 1 From the number of levels, we can estimate the number of the leaf node. |
Beta Was this translation helpful? Give feedback.
Although I am not familiar with kdtree, I will try to answer your question from the source code.
This
self.rdist()
is located inside a for-loop.In this case, each leaf node has two points (
idx_en…