Negative mutual information after using shuffle (but correct trend) #17

qiongxiu · 2020-04-10T12:13:39Z

Dear Greg,

I am using npeet to estimate mutual information in a distributed least squares problem, but I often get negative mutual information, even when using shuffle_test. Interestingly, even though most of the results are negative, the trend seems right. As shown in the attached figure, the blue line first increases and then converges, while the red line starts far from the blue line and then converges. This trend is what I expected, but I cannot explain the negative values. Do you have any idea about this? Thanks in advance.

gregversteeg · 2020-04-12T20:30:03Z

I'm not sure I understand the example. What is the x-axis, and the difference between blue and red?

While mutual information should never be negative, the estimator can be negative. The reason is that estimating mutual information empirically with finite data has some bias; when we subtract out the bias, we end up with an estimator whose mean is correct but which sometimes gives negative answers. For many applications, it suffices to treat a negative MI estimate as zero.
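As a small illustration of treating negative estimates as zero, here is a sketch using NumPy. The values in `mi_estimates` are hypothetical and chosen only for illustration; they are not taken from the plot in this issue.

```python
import numpy as np

# Hypothetical bias-corrected MI estimates (illustrative values only).
mi_estimates = np.array([-0.5, -0.1, 0.02, 0.3])

# True MI is non-negative, so negative estimates are clamped to zero.
mi_clipped = np.maximum(mi_estimates, 0.0)
```

Clamping discards the ordering among the negative estimates, so it may be worth keeping the unclamped array alongside the clamped one.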

gregversteeg · 2020-04-12T20:31:31Z

To add another point: the shuffle test is trying to estimate the bias of the mutual information estimator. It does so by shuffling data (to get a case that should have zero mutual information). The mutual information we get in the shuffled case is an estimate of the bias. This bias is then subtracted from our estimate on real data, sometimes leading to negative MI.
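The shuffle idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions: `binned_mi` is a hypothetical histogram plug-in estimator used as a stand-in, not npeet's nearest-neighbor estimator, and `shuffle_corrected_mi` is a hypothetical helper, not npeet's `shuffle_test`. The synthetic Gaussian data is also an assumption made for the example.

```python
import numpy as np

def binned_mi(x, y, bins=8):
    """Plug-in MI estimate in nats from a 2-D histogram (stand-in estimator)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    mask = pxy > 0                        # skip empty cells to avoid log(0)
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

def shuffle_corrected_mi(x, y, n_shuffles=50, seed=0):
    """Return (raw estimate, estimated bias, raw minus bias)."""
    rng = np.random.default_rng(seed)
    raw = binned_mi(x, y)
    # Shuffling y breaks any dependence on x, so these estimates measure bias.
    null = np.array([binned_mi(x, rng.permutation(y)) for _ in range(n_shuffles)])
    bias = float(null.mean())
    return raw, bias, raw - bias

# Synthetic data (assumption): one dependent pair, one independent pair.
rng = np.random.default_rng(1)
x = rng.normal(size=2000)
y_dep = x + rng.normal(size=2000)
y_indep = rng.normal(size=2000)

raw_dep, bias_dep, corrected_dep = shuffle_corrected_mi(x, y_dep)
raw_ind, bias_ind, corrected_ind = shuffle_corrected_mi(x, y_indep)
print(f"dependent:   raw={raw_dep:.4f} bias={bias_dep:.4f} corrected={corrected_dep:.4f}")
print(f"independent: raw={raw_ind:.4f} bias={bias_ind:.4f} corrected={corrected_ind:.4f}")
```

For the independent pair, the raw estimate and the shuffled estimates come from the same zero-MI situation, so their difference scatters around zero and can land on either side of it.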

qiongxiu · 2020-04-13T05:53:46Z

The x-axis denotes the iteration; the blue and red lines denote two mutual information quantities. In theory, the blue line should be more correlated than the red line during the first few iterations, and then they should converge to the same result. The trend in this plot is correct.

It seems npeet has difficulty distinguishing very small mutual information values, such as the difference between 10^{-5} and 10^{-2}. And I think npeet does distinguish -0.5 as less correlated than -0.1; if I just set all negative mutual information to zero, we can no longer distinguish them. I am a bit confused about how to handle this result: simply setting all negative MI values to zero loses the distinguishability.

qiongxiu · 2020-04-13T05:55:22Z

I set N to 10000 for now, and I assumed that was sufficient. Do you think I should increase the number of samples, e.g. to 10^5?
