Entropy calculation with non-probability vectors #769

Open
milankl opened this issue Feb 21, 2022 · 11 comments · May be fixed by #865


milankl commented Feb 21, 2022

The current (v0.33.16) implementation of entropy(p)

function entropy(p)
    if isempty(p)
        throw(ArgumentError("empty collections are not supported since they do not " *
                            "represent proper probability distributions"))
    end
    return -sum(xlogx, p)
end

entropy(p, b::Real) = entropy(p) / log(b)

returns weird results if the argument isn't a probability-like vector (i.e. sum(p) != 1)

julia> entropy([0.5,0.5],2)       # correct
1.0
julia> entropy([0.5,0.25],2)      # this should throw an error?
1.0
julia> entropy([0.25,0.25],2)     # this too, or at least some notion of normalization of p?
1.0
julia> entropy([0.125,0.125],2)   # incorrect, no idea what's happening here
0.75
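
For reference, the last value is just the formula applied verbatim, without any normalisation of p: with p = [0.125, 0.125],

    -sum(xlogx, p) / log(2) = -(2 * 0.125 * log(0.125)) / log(2)
                            = (0.75 * log(2)) / log(2)
                            = 0.75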

milankl commented Feb 21, 2022

What about adding a Distributions.isprobvec(p) check in entropy to avoid these cases?


nalimilan commented Apr 21, 2022

Yes we should probably check that the sum is one, with a check keyword argument to disable the check for performance. Throwing an error when the sum isn't one would be technically breaking, but maybe it's OK to do this in a patch release given that we would have returned nonsensical results anyway. See also #725.

Cc: @andreasnoack @matbesancon @devmotion @mschauer


i9e1 commented May 12, 2023

I came across this when calculating the entropy of single values (like entropy(8, 2) = -24.0 bits, which is nonsense). Currently my uncertainty model includes certainty and robust bounds, i.e. Union{Real, Interval, UnivariateDistribution}. For me a multiple-dispatch fix works: entropy(x::Real) = 0.0.

A better way is LinearAlgebra.normalize-ing the input. This would yield correct results for singleton inputs (H = 0) and for vectors/collections of relative counts. For everything else someone has to explain the meaning of entropy in that context. (Interpret [0.25, 0.25] as relative counts in some weird basis, like "1//4 per hour".)

...
prop = normalize(p, 1)      # scale p to unit 1-norm (LinearAlgebra.normalize)
return -sum(xlogx, prop)

For probability vectors this changes nothing, so as long as existing uses are consistent with the documentation, this would be non-breaking.
Performance should not differ much from the Distributions.isprobvec(p) variant. A check keyword is a fine idea.
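
A self-contained sketch of what I mean (normalized_entropy is just an illustrative name here, not a proposal for the exported function):

using LinearAlgebra: normalize

# x*log(x) with the convention 0*log(0) = 0 (same convention as the xlogx used above)
xlogx(x) = iszero(x) ? zero(float(x)) : x * log(x)

# Illustrative helper: normalise to unit 1-norm, then apply the usual formula.
function normalized_entropy(p)
    isempty(p) && throw(ArgumentError("empty collections are not supported"))
    prob = normalize(p, 1)      # p ./ sum(abs, p); allocates a new vector
    return -sum(xlogx, prob)
end

normalized_entropy([0.25, 0.25])   # ≈ log(2), same as entropy([0.5, 0.5])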


i9e1 commented May 13, 2023

Ok it's not that easy after playing with the tests.
If we don't want in-place normalization, the additional allocations (and thus performance) are an issue.
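
If the allocation is the main objection, one alternative (just a sketch, not what any package does) is to fold the normalisation into the reduction, using the identity H(p / s) = -sum(xlogx, p) / s + log(s) with s = sum(p):

# Entropy of p normalised by its sum, without allocating a copy of p.
# Valid for nonnegative p with s = sum(p) > 0; reuses the xlogx helper above.
function normalized_entropy_noalloc(p)
    s = sum(p)
    s > 0 || throw(ArgumentError("sum of p must be positive"))
    return -sum(xlogx, p) / s + log(s)
end

normalized_entropy_noalloc([0.25, 0.25])   # ≈ log(2), no temporary vector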

@nalimilan

I wouldn't normalize by default anyway as this could hide bugs (just like the current situation). Better require users to be explicit.


milankl commented May 30, 2023

So what would people think about

isprobvec(p::AbstractVector{<:Real}) = all(x -> x ≥ zero(x), p) && isapprox(sum(p), one(eltype(p)))

function entropy(p::AbstractVector{<:Real}; check::Bool = true) 
    check && (isprobvec(p) || throw(ArgumentError("Not a proper probability distribution")))
    return -sum(xlogx, p) 
end 
  
entropy(p, b::Real; check::Bool = true) = entropy(p; check) / log(b) 

If there's a thumbs up, I'd create a PR. This would yield

julia> entropy([0.5,0.5],2)   # as before
1.0

julia> entropy([1,1],2)    #  throws an error
ERROR: ArgumentError: Not a proper probability distribution

julia> entropy([1,1],2,check=false)     # but the check can be disabled
-0.0

julia> entropy([])    # throws an error as before, but because of dispatch constraints
ERROR: MethodError: no method matching entropy(::Vector{Any})

As you can see, I've taken isprobvec from Distributions.jl, since Distributions depends on StatsBase and not vice versa; I don't know how you'd want to deal with this otherwise. Also, as isprobvec is only defined for AbstractVector{<:Real} (does that make sense for a matrix?), I've included a similar constraint here for entropy.


milankl commented May 30, 2023

I don't know whether the -0 here

julia> entropy([0,1])
-0.0

is usually just ignored, but I guess it would be more proper if, instead of return -sum(xlogx, p), we did just

return copysign(sum(xlogx, p), 0)

since sum(xlogx, p) has to be non-positive for a probability vector anyway, so this would only affect the -0.0 case, where 0.0 is returned instead.
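
In other words (a quick sanity check of copysign):

julia> copysign(-0.0, 0)        # the -0.0 case becomes +0.0
0.0

julia> copysign(-0.6931471805599453, 0)   # same value as -sum(xlogx, p) for negative sums
0.6931471805599453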


milankl commented May 30, 2023

On the argument constraint ::AbstractVector{<:Real}: I currently wonder about the meaning of these results

julia> entropy([1,1+im])    # complex probabilities
0.43882457311747564 - 1.131971753677421im

julia> entropy(rand(2,2))   # matrix input
1.0956946888140768

Not even in quantum mechanics are complex probabilities a thing afaik, and I'd usually also not interpret a matrix as a probability distribution, but maybe people want to calculate the entropy of a joint probability mass function? So maybe relax it to ::AbstractArray{<:Real} and turn isprobvec into isprobarray?


i9e1 commented May 31, 2023

There is no problem imho returning -0.0. It's a float so there are signed versions of zero, 0.0 and -0.0.

julia> iszero(0.0)
true

julia> iszero(-0.0)
true

julia> 0.0 == -0.0
true

julia> 0.0 === -0.0
false

For matrices it's common to interpret rows or columns as probabilities (e.g. the transition matrix of a Markov chain). There are definitions of row/right, column/left and doubly stochastic matrices, meaning rows, columns, or both sum to one (i.e. can be interpreted as probabilities).
In this context the user should rather do something like:

isrowstochastic(M::AbstractMatrix) = all(isprobvec, eachrow(M))

H_rows = [entropy(r) for r in eachrow(M)]
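
Concretely (a runnable sketch; isprobvec and the vector entropy method are the ones proposed above):

# Transition matrix of a two-state Markov chain: each row is a probability vector.
M = [0.9 0.1;
     0.5 0.5]

isrowstochastic(M)                              # true: every row sums to one
H_rows = [entropy(r, 2) for r in eachrow(M)]    # per-row entropy in bits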

Joint entropy, mutual information and so on are better defined separately.


i9e1 commented May 31, 2023

Complex probabilities may be a thing, see "complex probability and information theory". The entropy extensions are in Sec. 8, but I don't get it.
Constrain the argument to AbstractVector{<:Real} and wait for people smarter than me.


milankl commented May 31, 2023

Joint entropy, mutual information and so on are better defined separately.

I agree; the entropy calculation here easily returns something other than what people expect if we allowed matrices, because, as you say, one may interpret each row as a probability vector, or all entries together. To avoid this, and to force the user to be more explicit, I'd suggest constraining the types here to vectors, so that you'd either do [entropy(x) for x in eachrow(M)] as you say, or entropy(vec(M)) for a joint probability matrix.
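
For instance, with the vector-constrained method the flattened joint pmf behaves as expected (quick sketch, using the entropy proposed above):

p = [0.5, 0.5]
q = [0.25, 0.75]
P = p * q'                    # joint pmf of two independent variables; entries sum to 1

# joint entropy via the flattened matrix equals the sum of the marginal entropies
entropy(vec(P), 2) ≈ entropy(p, 2) + entropy(q, 2)   # true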

milankl linked a pull request May 31, 2023 that will close this issue