Entropy calculation with non-probability vectors #769

Open
milankl opened this issue Feb 21, 2022 · 11 comments · May be fixed by #865


milankl commented Feb 21, 2022

The current (v0.33.16) implementation of entropy(p)

function entropy(p)
    if isempty(p)
        throw(ArgumentError("empty collections are not supported since they do not " *
                            "represent proper probability distributions"))
    end
    return -sum(xlogx, p)
end

entropy(p, b::Real) = entropy(p) / log(b)

returns weird results if the argument isn't a probability-like vector (i.e. sum(p) != 1)

julia> entropy([0.5,0.5],2)       # correct
1.0
julia> entropy([0.5,0.25],2)      # this should throw an error?
1.0
julia> entropy([0.25,0.25],2)     # this too, or at least some notion of normalization of p?
1.0
julia> entropy([0.125,0.125],2)   # incorrect, no idea what's happening here
0.75
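
For reference, the last value is just the formula applied verbatim, without any normalisation of p: with p = [0.125, 0.125],

    -sum(xlogx, p) / log(2) = -(2 * 0.125 * log(0.125)) / log(2)
                            = (0.75 * log(2)) / log(2)
                            = 0.75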

milankl commented Feb 21, 2022

What about adding a Distributions.isprobvec(p) check in entropy to avoid these cases?


nalimilan commented Apr 21, 2022

Yes we should probably check that the sum is one, with a check keyword argument to disable the check for performance. Throwing an error when the sum isn't one would be technically breaking, but maybe it's OK to do this in a patch release given that we would have returned nonsensical results anyway. See also #725.

Cc: @andreasnoack @matbesancon @devmotion @mschauer


i9e1 commented May 12, 2023

I came across this when calculating the entropy of single values (like entropy(8, 2) = -24.0 bits, which is nonsense). Currently my uncertainty model includes certainty and robust bounds, i.e. Union{Real, Interval, UnivariateDistribution}. For me a multiple-dispatch fix works: entropy(x::Real) = 0.0.

A better way is LinearAlgebra.normalize-ing the input. This would yield correct results for singleton inputs (H = 0) and for vectors/collections of relative counts. For everything else someone has to explain the meaning of entropy in that context. (Interpret [0.25, 0.25] as relative counts in some weird basis, like "1//4 per hour".)

...
prop = normalize(p, 1)      # scale p to unit 1-norm (LinearAlgebra.normalize)
return -sum(xlogx, prop)

For probability vectors this changes nothing, so as long as existing uses are consistent with the documentation, this would be non-breaking.
Performance should not differ much from the Distributions.isprobvec(p) variant. A check keyword is a fine idea.
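
A self-contained sketch of what I mean (normalized_entropy is just an illustrative name here, not a proposal for the exported function):

using LinearAlgebra: normalize

# x*log(x) with the convention 0*log(0) = 0 (same convention as the xlogx used above)
xlogx(x) = iszero(x) ? zero(float(x)) : x * log(x)

# Illustrative helper: normalise to unit 1-norm, then apply the usual formula.
function normalized_entropy(p)
    isempty(p) && throw(ArgumentError("empty collections are not supported"))
    prob = normalize(p, 1)      # p ./ sum(abs, p); allocates a new vector
    return -sum(xlogx, prob)
end

normalized_entropy([0.25, 0.25])   # ≈ log(2), same as entropy([0.5, 0.5])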


i9e1 commented May 13, 2023

Ok it's not that easy after playing with the tests.
If we don't want in-place normalization, the additional allocations (and thus performance) are an issue.
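
If the allocation is the main objection, one alternative (just a sketch, not what any package does) is to fold the normalisation into the reduction, using the identity H(p / s) = -sum(xlogx, p) / s + log(s) with s = sum(p):

# Entropy of p normalised by its sum, without allocating a copy of p.
# Valid for nonnegative p with s = sum(p) > 0; reuses the xlogx helper above.
function normalized_entropy_noalloc(p)
    s = sum(p)
    s > 0 || throw(ArgumentError("sum of p must be positive"))
    return -sum(xlogx, p) / s + log(s)
end

normalized_entropy_noalloc([0.25, 0.25])   # ≈ log(2), no temporary vector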

@nalimilan

I wouldn't normalize by default anyway as this could hide bugs (just like the current situation). Better require users to be explicit.


milankl commented May 30, 2023

So what would people think about

isprobvec(p::AbstractVector{<:Real}) = all(x -> x ≥ zero(x), p) && isapprox(sum(p), one(eltype(p)))

function entropy(p::AbstractVector{<:Real}; check::Bool = true) 
    check && (isprobvec(p) || throw(ArgumentError("Not a proper probability distribution")))
    return -sum(xlogx, p) 
end 
  
entropy(p, b::Real; check::Bool = true) = entropy(p; check) / log(b) 

If there's a thumbs up, I'd create a PR. This would yield

julia> entropy([0.5,0.5],2)   # as before
1.0

julia> entropy([1,1],2)    #  throws an error
ERROR: ArgumentError: Not a proper probability distribution

julia> entropy([1,1],2,check=false)     # but the check can be disabled
-0.0

julia> entropy([])    # throws an error as before, but because of dispatch constraints
ERROR: MethodError: no method matching entropy(::Vector{Any})

As you can see, I've taken isprobvec from Distributions.jl, since Distributions depends on StatsBase and not vice versa; I don't know how you'd want to deal with this otherwise. Also, as isprobvec is only defined for AbstractVector{<:Real} (does that make sense for a matrix?), I've included a similar constraint here for entropy.


milankl commented May 30, 2023

I don't know whether the -0 here

julia> entropy([0,1])
-0.0

is usually just ignored, but I guess it would be more proper if, instead of return -sum(xlogx, p), we did just

return copysign(sum(xlogx, p), 0)

since sum(xlogx, p) has to be non-positive for a probability vector anyway, so this would only affect the -0.0 case, where 0.0 is returned instead.
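
In other words (a quick sanity check of copysign):

julia> copysign(-0.0, 0)        # the -0.0 case becomes +0.0
0.0

julia> copysign(-0.6931471805599453, 0)   # same value as -sum(xlogx, p) for negative sums
0.6931471805599453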


milankl commented May 30, 2023

On the argument constraint ::AbstractVector{<:Real}: I currently wonder about the meaning of these results

julia> entropy([1,1+im])    # complex probabilities
0.43882457311747564 - 1.131971753677421im

julia> entropy(rand(2,2))   # matrix input
1.0956946888140768

Not even in quantum mechanics are complex probabilities a thing afaik, and I'd usually also not interpret a matrix as a probability distribution, but maybe people want to calculate the entropy of a joint probability mass function? So maybe relax it to ::AbstractArray{<:Real} and turn isprobvec into isprobarray?


i9e1 commented May 31, 2023

There is no problem imho returning -0.0. It's a float so there are signed versions of zero, 0.0 and -0.0.

julia> iszero(0.0)
true

julia> iszero(-0.0)
true

julia> 0.0 == -0.0
true

julia> 0.0 === -0.0
false

For matrices it's common to interpret rows or columns as probabilities (e.g. the transition matrix of a Markov chain). There are definitions of row/right, column/left and doubly stochastic matrices, meaning rows, columns, or both sum to one (i.e. can be interpreted as probabilities).
In this context the user should rather do something like:

isrowstochastic(M::AbstractMatrix) = all(isprobvec, eachrow(M))

H_rows = [entropy(r) for r in eachrow(M)]
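
Concretely (a runnable sketch; isprobvec and the vector entropy method are the ones proposed above):

# Transition matrix of a two-state Markov chain: each row is a probability vector.
M = [0.9 0.1;
     0.5 0.5]

isrowstochastic(M)                              # true: every row sums to one
H_rows = [entropy(r, 2) for r in eachrow(M)]    # per-row entropy in bits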

Joint entropy, mutual information and so on are better defined separately.


i9e1 commented May 31, 2023

Complex probabilities may be a thing, see "complex probability and information theory". The entropy extensions are in Sec. 8, but I don't get it.
Constrain the argument to AbstractVector{<:Real} and wait for people smarter than me.


milankl commented May 31, 2023

Joint entropy, mutual information and so on are better defined separately.

I agree; the entropy calculation here easily returns something other than what people expect if we allowed matrices, because, as you say, one may interpret each row as a probability vector, or all entries together. To avoid this, and to force the user to be more explicit, I'd suggest constraining the types here to vectors, so that you'd either do [entropy(x) for x in eachrow(M)] as you say, or entropy(vec(M)) for a joint probability matrix.
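
For instance, with the vector-constrained method the flattened joint pmf behaves as expected (quick sketch, using the entropy proposed above):

p = [0.5, 0.5]
q = [0.25, 0.75]
P = p * q'                    # joint pmf of two independent variables; entries sum to 1

# joint entropy via the flattened matrix equals the sum of the marginal entropies
entropy(vec(P), 2) ≈ entropy(p, 2) + entropy(q, 2)   # true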

milankl linked a pull request May 31, 2023 that will close this issue