Type consistency #256

Open
Crown421 opened this issue Mar 19, 2023 · 6 comments

@Crown421
Contributor

I have found this repo recently, and as I am integrating it into my code, I noticed that a lot of type information is lost.

For example:

> eltype(Mean)
Any

which is surprising, given that Mean has a <:Number type parameter. I personally would expect that

> eltype(Mean(Float32))
Float32

Surprisingly, other objects like FitNormal don't accept a type argument, even though FitNormal is parametrized with V<:Variance, so one might expect something like

> tmp = FitNormal(Float32)
FitNormal{Variance{Float32, Float32, EqualWeight}}: n=0 | value=(0.0, 1.0)
> eltype(tmp)
Float32

to work.

I am not sure when I would have time to work on something like this, but I first wanted to open this issue, and see if the above would be a desired behaviour.
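To make the request concrete, here is a minimal sketch of how a parametric type can opt in to reporting its numeric parameter through eltype. ToyMean is a hypothetical stand-in written only for illustration; it is not the OnlineStats Mean type, and whether OnlineStats should define such a method is exactly the open question.

```julia
# Hypothetical stand-in for a parametrised statistic (not the OnlineStats type).
struct ToyMean{T<:Number}
    μ::T
    n::Int
end
ToyMean(::Type{T}) where {T<:Number} = ToyMean{T}(zero(T), 0)
ToyMean() = ToyMean(Float64)

# Before any method is added, Base's generic fallback eltype(::Type) = Any applies.
default_eltype = eltype(ToyMean{Float32})

# Opt in: report the numeric parameter. Base already forwards
# eltype(x) to eltype(typeof(x)), so instances are covered too.
Base.eltype(::Type{ToyMean{T}}) where {T} = T

typed_eltype = eltype(ToyMean(Float32))
```

Note that the unparametrised eltype(ToyMean) would still return Any under this sketch, because no concrete T is available for the UnionAll type.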

@joshday
Owner

joshday commented Mar 20, 2023

I'm not sure what you mean by type info is lost. eltype is used primarily for iteration, which isn't defined (e.g. for i in Mean()... is an error)

To your second point, FitNormal(Variance(Float32)) works, but I suppose the shorter FitNormal(T) would be nice to have.
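A shorthand of that kind could look roughly like the sketch below. ToyVariance and ToyFitNormal are hypothetical stand-ins that only mirror the "wrapper parametrised by an inner stat" layout described in this thread; the real OnlineStats definitions may differ.

```julia
# Hypothetical inner statistic carrying the numeric type.
struct ToyVariance{T<:Number}
    μ::T
    σ2::T
    n::Int
end
ToyVariance(::Type{T}) where {T<:Number} = ToyVariance{T}(zero(T), one(T), 0)
ToyVariance() = ToyVariance(Float64)

# Hypothetical wrapper parametrised by the inner statistic's type.
struct ToyFitNormal{V<:ToyVariance}
    v::V
end

# Convenience constructors: forward the element type to the inner statistic.
ToyFitNormal(::Type{T}) where {T<:Number} = ToyFitNormal(ToyVariance(T))
ToyFitNormal() = ToyFitNormal(Float64)

short = ToyFitNormal(Float32)               # shorthand
long = ToyFitNormal(ToyVariance(Float32))   # explicit form
```

Both calls build the same concrete type, so the shorthand adds no new behaviour, only a shorter spelling.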

@Crown421
Contributor Author

In my specific use case I am using EnsembleProblem from SciML and reducing the results with OnlineStats, as I want to compute a lot of trajectories in a way that doesn't blow up my RAM.

My current implementation returns a Vector{<:OnlineStat} for the trajectory (which may or may not be the best option, but we will see).

However, when the solution object is constructed, an eltype(eltype(T)) call happens, which parametrizes the solution with Any, which is not great.

Long story short, I had

> eltype(eltype(Float64))
Float64

as reference for the behaviour I had been expecting, and was hence surprised.
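The nested call can be illustrated with Base alone. The helper name state_type and the OpaqueStat struct below are hypothetical, chosen only for this sketch; it shows why numeric containers survive eltype(eltype(T)) while a type without an eltype method collapses to Any.

```julia
# Hypothetical helper mirroring the eltype(eltype(T)) call described above.
state_type(::Type{T}) where {T} = eltype(eltype(T))

# Numbers report themselves as their own eltype, so nesting is harmless.
scalar_traj = Float64[0.1, 0.2, 0.3]              # Vector{Float64}
vector_traj = [Float32[1, 2], Float32[3, 4]]      # Vector{Vector{Float32}}

scalar_state = state_type(typeof(scalar_traj))
vector_state = state_type(typeof(vector_traj))

# A struct with no eltype method hits Base's generic fallback.
struct OpaqueStat end
opaque_state = state_type(Vector{OpaqueStat})
```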

@joshday
Owner

joshday commented Mar 20, 2023

Hmm, okay.

Where is the eltype(eltype(T)) happening/why is that necessary? I'm trying to understand the use case since OnlineStats aren't iterable to begin with.

I'm not sure what a "trajectory" is in this context, but maybe you want to use value.(trajectory) instead of the stats directly?

@Crown421
Contributor Author

Crown421 commented Mar 20, 2023

A trajectory in the ODE/dynamical system sense, where one might have m states, each with dimension d.
This could be a scalar ODE, so each state would be a Number, or something higher dimensional, in which case each state is a Vector{<:Number}. The whole trajectory is then a Vector{<:Number} or a Vector{Vector{<:Number}}.
Now, for something like an SDE, each solution might be slightly different, and one wants summary statistics over a (large) collection of trajectories for the distribution of states at each time step.

The way I went about this is to have a Vector{<:OnlineStat}, i.e. by doing [FitNormal() for _ in 1:m], and to add trajectories via broadcasting. Once the simulation is done, I can nicely get the values out by broadcasting mean.(...), cov.(...) or similar.
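The pattern can be sketched without OnlineStats at all. RunningNormal, update!, running_mean and running_var below are hypothetical names for a small Welford-style accumulator written for this illustration; they stand in for FitNormal, fit!, mean and var and make no claim about how OnlineStats implements them.

```julia
# Hypothetical per-time-step accumulator (Welford's online algorithm).
mutable struct RunningNormal{T<:AbstractFloat}
    n::Int
    μ::T
    m2::T   # running sum of squared deviations from the mean
end
RunningNormal(::Type{T}) where {T<:AbstractFloat} = RunningNormal{T}(0, zero(T), zero(T))
RunningNormal() = RunningNormal(Float64)

function update!(s::RunningNormal{T}, x::Number) where {T}
    xt = T(x)
    s.n += 1
    δ = xt - s.μ
    s.μ += δ / s.n
    s.m2 += δ * (xt - s.μ)
    return s
end

running_mean(s::RunningNormal) = s.μ
running_var(s::RunningNormal{T}) where {T} = s.n > 1 ? s.m2 / (s.n - 1) : T(NaN)

# One accumulator per time step; each trajectory has m = 3 states.
m = 3
stats = [RunningNormal(Float32) for _ in 1:m]
trajectories = [Float32[1, 10, 100], Float32[3, 30, 300]]

for traj in trajectories
    update!.(stats, traj)   # broadcast: state i updates accumulator i
end

means = running_mean.(stats)
vars = running_var.(stats)
```

Only one accumulator per time step is kept in memory, regardless of how many trajectories are folded in, and the element type chosen at construction carries through to the extracted means and variances.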

I suppose I could do this via Group, but it does not seem like there is a great constructor for large groups (but I might have missed something).
Even then, if I do something like

> g = Group(FitNormal(), FitNormal())
> fit!(g, rand(2))

I can't get the means out as easily, since neither mean.(g) nor mean(g) works, so I have to go via value.

Further, even though Group is iterable, we again get

> eltype(g)
Any

This is sensible, since a group could contain anything, but in a case like this, where all stats in the group are the same, one might expect a more specific eltype.
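One way such a more specific eltype could be derived is from the tuple type that stores the members. ToyStat and ToyGroup below are hypothetical stand-ins, and the sketch assumes the group keeps its members in a tuple; it is not a description of how OnlineStats' Group is implemented.

```julia
# Hypothetical member statistic.
struct ToyStat{T<:Number}
    value::T
end

# Hypothetical group that stores its members in a tuple.
struct ToyGroup{T<:Tuple}
    stats::T
end
ToyGroup(stats...) = ToyGroup(stats)

# Minimal iteration interface, delegating to the stored tuple.
Base.iterate(g::ToyGroup, state...) = iterate(g.stats, state...)
Base.length(g::ToyGroup) = length(g.stats)

# Derive eltype from the tuple type: when all members share one type,
# Base reports that type for the tuple, and the group inherits it.
Base.eltype(::Type{ToyGroup{T}}) where {T} = eltype(T)

g = ToyGroup(ToyStat(1f0), ToyStat(2f0))
members = collect(g)
```

With this definition, collect produces a concretely typed vector for a homogeneous group, while mixed groups still fall back to a wider type chosen by Base.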

Also comparing to Distributions:

> eltype(Distributions.Normal(2.f0))
Float32
> eltype(Distributions.MvNormal([2.f0, 3.f0]))
Float32

Given that FitNormal and Normal otherwise function quite similarly, it is again surprising to see a difference here.

I think that eltype is quite useful beyond iteration, as a way to indicate what kind of data is wrapped in an object.

@joshday
Owner

joshday commented Mar 20, 2023

Thanks for the info!

I'll have to mull this over a bit since I'd rather not add methods to the OnlineStatsBase interface if I can avoid it.

@Crown421
Contributor Author

I just took a stab at creating a convenience constructor (see #258), but stumbled over additional surprising behaviour.
First, the internal type of FitMvNormal is fixed to CovMatrix{Float64}, and second, the fallback value does not incorporate type information even when it can be specified (i.e. for FitNormal).

julia> m = FitNormal(Variance(Float32))
FitNormal: n=0 | value=(0.0, 1.0)

julia> typeof(value(m))
Tuple{Float64, Float64}

julia> for _ in 1:3
       fit!(m, rand(Float32))
       end

julia> m
FitNormal: n=3 | value=(0.482926, 0.478244)

julia> typeof(value(m))
Tuple{Float32, Float32}

I also note that

julia> typeof(m.v)
Variance{Float32, Float32, EqualWeight}

which suggests that it is possible to have a Float32 mean and a Float64 variance?

I have made an attempt to fix the above, let me know what you think.
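The effect of a fallback that ignores the type parameter can be reproduced with a toy type. ToyFit, value_literal and value_typed are hypothetical names for this sketch, and the literal (0.0, 1.0) fallback is an assumption based on the printed output above, not a quote of the OnlineStats source.

```julia
# Hypothetical fitted-normal summary with an explicit numeric type.
struct ToyFit{T<:AbstractFloat}
    n::Int
    μ::T
    σ::T
end

# Fallback written with Float64 literals: the return type depends on n.
value_literal(o::ToyFit) = o.n > 1 ? (o.μ, o.σ) : (0.0, 1.0)

# Fallback built from the type parameter: the return type is always Tuple{T, T}.
value_typed(o::ToyFit{T}) where {T} = o.n > 1 ? (o.μ, o.σ) : (zero(T), one(T))

empty32 = ToyFit{Float32}(0, 0f0, 1f0)
fitted32 = ToyFit{Float32}(3, 0.5f0, 0.25f0)

before_literal = typeof(value_literal(empty32))
after_literal = typeof(value_literal(fitted32))
before_typed = typeof(value_typed(empty32))
after_typed = typeof(value_typed(fitted32))
```

The literal version switches from Tuple{Float64, Float64} to Tuple{Float32, Float32} once enough observations arrive, which matches the change shown in the REPL session; the typed version stays at Tuple{Float32, Float32} throughout.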

On that note, I am using Float32/Float64 as placeholders; they could also be replaced with any new user-defined type NewScalarNumberType <: Real. This might be quite interesting.
