Why are extrema needed for ContinuousTerm? #222

matthieugomez · 2021-04-13T05:23:12Z

While trying to improve the performance of FixedEffectModels.jl, I noticed that schema accounted for a substantial amount of the run time:

N = 10_000_000
df = (y = rand(N), x1 = rand(N), x2 = rand(N))
@time StatsModels.schema(@formula(y~x1+x2), df)
#  0.151931 seconds (49 allocations: 3.625 KiB)

The reason is that schema calls extrema on each term in the formula:

@time extrema(df.y), extrema(df.x1), extrema(df.x2)
#   0.127862 seconds (23 allocations: 1.422 KiB, 5.44% compilation time)

Is there a way to avoid computing these extrema? Why are they needed in the first place? (By the way, calling extrema is slower than calling minimum and maximum separately; see JuliaLang/julia#31442.)
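For reference, the extrema versus minimum/maximum comparison can be checked with a minimal sketch like the one below. It assumes only Base Julia and reuses the array size from N above. Timings depend on the machine and the Julia version, so none are shown here.

```julia
# Sketch: compare a single extrema call with separate minimum and maximum calls.
x = rand(10_000_000)

# Warm-up calls so that compilation time is not included in the timings.
extrema(x)
(minimum(x), maximum(x))

@time extrema(x)
@time (minimum(x), maximum(x))

# Both approaches return the same (min, max) tuple.
@assert extrema(x) == (minimum(x), maximum(x))
```

If the two-call version is faster on a given setup, that would match the behavior discussed in JuliaLang/julia#31442.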

matthieugomez mentioned this issue Apr 13, 2021

remove mean/var/min/max #223

Open

matthieugomez mentioned this issue Nov 30, 2023

Wrong results in a large data set with one set of FE FixedEffects/FixedEffectModels.jl#249

Closed
