Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds PiecewiseLinearStretching #54

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Conversation

zygmuntszpak
Copy link
Member

Adds PiecewiseLinearStretching as a basis for addressing #33

Currently, I have implemented this under the old adjust_histogram umbrella. The intention is to refactor everything into adjust_intensities in subsequent pull requests.

The next steps involve adding Percentile and perhaps Quantile types to ImageCore to facilitate the implementation of some convenience functions described in #33 and other places.

f = PiecewiseLinearStretching(src_intervals = ((0.0 => 1.0),), dst_intervals = ((-1.0 => 0.0),))
ret = adjust_histogram(img, f)
@test eltype(ret) == eltype(img)
# @test minimum(ret) ≈ -1.0
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to remove this comment...

@codecov
Copy link

codecov bot commented Sep 18, 2021

Codecov Report

Merging #54 (87bee8a) into master (b3e7330) will increase coverage by 0.71%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #54      +/-   ##
==========================================
+ Coverage   95.94%   96.65%   +0.71%     
==========================================
  Files          12       14       +2     
  Lines         592      718     +126     
==========================================
+ Hits          568      694     +126     
  Misses         24       24              
Impacted Files Coverage Δ
src/ImageContrastAdjustment.jl 100.00% <100.00%> (ø)
src/algorithms/piecewise_linear_stretching.jl 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b3e7330...87bee8a. Read the comment docs.

Copy link
Member

@johnnychen94 johnnychen94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because in many cases we only want to set breakpoints, it might also make sense to have the following equivalences:

# the following two are equivalent
PiecewiseLinearStretching((0.0, 0.2, 0.8, 1.0) => (0.0, 0.4, 0.9, 1.0))
PiecewiseLinearStretching(src_intervals = (0.0 => 0.2, 0.2 => 0.8, 0.8 => 1.0),
                          dst_intervals = (0.0 => 0.4, 0.4 => 0.9, 0.9 => 1.0))

# the following two are equivalent
# the min/max value of `img` is lazily get when the algorithm is applied
dst_intervals = (0.0, 0.4, 0.9, 1.0)
PiecewiseLinearStretching(nothing => dst_intervals)
PiecewiseLinearStretching(range(minimal(img), stop=maximum(img), length=length(dst_intervals)) => dst_intervals)

# the following two are equivalent
src_intervals = (0.0, 0.2, 0.8, 1.0)
PiecewiseLinearStretching(src_intervals => nothing)
PiecewiseLinearStretching(src_intervals => range(0.0f0, stop=1.0f0, length=length(src_intervals)))

The first might be the most useful. I'm not sure if we want the second and the third equivalence, I mention them here only to keeps consistent to our LinearStretching specification.

src/algorithms/piecewise_linear_stretching.jl Outdated Show resolved Hide resolved
src/algorithms/piecewise_linear_stretching.jl Outdated Show resolved Hide resolved
Copy link
Member

@timholy timholy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started this some time ago and got distracted by other duties, hopefully it's not redundant!

src/algorithms/piecewise_linear_stretching.jl Outdated Show resolved Hide resolved
then intensities that fall outside `src_intervals` are automatically set to the boundary
values of the `dst_intervals`. For example, given the combination `src_intervals = (0.1 =>
0.9, )` and `dst_intervals = (0.0 => 1.0)`, intensities below `0.1` are set to zero, and
intensities above `0.9` are set to one.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pairs almost imply mapping. I almost think this is more naturally written as lower_endpoint = 0.1 => 0.0, upper_endpoint = 0.9 => 1.0. If you want to signify an interval, how about using IntervalSets.jl?

Or what about src_knots and dst_knots and just have each be an AbstractVector where src[i] -> dst[i]? You don't really have, or meaningfully interpret, gaps between your intervals, so having to double-declare all the interior points is extra work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps IntervalSets.jl is the best option for the implementation, with a view to providing a "knots-based" constructor for convenience as you have both pointed out. I'm not sure that a solely knots-based representation is sufficiently general. For example, one may want to slightly brighten the dark parts of the image, and slightly darken the bright part of the image, leaving the rest unchanged.

I can imagine specifying this as:

PiecewiseLinearStretching(0.0..0.1 => 0.0..0.2, 0.1..0.8 => 0.1..0.8, 0.8..1.0 => 0.7..1.0)

Unless I misunderstood, the knots-based variant wouldn't work because the source knots are: 0, 0.1, 0.8, 1.0 but the destination knots are 0, 0.2, 0.1, 0.8, 0.7, 1.0 (different lengths).

A potential different way of specifying this might be to leave out the "identity" mapping 0.1..0.8 => 0.1..0.8

PiecewiseLinearStretching(0.0..0.1 => 0.0..0.2, 0.8..1.0 => 0.7..1.0)

However, as Johnny has quite pointed out, I currently don't handle the case where there are "gaps in the middle". Part of the issue is that I have introduced ambiguity in what it means to not specify an interval. In one case, pixels get "saturated" to the boundary values, whereas in other cases it is an implicit "identity" mapping.

Maybe the best way to handle this is to introduce a keyword argument that allows one to specify whether the edges should be saturated, and then internally explicitly specify the interval mapping that results in the saturation.

src/algorithms/piecewise_linear_stretching.jl Outdated Show resolved Hide resolved
else
is_inside_src_intervals = false
for (n, (A,B)) in pairs(f.src_intervals)
if (A <= val < B) || (A <= val <= B && n == N)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you were using an Interval this would just be val ∈ iv. But as stated above I'm not sure that's the best user-facing interface anyway.

From a performance standpoint, you can order them and then use searchsortedfirst which does binary search. Probably not relevant for most cases (you probably don't have very many distinct intervals) but it seems "safe."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a performance standpoint, you can order them and then use searchsortedfirst which does binary search. Probably not relevant for most cases (you probably don't have very many distinct intervals) but it seems "safe."

I couldn't quite get that to work, since isless is not defined for open-closed intervals, i.e. (0.2, 0.5] in IntervalSets.jl

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I meant a vector of edges, not of intervals. Though you can pass a custom comparator through the lt keyword to searchsortedfirst, e.g., searchsortedfirst(a, x; lt=(p, q) -> p.left < q.left) or something.

@zygmuntszpak
Copy link
Member Author

@timholy @johnnychen94
I think this implementation should cover what we need to facilitate generalised linear stretching. Please let me know if you think there is anything major I may have missed. Barring that, I plan to merge this pull-request in a couple of days.

Copy link
Member

@johnnychen94 johnnychen94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The design now looks much more understandable to me without looking at the docstrings. I've added some more comments to simplify the constructor signature.

The implementation looks good to me, and the comprehensive tests are great. Hopefully, this is the last round review from me.

Comment on lines +164 to +253
function PiecewiseLinearStretching(; src_knots, dst_knots, saturate::Bool = true)
return PiecewiseLinearStretching(src_knots, dst_knots; saturate = saturate)
end

function PiecewiseLinearStretching(img::GenericGrayImage; src_knots, dst_knots, saturate::Bool = true)
return PiecewiseLinearStretching(src_knots, dst_knots, img; saturate = saturate)
end

function PiecewiseLinearStretching(src_knots::MinMax, dst_knots::T, img::GenericGrayImage; saturate::Bool = true) where T <: Union{Tuple{Vararg{Union{Real,AbstractGray}}}, AbstractVector, Percentiles}
img_min, img_max = minfinite(img), maxfinite(img)
if (img_min ≈ img_max)
throw(DomainError("The image consists of a single intensity, which gives rise to a degenerate source interval."))
end
if T <: Percentiles
return PiecewiseLinearStretching((img_min, img_max), dst_knots, img; saturate = saturate)
else
return PiecewiseLinearStretching((img_min, img_max), dst_knots; saturate = saturate)
end
end

function PiecewiseLinearStretching(src_knots::T, dst_knots::MinMax, img::GenericGrayImage; saturate::Bool = true) where T <: Union{Tuple{Vararg{Union{Real,AbstractGray}}}, AbstractVector, Percentiles}
img_min, img_max = minfinite(img), maxfinite(img)
if T <: Percentiles
return PiecewiseLinearStretching(src_knots, (img_min, img_max), img ; saturate = saturate)
else
return PiecewiseLinearStretching(src_knots, (img_min, img_max); saturate = saturate)
end
end

function PiecewiseLinearStretching(src_knots::AbstractVector, dst_knots::Tuple{Vararg{Union{Real,AbstractGray}}}; saturate::Bool = true)
return PiecewiseLinearStretching(tuple(src_knots...), dst_knots; saturate = saturate)
end

function PiecewiseLinearStretching(src_knots::Tuple{Vararg{Union{Real,AbstractGray}}}, dst_knots::AbstractVector; saturate::Bool = true)
return PiecewiseLinearStretching(src_knots, tuple(dst_knots...); saturate = saturate)
end

function PiecewiseLinearStretching(src_percentile::Percentiles, dst_percentile::Percentiles, img::GenericGrayImage; saturate::Bool = true)
if length(src_percentile.p) != length(dst_percentile.p)
throw(ArgumentError("The number of src percentiles and dst percentiles must match."))
end
src_knots = percentile(vec(img), collect(src_percentile.p))
dst_knots = percentile(vec(img), collect(dst_percentile.p))
return PiecewiseLinearStretching(tuple(src_knots...), tuple(dst_knots...); saturate = saturate)
end

function PiecewiseLinearStretching(src_percentile::Percentiles, dst_knots::T, img::GenericGrayImage; saturate::Bool = true) where T <: Union{Tuple{Vararg{Union{Real,AbstractGray}}}, AbstractVector}
if length(src_percentile.p) != length(dst_knots)
throw(ArgumentError("The number of src percentiles and destination knots must match."))
end
src_knots = percentile(vec(img), collect(src_percentile.p))
return PiecewiseLinearStretching(tuple(src_knots...), dst_knots; saturate = saturate)
end

function PiecewiseLinearStretching(src_knots::T, dst_percentile::Percentiles, img::GenericGrayImage; saturate::Bool = true) where T <: Union{Tuple{Vararg{Union{Real,AbstractGray}}}, AbstractVector}
if length(src_knots) != length(dst_percentile.p)
throw(ArgumentError("The number of src knots and destination percentiles must match."))
end
dst_knots = percentile(vec(img), collect(dst_percentile.p))
return PiecewiseLinearStretching(src_knots, tuple(dst_knots...); saturate = saturate)
end

function PiecewiseLinearStretching(src_knots::AbstractVector, dst_knots::AbstractVector; saturate::Bool = true)
return PiecewiseLinearStretching(tuple(src_knots...), tuple(dst_knots...); saturate = saturate)
end

function PiecewiseLinearStretching(src_knots::Tuple{Vararg{Union{Real,AbstractGray}}}, dst_knots::Tuple{Vararg{Union{Real,AbstractGray}}}; saturate::Bool = true)
if length(src_knots) != length(dst_knots)
throw(ArgumentError("The number of src and destination knots must match."))
end

if length(src_knots) < 2 || length(dst_knots) < 2
throw(ArgumentError("You need to specify at least two knots (the start and end of an interval)."))
end

# Promote the src_knots and dst_knots to a common type to address the case
# where someone might use a mixture of integers and floating point numbers
# when specifying the knots.
T₁ = promote_type(typeof.(src_knots)...)
T₂ = promote_type(typeof.(dst_knots)...)
T = promote_type(T₁, T₂)

if saturate
intervals = convert_knots_to_intervals_and_saturate(T.(src_knots), T.(dst_knots))
else
intervals = convert_knots_to_intervals(T.(src_knots), T.(dst_knots))
end

return PiecewiseLinearStretching(tuple(intervals...))
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason we need these many methods to dispatch is that we have to dispatch on more than two argument types. I've noticed a lot of effort are paid here to carefully avoid dispatch ambiguities. How about defining a helper function that dispatches on only one type (see also https://docs.julialang.org/en/v1/manual/methods/#man-methods-orthogonalize):

# moving `img` to the constructor of `MinMax`/`Percentiles` unifies the dispatching for `to_intervals` and thus for the constructor of `PiecewiseLinearStretching`.
struct MinMax{AT<:AbstractArray}
    data::AT
end

struct Percentiles{T, AT<:AbstractArray}
    p::T
    data::AT
end

to_intervals(knots::Tuple) = ...
to_intervals(knots::AbstractVector) = to_intervals(...) # maybe this isn't needed at all
to_intervals(knots::MinMax) = to_intervals(...)
to_intervals(knots::Percentile) = to_intervals(...)

This immediately simplifies the entire constructor dispatching to only one method:

function PiecewiseLinearStretching(src_knots, dst_knots; saturate)
    src_intervals = to_intervals(src_knots)
    dst_intervals = to_intervals(dst_knots)
    if saturate
        # I'm not sure if we can separate the saturate function apart.
        saturate_intervals!(src_intervals)
        saturate_intervals!(dst_intervals)
    end
    PiecewiseLinearStretching(tuple(intervals...))
end

Besides the simplification, another benefit is that it's possible to support mixed MinMax and Percentile:

PiecewiseLinearStretching(MinMax(img), Percentile(img); saturate=true)

Comment on lines +4 to +13
PiecewiseLinearStretching((0.0, 0.2, 0.8, 1.0), (0.0, 0.1, 0.7, 1.0) ; saturate = true)
PiecewiseLinearStretching([0.0, 0.2, 0.8, 1.0], [0.0, 0.1, 0.7, 1.0] ; saturate = true)
PiecewiseLinearStretching(; src_knots = (0.0, 0.2, 0.8, 1.0),
dst_knots = (0.0, 0.1, 0.7, 1.0), saturate = true)
PiecewiseLinearStretching(ClosedInterval(0.0, 0.2) => ClosedInterval(0.0, 0.1),
Interval{:open, :closed}(0.2, 0.8) => Interval{:open, :closed}(0.1, 0.7),
Interval{:open, :closed}(0.8, 1,0) => Interval{:open, :closed}(0.7, 1.0))
PiecewiseLinearStretching(Percentiles(1,99), (0,1), img::GenericGrayImage)
PiecewiseLinearStretching(Percentiles(1,50,99), (0, 0.5, 1), img::GenericGrayImage)
PiecewiseLinearStretching(MinMax(), (0,1), img::GenericGrayImage)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If my above suggestion is applied, then we can also simplify the signatures here:

    PiecewiseLinearStretching(src_knots, dst_knots)

and then explain what are valid knots values in later part.


Maybe PiecewiseLinearStretching(src_knots => dst_knots) is more illustrative? I'm not sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would agree that some simplification here might be good. It's a pretty daunting list of signatures and my initial reaction reading this would be to worry about what complexity I'm getting myself into.

Copy link
Member

@timholy timholy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite remarkable and I hate complaining about something so nice, but hopefully these will be fairly easy to address. I think the only fundamental change in my review (@johnnychen94's has other really good suggestions) is the inferrability of intervals, which will likely have significant effects on the performance of this code.

Comment on lines +4 to +13
PiecewiseLinearStretching((0.0, 0.2, 0.8, 1.0), (0.0, 0.1, 0.7, 1.0) ; saturate = true)
PiecewiseLinearStretching([0.0, 0.2, 0.8, 1.0], [0.0, 0.1, 0.7, 1.0] ; saturate = true)
PiecewiseLinearStretching(; src_knots = (0.0, 0.2, 0.8, 1.0),
dst_knots = (0.0, 0.1, 0.7, 1.0), saturate = true)
PiecewiseLinearStretching(ClosedInterval(0.0, 0.2) => ClosedInterval(0.0, 0.1),
Interval{:open, :closed}(0.2, 0.8) => Interval{:open, :closed}(0.1, 0.7),
Interval{:open, :closed}(0.8, 1,0) => Interval{:open, :closed}(0.7, 1.0))
PiecewiseLinearStretching(Percentiles(1,99), (0,1), img::GenericGrayImage)
PiecewiseLinearStretching(Percentiles(1,50,99), (0, 0.5, 1), img::GenericGrayImage)
PiecewiseLinearStretching(MinMax(), (0,1), img::GenericGrayImage)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would agree that some simplification here might be good. It's a pretty daunting list of signatures and my initial reaction reading this would be to worry about what complexity I'm getting myself into.

When used with `adjust_histogram`, returns an image for which the intensity range was
partitioned into subintervals and mapped linearly to a new set of subintervals.

# Details
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your writing and documentation shines for its clarity. It's a standard the rest of us can only aspire to.

That said, it's also long and the LaTeX notation does not read well in the REPL. Maybe make some of this # Extended help? See https://docs.julialang.org/en/v1/manual/documentation/#man-documentation (point 11). See also JuliaImages/ImageFiltering.jl#153.

intervals = convert_knots_to_intervals(src_knots, dst_knots)
lb = gray(typemin(eltype(src_knots)))
ub = gray(typemax(eltype(src_knots)))
lb = lb == -Inf ? lb = nextfloat(-Inf) : lb
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lb = lb == -Inf ? lb = nextfloat(-Inf) : lb
lb = lb == -Inf ? nextfloat(lb) : lb

I would have guessed that the extra assignment would have been a bug, but in any case it's best to make this inferrable. Since -Inf32 == -Inf, just checking equality does not imply the same type. My implementation will always preserve the type of lb.

Likewise for ub.

end

function convert_knots_to_intervals(src_knots::NTuple{N, <:Union{Real,AbstractGray}}, dst_knots::NTuple{N, <:Union{Real,AbstractGray}}) where {N}
intervals = Vector{Pair{<:AbstractInterval, <: AbstractInterval}}(undef, N-1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not inferrable. Have you profiled and looked at it with ProfileView? I bet you get pervasive red bars which dominate the runtime and which are traceable back to this line. Ideally isconcretetype(eltype(intervals)) would return true; failing that, a small Union of concrete types would allow union-splitting and make it possible for the compiler to generate efficient code. The limit is 3, and that might be enough for open/closed, closed/closed, closed/open were you to need all of those.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not always sure how to interpret the ProfileView results. Here is the result I get with the current implementation:
pls_profile

with the red bar corresponding to the line

     for (n, (src, _)) in enumerate(f.intervals)

which presumably reflects the fact that the type of src is not stable, but can one of the various Interval types?

While I have

intervals = Vector{Pair{<:AbstractInterval, <: AbstractInterval}}(undef, N-1)

inside this function, after building up all the intervals I explicitly construct a tuple

 PiecewiseLinearStretching(tuple(intervals...))

thinking that in so doing the compiler would "magically" have more information to work with. If I understood correctly, the trouble is that the vagueness of

intervals = Vector{Pair{<:AbstractInterval, <: AbstractInterval}}(undef, N-1)

breaks inference and there is no information to propagate further?

Is the suggestion to switch from representing intervals as a tuple inside the PiecewiseLinearStretching struct, and define it as:

T1 = Pair{<: Interval{:open, :closed}, <: Interval{:open, :closed}}
T2 = Pair{<: Interval{:closed, :open}, <: Interval{:closed, :open}}
T3 = Pair{<: Interval{:closed, :closed}, <: Interval{:closed, :closed}}
intervals = Vector{Union{T1, T2, T3}}(undef, N-1)

I suppose this works for the knots instances, but limits full generality because one may want to include something like:

T4 = Pair{<: Interval{:closed, :closed}, <: Interval{:closed, :open}}

etc.

I've often come across a scenario where one may naturally wish to keep a vector of mixed types. It is a real pity there doesn't seem to be any clean way of dealing with such data structures.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this stage I'm thinking of defining my own Interval type akin to

struct Interval{T} where T <: Number
  lb::T
  ub::T
  lb_open::Bool
  ub_open::Bool
end

abstract type AbstractBoundary end
struct Open{T}  <: AbstractBoundary where T<: Number
  val::T
end

struct Closed{T} <: AbstractBoundary where T <: Number
  val::T
end

and define constructors so that one can call

function Interval(lb::AbstractBoundary, ub::AbstractBoundary)
   if (lb isa Open) && (ub isa Open)
      return Interval(lb.val, up.val, true, true)
  elseif ...
  end
end

so that I can store a vector of my Interval types without the need for Unions. Usage would look something like

Interval(Open(0.2), Closed(0.4))
etc.

Copy link
Member

@timholy timholy Oct 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not always sure how to interpret the ProfileView results

Width = runtime cost. Height = nesting depth. A wide red bar with nothing on top of most of its width (no callee) means an operation was dominated by runtime dispatch; eliminate that runtime dispatch, and the bar will shrink down to the width of the callees (meaning the pink/green/blue stack of bars on top of that). As you can see that would be a huge performance gain.

thinking that in so doing the compiler would "magically" have more information to work with

No magic, it's all about predictability. How does the compiler know in advance which kind of Tuple will be created? It can't. Tuples are faster only when the type of each element is known at compile time. (You are using this as a bit of a function barrier, which is why you have a stack of well-inferred calls on top of the wide red bar, but in this case the operations behind the barrier run for much less time than the cost of the initial dispatch so the strategy isn't working very well.)

What about

T1 = Pair{<: Interval{:open, :closed}, <: Interval{:open, :closed}}
T2 = Pair{<: Interval{:closed, :open}, <: Interval{:closed, :open}}
T3 = Pair{<: Interval{:closed, :closed}, <: Interval{:closed, :closed}}
intervals = Vector{Union{T1, T2, T3}}(undef, N-1)

Still not concrete, you'd need the element type so that isconcretetype(T1) returns true. In other words, get rid of the <: in each pair and make sure each type inside is concrete.

It is a real pity there doesn't seem to be any clean way of dealing with such data structures.

If you have to you can use manual dispatch to overcome the limitations of automatic union-splitting:

if x isa T1   # T1 and all other "Ts" had better be concrete if you want this to be fast
    foo(x)    # fast compile-time dispatch: the compiler knows that x::T1 so can look up the method when compiling
elseif x isa T2
    foo(x)    # fast compile-time dispatch: the compiler knows that x::T2 so can look up the method when compiling
elseif x isa T3
    foo(x)    # etc
elseif x isa T4
    foo(x)
else
    foo(x)    # slow runtime dispatch: here the compiler has no idea what type x
end

All the ones inside an isa have the corresponding method of foo looked up at compile time (i.e., looked up once even if this is inside a loop) because the type of x is determined within that block of the conditional. The final one is not, and will be slow; moreover, if this is in a loop then you pay that price on each iteration of the loop vs looking it up at compile time and never paying the price at all. This is basically exactly the code that Julia generates for you when it does automatic union-splitting, but when you do it manually you can use more than 3 types if you wish.

The main limitation, as emphasized in a comment above, is that x isa T1 is a pointer-comparison if T1 is concrete (very fast), and has to invoke Julia's subtyping machinery if T1 is abstract (pretty slow):

julia> fconc(c) = c[] isa Float32
fconc (generic function with 1 method)

julia> fabs(c) = c[] isa AbstractFloat
fabs (generic function with 1 method)

julia> c = Ref{Any}(1.2f0)
Base.RefValue{Any}(1.2f0)

julia> @btime fconc($c)
  1.481 ns (0 allocations: 0 bytes)
true

julia> @btime fabs($c)
  15.658 ns (0 allocations: 0 bytes)
true

There are cases where you can still get a win even if T1 is abstract, but the only way to get close to concretely-inferrable performance is to use concrete types.

At this stage I'm thinking of defining my own Interval type

That seems like a really good idea. Having it be a single type has another advantage: less compilation time due to less specialization. Rather than making Open and Closed different types, you could have them both create instances of SingleBoundary{T} and build Interval out of two SingleBoundarys. That would improve inferrability even more.

I'd recommend including the option to create that "internal" interval type from the ones supported by IntervalSets. Even better might be to add your runtime-encoded boundary interval type to IntervalSets, so that lots of packages get the benefit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for taking the time to write those explanations.

Rather than making Open and Closed different types, you could have them both create instances of SingleBoundary{T} and build Interval out of two SingleBoundarys. That would improve inferrability even more.

Apologies, but I'm not sure I understand your suggestion here. Are you suggesting something like this:

struct SingleBoundary{T}
  val::T
  is_open::Bool
end

SingleBoundary(val::AbstractBoundary) = SingleBoundary(val, val isa Open)

abstract type AbstractBoundary end
struct Open{T}  <: AbstractBoundary 
 boundary::SingleBoundary{T}
 function Open(val::T)
     new{T}(val, true)
 end
end

struct Closed{T} <: AbstractBoundary 
  boundary::SingleBoundary{T}
 function Closed(val::T)
    new{T}(val, false)
 end
end

struct Interval{T}
  lb::T
  ub::T
  function Interval(lb::AbstractBoundary, ub::AbstractBoundary)
     new{T}(lb.boundary, ub.boundary)
  end 
end

with usage

Interval(Open(0.2), Closed(0.4))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like

Interval(boundary(0.2, :open), boundary(0.4, :closed))

boundary(val, b::Symbol) = SingleBoundary(val, b == :open ? true : (b == :closed ? false : error("whoops"))

and just get rid of the Open and Closed types altogether. Using types merely to control dispatch is fine if you need extensibility or clarification of interpretation, but if it's mergely to set a Bool then you are better off not using the type system for that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this instance I was primarily going for readability and ease of typing. Having the Open and Closed types seems to lead to a succinct constructor, especially if one is going to specify several intervals.

Maybe this is something a macro would address instead...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to introduce the types if it's just for user convenience. And since this is for interval-list construction and not used with each pixel, it's really not a big deal either way. Fixing the inference on the part that runs per-pixel is really all you need to do; feel free to make whatever choice you think best for the rest.

out[p] = val
else
is_inside_src_intervals = false
for (n, (src, _)) in enumerate(f.intervals)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably recognize this is O(M*N), where M is the number of intervals and N the number of pixels. But if M will always be fairly small then this might actually be the fastest approach (a binary search is O(log(M)*N) but is slower when M is just a few).

Since your list of intervals is a tuple, it will be best suited for small M (you might experience long compile times otherwise). But an in-code comment about the tradeoffs might be a helpful guide to an external contributor who someday tries it for large M.

@zygmuntszpak
Copy link
Member Author

Thank you both for the excellent feedback. I'll have a stab at refactoring the code and ping you once I am done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants