Broadcast had one job (e.g. broadcasting over iterators and generator) #18618

vchuravy · 2016-09-21T20:39:30Z

It was surprising to find that broadcast is not working with iterators

dict = Dict(:a => 1, :b =>2)
@show string.(keys(dict)) # => Expected ["a", "b"]
"Symbol[:a,:b]"

This is due to Broadcast.containertype returning Any

julia/base/broadcast.jl

Line 31 in 413ed79

containertype(::Type) = Any

leading to the fallback at:

julia/base/broadcast.jl

Line 265 in 413ed79

@inline broadcast_c(f, ::Type{Any}, a...) = f(a...)

Defining containertype to be Array for that iterator lead to problems with calling size on it, because broadcast doesn't check against the iterator interface iteratorsize(IterType).

map solves this with the fallback map(f, A) = collect(Generator(f,A)) which may be more sensible that the current definition of broadcast(f, Any, A) = f(A)

The text was updated successfully, but these errors were encountered:

stevengj · 2016-09-21T21:44:37Z

This is intentional. broadcast is for containers with shapes, and defaults to treating objects as scalars. map is for containers without shapes, and default to treating objects as iterators.

For example, broadcast treats strings as "scalars", whereas map iterates over the characters.

pabloferz · 2016-09-21T21:48:14Z

Maybe the problem is that people find the new dot syntax too convenient. There has been desire of having a compact way of expressing map in the past though. Unfortunately, the dot syntax is already taken.

pabloferz · 2016-09-21T21:53:47Z

Also, as @stevengj has pointed out before: there's got to be a difference between map and broadcast, if not, what is the point of having both.

vchuravy · 2016-09-21T22:22:50Z

@stevengj But Iterators do have shape (especially generators) http://docs.julialang.org/en/release-0.5/manual/interfaces/#interfaces

I would argue that iterators are in this awkward realm were most things that you would want to do with a container you also want to do with iterators, and yes maybe it is purely the fact that the . syntax is too convenient (and the error you get is very opaque).

@pabloferz The main difference between map and broadcast is the treatment of scalars. Now the definition of scalar is debatable and I would say that everything that has length(x) > 1 should not be considered a scalar.

bramtayl · 2016-09-22T02:40:30Z

Marking which arguments are to be treated as iterable, instead of the function call itself, would remove the ambiguity. I think?

pabloferz · 2016-09-22T07:02:03Z

For broadcast (I believe also in general) having shape, means having size (not just length) and being indexable. Except for tuples anything else without size is treated as scalar. Given the current implementation you first need getindex or being able to define one for the object you want to broadcast over. For iterators, that is not possible in general.

rfourquet · 2017-09-04T07:59:21Z

I ran into this too. Coming from #16769 where I look for a way to fill! an array with repeated evaluations of a function (instead of a fixed value), I thought dot-syntax may already do the trick. But, when a = zeros(2, 3); a .= [rand() for i=1:2, j=1:3] works, the (would be) cheaper a .= (rand() for i=1:2, j=1:3) doesn't; this generator is HasShape(), but has indeed no indexing capability. I'm very light in the understanding of how broadcast/dot-syntax works, but would it help here to have a trait for indexing capabilities? there is a PR (#22489) for that already...

stevengj · 2017-09-04T12:49:40Z

@rfourquet, you can do a = zeros(2, 3); a .= rand.()

rfourquet · 2017-09-04T13:03:33Z

Yes, but I should have be more precise: I want to use a function which gets the indices as parameters, like a .= (f(i, j) for i=1:2, j=1:3).

nalimilan · 2017-09-04T13:50:45Z

What would be the drawbacks of broadcasting dimensions of HasShape iterators? That sounds like a natural thing to do.

stevengj · 2017-09-05T11:28:55Z

@nalimilan, at first glance I think that would be reasonable and probably relatively easy to implement. Would be breaking so should be done by 1.0.

stevengj · 2017-10-19T19:50:59Z

One potential problem with this is that HasShape iterators don't necessarily support getindex, and that might make it tricky to implement?

oscardssmith · 2017-10-20T01:50:05Z

One possibility would be to temporarily (for 1.0) make simple implementation that just copied to an array. That would allow an optimization post 1.0

rfourquet · 2017-11-16T12:33:22Z

One potential problem with this is that HasShape iterators don't necessarily support getindex, and that might make it tricky to implement?

As I said above, I have a PR at #22489 to allow indexing into iterators, if this can help.

nalimilan · 2017-12-24T14:20:31Z

What needs to be done for 1.0 so that at least we can improve the behavior in 1.x?

rfourquet · 2017-12-24T16:12:38Z

Thanks @nalimilan for bringing that up, I wanted to do that as well. If allowing HasShape generators on the right hand side of broadcast expression is not possible to implement for 1.0, should we make this an error now, instead of treating generators as scalars? so that this can be enabled in 1.x.

JeffBezanson · 2017-12-28T21:06:25Z

👍 Triage recommends making this an error (the safe choice) or calling collect on it (if easy to do).

map treats all of its arguments as containers, and tries to iterate over all of them. In my ideal world, broadcast would be similar, and treat all of its arguments has having shapes that can be broadcast, and give an error if e.g. size isn't defined. I'll point out that any value can be treated as a scalar in broadcast by wrapping it with fill, yielding a 0-d array:

julia> fill("a")
0-dimensional Array{String,0}:
"a"

julia> fill([2])
0-dimensional Array{Array{Int64,1},0}:
[2]

nalimilan · 2017-12-28T21:30:28Z

Do you really suggest treating all scalars as containers by default? That doesn't sound very practical.

Looking at how we could either support any iterable, or just throw an error for them until we support them, it looks like we would need a way to identify iterators in BroadcastStyle. Currently that's not possible, since Base.iteratorsize returns HasLength even for scalars like Symbol. We could introduce a Base.isiterable trait (which could be useful for other things), or make Base.iteratorsize default to NotIterable (which would make sense too as having HasLength as a default always sounds a bit surprising, if harmless).

Sacha0 · 2017-12-28T22:21:46Z

(Tricky case for future discussion: UniformScaling.)

nalimilan · 2017-12-29T12:25:08Z

@timholy Since you've done the redesign of broadcast, any suggestions?

stevengj · 2017-12-29T15:48:32Z

@JeffBezanson, the whole point of broadcast is to be able to "broadcast" scalars to match containers, e.g. to do ["bug", "cow", "house"] .* "s" ----> ["bugs", "cows", "houses"]. This is fundamentally different from the behavior of map.

This is why broadcast treats objects as scalars by default, so that they can be combined with containers. Throwing an error for an unrecognized type would make it much less useful.

It should be possible to declare a particular type to be a container for broadcast by defining some appropriate method on it, but I think the default should continue to be to treat objects as scalars.

nalimilan · 2017-12-31T18:12:26Z

In an unrelated PR (#25339), @Keno suggested using applicable(start, (x,)) to find out whether x is iterable or not. Should we use the same approach here? I would find it clearer to have a more explicit definition of iterators (based either on Base.iteratorsize or on a trait), but using start also kind of makes sense.

StefanKarpinski · 2017-12-31T18:46:35Z

We could have an explicit trait which defaults to applicable(start, (x,)); that would allow overriding it if necessary.

mbauman · 2018-01-26T20:14:25Z

Jeff's proposal in #18618 (comment) has room to allow for broadcast treating Symbol and Char as "scalar" types — they just need to opt-in to the behavior. By default they would be an error, as they do not implement start.

The most compelling part here is that the only sensible definition for a "collection" type is that it is iterable. Yes, that means that strings are collections. Sometimes they are used as such! So let's default to the behavior that easily allows folks to opt-in to the other at the call site.

There is a wart here, though. Since numbers are iterable (they even HasShape), they'll be treated as zero-dimensional containers. That means that, taken to its logical conclusion, 1 .+ 2 != 3. It'd be fill(3, ()) instead.

bramtayl · 2018-01-26T20:38:39Z

EDIT: in an attempt to avoid derailing the thread further, moved to discourse:

https://discourse.julialang.org/t/lazycall-again-sorry/8629

nalimilan · 2018-01-26T22:34:30Z

Jeff's proposal in #18618 (comment) has room to allow for broadcast treating Symbol and Char as "scalar" types — they just need to opt-in to the behavior. By default they would be an error, as they do not implement start.

Yeah, my position is just based on the assumption that scalars are a more natural fallback, especially given that collections need to implement some methods (iteration, possibly indexing) while scalars are just "the rest" and have nothing in common. In the end any type will be able to implement any behavior it wants, but we should make it as convenient and logical as possible, which in particular should help avoiding inconsistencies (e.g. some types declared and behaving as scalars and others not).

I'm not overly concerned with having a few exceptions for essential types like strings and numbers, as long as the rules are clear for other types.

andyferris · 2018-02-09T00:23:16Z

I've been thinking a little bit about our iteration and indexing interfaces. I note that we can have useful objects which iterate (but can not be indexed), objects that can be indexed (but are not iterable), and objects that do both. Based on that, I wonder if:

map could be strongly tied to the iteration protocol - it seems valid that we can make a lazy out = map(f, iterable) for any arbitrary iterable such that e.g. first(out) is the same as f(first(iterable)), and it seems to me that this generic lazy operation could be useful.
broadcast could be strongly tied to the indexing interface - it seems valid that we can make a lazy out = broadcast(f, indexable) such that out[i] is the same as f(indexable[i]), and it seems to me that this generic lazy operation could be useful. Obviously broadcast with multiple inputs could still do all the fancy stuff it does now. For the purpose of broadcasting, scalars would be those things that can't be indexed (or index trivially like Number and Ref and AbstractArray{0}).

I do also feel that it would be desirable if one-argument map and one-argument broadcast do very similar things for collections which are both iterable and indexable. However, the fact that AbstractDict iteration returns different things than getindex seems to block a nice unification here. ☹️ (Our other collection types seem fine)

(To me, the fact that things like strings may have to be explicitly wrapped up like ["bug", "cow", "house"] .* ("s",) doesn't sound like a deal-breaker here. I have the same problem when I want to think of a 3-vector as being a "single 3D point" and it's not too hard to deal with (xref #18379)).

stevengj · 2018-02-09T13:19:26Z

I agree that broadcast should be for indexable containers, but I think that should be consecutively indexable, which excludes strings. e.g. collect(eachindex("aαb🐨γz")) gives [1, 2, 4, 5, 9, 11], which will play badly with any broadcast implementation based on indexing.

But being for indexable containers is essentially that containers need a trait to opt-in, which is basically what I've been advocating.

andyferris · 2018-02-09T22:25:10Z

I'm not sure consecutive indices is a good constraint - dictionaries will have arbitrary indices, for example.

However, broadcast(f, ::String) can't create a new String and guarantee that the output indices remain the same as the input indices since the UTF-8 character widths might change under f (it would have to turn into something like an AbstractDict{Int, Char} to make that guarantee, which really doesn't seem very useful!). I would almost say that the indices of a String are more like "tokens" for fast lookup rather than semantically-important indices (e.g. you can convert to an equivalent UTF-32 string and the indices would change).

I don't mind if we make the broadcasting behavior opt-in via trait; I'm just saying that imagining how a generic broadcast(f, ::Any) behaves is a good way of guiding the implementation of things like broadcast(f, ::AbstractDict) (and would naturally answer the question I raised in #25904, i.e. broadcast over dictionary values and not key-value pairs).

tkoolen · 2018-07-16T14:32:55Z

Are people truly happy with this change? I for one have never needed to broadcast over a container without a shape, while I broadcast over things that should be treated as scalars all the time. Every deprecation warning I 'fix' makes me shed a tear.

mbauman · 2018-07-16T15:41:38Z

I broadcast over things that should be treated as scalars all the time.

What are the types of those things?

tkoolen · 2018-07-16T15:59:13Z

Can be anything. For example, in a package that defines an optimization model type Model and a decision variable type Variable, you might have x::Vector{Variable} for which you want to get the values after solving the model model using a function value(::Variable, ::Model)::Float64. Previously, you could do that like:

value.(x, model)

It's also often the case that the argument types are from other packages, so adding a method to broadcastable for those types would be type piracy in that case. So you have to use Ref or a one-element tuple. That's not insurmountable, but it just makes the common case a lot less elegant in order to support a relatively obscure usage pattern, in my opinion.

mbauman · 2018-07-16T19:40:56Z

Yes, I see your point, and I do agree that it is annoying in situations like that. That said, the old behavior was absolutely problematic — it was one of those "the default fallback is definitively wrong in some cases" things.

In short, there are four options that avoid the incorrect fallback:

require everything to implement some method that describes how they broadcast
Default to treating things as containers and error/deprecate for non-containers.
- We will just try to iterate unknown objects and that will error for scalars
- There are two escape hatches for scalars — users can wrap them at a call site and library authors can opt-in to unwrapped scalar-like broadcasting.
Default to treating things as scalars and error for unknown containers
- Given that there are no relevant methods only defined for scalars, we'd have to assert that iterate throws a method error. That is slow and circuitous.
- There would only be one escape hatch available for custom containers to not error: their library authors to explicitly opt-in to broadcast. This seems quite backwards for a function whose primary purpose is to work with containers.
Check applicable(iterate, …) and switch behaviors accordingly
- This currently doesn't work due to the deprecation mechanism from start/next/done, and in general could be wrong for wrapper types that defer methods to a member.

Option 1 is worse for everyone, option 2 is the status quo, and option 3 is backwards, and option 4 is something we've never done before and likely to be buggy.

tkoolen · 2018-07-16T21:16:03Z

I guess some of the discussion must have happened behind the scenes, but I'm just not convinced by the arguments I've seen in this thread and in #25356 against nalimilan's and stevengj's positions.

There would only be one escape hatch available for custom containers to not error: their library authors to explicitly opt-in to broadcast. This seems quite backwards for a function whose primary purpose is to work with containers.

This is my main point of disagreement. To me it seems that in all of Julia code # of iterator types << # of types that should be treated as scalars in a broadcast situation < # of broadcast calls. So I'd prefer it if the number of times something 'extra' needs to be done scales with the number of iterator types, rather than with the number of broadcast calls. And if a library author defines an iterator, it's not completely unreasonable to ask them to define one more method, whereas it is completely unreasonable to ask every package author to define Base.broadcastable(x) = Ref(x) for all their non-iterable types in order to avoid ugly (IMHO) Refs in a high percentage of broadcast calls.

I know that having a single method to implement to define iteration is nice, but it's not that much work to implement one more for either a new trait, or for making it required to specify Base.iteratorsize for a new iterator (and getting rid of the problematic HasLength default). The fallback broadcastable method could then be based on that trait. Or, if you're really in love with defining iteration with a single method, you could (post-deprecation-removal) have that explicit trait default to applicable(iterate, ...) as in #18618 (comment), and simply override that default if need be. Corner cases like String could also still be handled by further specializing broadcastable if desired.

mbauman · 2018-07-16T22:11:41Z

That's effectively the design of 0.6, which led to this issue and #26421 and #19577 and #23197 and #23746 and possibly more — searching for this is hard.

It means that Base is providing a default fallback that is incorrect for a whole class of objects. That's why I prefer a mechanism that errors unless you opt in, one way or another. It's opinionated and the transition is a pain, but it forces you to be explicit.

You may be right that there are more "scalar-like" custom types than there are iterator-like ones, but I stand by the fact that broadcasting is first and foremost an operation on containers. I expect f.(x) to do some sort of mapping and not just be f(x).

And, finally, containers that get the scalar-default treatment simply aren't able to be used elementwise with broadcasting. For example, String is a collection type that we have special-cased to behave like a scalar; it's not possible to "reach in" and work elementwise, even though that would seem to make sense in some situations (e.g., isletter.("a1b2c3")). That's the asymmetry argument: you can more efficiently wrap containers in a Ref to treat them as scalars than you can collect them into an actually broadcastable collection.

Those are the main arguments. As far as the ugliness of Ref, I fully agree. A solution there is #27608.

tkoolen · 2018-07-17T08:13:57Z

Fair enough. I don't have any knockdown arguments or magic solutions to those problems, and #27608 will improve things.

ederag · 2018-08-09T15:14:33Z

@tkoolen I had the same concerns and use case.

@mbauman The arguments given above may not be fully convincing. Here are two questions to be more complete:

It would be possible to make broadcastable a required interface for any iterable.
This would be completely systematic, and would force developers to think about
how their iterator should behave under broadcast.
A recommendation to set it as collect(x) would make the transition relatively easy in most cases.
There would not be any performance loss, right ?
So it boils down to the will to have an error for f.(x) if x broadcasts as a scalar.
Why not a linter warning/error for f.(x, y, z), such as "all arguments of 'f' broadcast as scalars" ?

Anyway, it might be wise to fix #27563 (e.g. by #27608) and let users play with it a while before 1.0.
[0.7 and 1.0.0-rc1.0 were released without a fix].

StefanKarpinski · 2018-08-10T01:19:10Z

Anyway, it might be wise to fix #27563 (e.g. by #27608) and let users play with it a while before 1.0.
[0.7 and 1.0.0-rc1.0 were released without a fix].

I take it you missed the news that 1.0 has been released.

ederag · 2018-08-10T13:08:20Z

@StefanKarpinski Missed that, indeed. Congratulation to all developers, julia is amazing, keep on !

stevengj added the broadcast Applying a function over a collection label Sep 21, 2016

Sacha0 mentioned this issue Sep 26, 2016

Deprecate vectorized float methods in favor of compact broadcast syntax #18495

Closed

nalimilan added this to the 1.0 milestone Sep 5, 2017

JeffBezanson mentioned this issue Sep 20, 2017

make KeyIterator and ValueIterator more array-like #10092

Open

StefanKarpinski added the broadcast label Nov 20, 2017

rfourquet mentioned this issue Dec 5, 2017

WIP/RFC: random unleashed #24912

Closed

6 tasks

nalimilan added the triage This should be discussed on a triage call label Dec 24, 2017

vtjnash mentioned this issue Feb 5, 2018

WIP: deprecate iteration on bare Associative, introduce PairIterator. #25013

Closed

mbauman mentioned this issue Feb 6, 2018

Julep: broadcast with dictionaries #25904

Open

JeffBezanson removed the triage This should be discussed on a triage call label Feb 8, 2018

JeffBezanson mentioned this issue Feb 8, 2018

String should be officially immutable & "abc" === "abc" #22193

Closed

nalimilan mentioned this issue Feb 11, 2018

find* functions and pairs()/keys() #25999

Closed

This was referenced Feb 22, 2018

Implement broadcasting for AbstractDict and NamedTuple #26158

Open

RFC: Deprecate defaulting to scalar in broadcast #26212

Closed

vtjnash mentioned this issue Mar 8, 2018

map over a KeySet unexpectedly returns a Set #26359

Closed

JeffBezanson mentioned this issue Mar 12, 2018

Wrong output when composing length() with values() for Dictionaries #26421

Closed

mbauman mentioned this issue Mar 13, 2018

Deprecate defaulting to scalar in broadcast (take 2) #26435

Merged

mbauman closed this as completed in #26435 Mar 22, 2018

vtjnash mentioned this issue May 5, 2018

Notation for lazy map #19198

Open

anowacki mentioned this issue Aug 16, 2018

heatmap color gradient problem on 32 bit machine with GR JuliaPlots/Plots.jl#1665

Closed

mbauman mentioned this issue Jan 26, 2023

RFC: support iteration for CartesianIndex #48404

Open

Broadcast had one job (e.g. broadcasting over iterators and generator) #18618

Broadcast had one job (e.g. broadcasting over iterators and generator) #18618

Comments

vchuravy commented Sep 21, 2016

stevengj commented Sep 21, 2016 • edited

pabloferz commented Sep 21, 2016 • edited

pabloferz commented Sep 21, 2016 • edited

vchuravy commented Sep 21, 2016

bramtayl commented Sep 22, 2016 • edited

pabloferz commented Sep 22, 2016 • edited

rfourquet commented Sep 4, 2017 • edited

stevengj commented Sep 4, 2017

rfourquet commented Sep 4, 2017

nalimilan commented Sep 4, 2017

stevengj commented Sep 5, 2017

stevengj commented Oct 19, 2017

oscardssmith commented Oct 20, 2017

rfourquet commented Nov 16, 2017

nalimilan commented Dec 24, 2017

rfourquet commented Dec 24, 2017

JeffBezanson commented Dec 28, 2017

nalimilan commented Dec 28, 2017

Sacha0 commented Dec 28, 2017

nalimilan commented Dec 29, 2017

stevengj commented Dec 29, 2017

nalimilan commented Dec 31, 2017

StefanKarpinski commented Dec 31, 2017 • edited

mbauman commented Jan 26, 2018 • edited

bramtayl commented Jan 26, 2018 • edited

nalimilan commented Jan 26, 2018

andyferris commented Feb 9, 2018 • edited

stevengj commented Feb 9, 2018

andyferris commented Feb 9, 2018 • edited

tkoolen commented Jul 16, 2018

mbauman commented Jul 16, 2018

tkoolen commented Jul 16, 2018

mbauman commented Jul 16, 2018

tkoolen commented Jul 16, 2018

mbauman commented Jul 16, 2018 • edited

tkoolen commented Jul 17, 2018

ederag commented Aug 9, 2018

StefanKarpinski commented Aug 10, 2018 • edited

ederag commented Aug 10, 2018

stevengj commented Sep 21, 2016 •

edited

pabloferz commented Sep 21, 2016 •

edited

pabloferz commented Sep 21, 2016 •

edited

bramtayl commented Sep 22, 2016 •

edited

pabloferz commented Sep 22, 2016 •

edited

rfourquet commented Sep 4, 2017 •

edited

StefanKarpinski commented Dec 31, 2017 •

edited

mbauman commented Jan 26, 2018 •

edited

bramtayl commented Jan 26, 2018 •

edited

andyferris commented Feb 9, 2018 •

edited

andyferris commented Feb 9, 2018 •

edited

mbauman commented Jul 16, 2018 •

edited

StefanKarpinski commented Aug 10, 2018 •

edited