Taking string concatenation seriously, or, a proposal to deprecate *, ^ for string concatenation and repetition #11030

Closed
quinnj opened this issue Apr 27, 2015 · 180 comments
Labels
design Design of APIs or of the language itself domain:strings "Strings!" needs decision A decision on this change is needed

@quinnj
Member

quinnj commented Apr 27, 2015

The frequency and vehemence of discussions around this subject beg for a change. * and ^ were introduced for strings back when the language wasn't as strict on operator punning and overall meaning.

As a first step, I propose we deprecate these two methods for string operations.

As a next discussion, we can talk about the possibility of using a different operator(s) for concatenation/repetition. Just using repeat, with no operator, has been suggested, as well as the following for string concatenation:

  • ++, as a general sequence concatenation operator
  • .., similar to Lua

Things to consider:

  • Would the same operator apply to other concatenation operations like vectors or multi-dimensional arrays? In that case, how would it interact with vcat/hcat?
@timholy
Sponsor Member

timholy commented Apr 27, 2015

+1 for no infix operators at all. This subject attracts too much noise, and O(n) for a * b * c * d ... concatenation isn't good.

If there is discussion about alternatives, then +100 for moving it to the julia-infix-operator-debates mailing list.
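
A minimal sketch of one reading of the cost concern above, assuming current Julia 1.x syntax (the thread predates it). The helper names `concat_pairwise` and `concat_buffered` are hypothetical. Accumulating with a binary operator in a loop re-copies everything built so far at each step, while writing into a single buffer touches each piece once:

```julia
pieces = ["a", "b", "c", "d"]

# Repeated binary concatenation: each step allocates a new string and
# copies everything accumulated so far.
function concat_pairwise(parts)
    s = ""
    for p in parts
        s = s * p
    end
    return s
end

# Single pass: every piece is written once into one growing buffer.
function concat_buffered(parts)
    io = IOBuffer()
    for p in parts
        print(io, p)
    end
    return String(take!(io))
end

joined = join(pieces)  # the built-in one-call alternative
```

All three approaches produce the same string; they differ only in how much intermediate copying they may do.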

@johnmyleswhite
Member

> +1 for no infix operators at all. This subject attracts too much noise, and O(n) for a * b * c * d ... concatenation isn't good.

+1 to that

@staticfloat
Sponsor Member

> If there is discussion about alternatives, then +100 for moving it to the julia-infix-operator-debates mailing list.

😆

@kmsquire
Member

LOL. +1 on the julia-infix-operator-debates.

(I'll personally feel sad to see this use of * and ^ go...)

@staticfloat
Sponsor Member

Stefan recently gave a nice, succinct explanation for why he wants to see them go, I'm going to quote it here:

My problem with * for string concatenation is not that people find it unexpected but that it's an inappropriate use of the * generic function, which is agreed upon to mean numerical multiplication. The argument that strings form a monoid is kind of thin since lots of things form a monoid and we're generally not using * for them. At the time I introduced * for strings, we were a lot less strict about operator punning – recall | and & for shell commands – we've gotten much stricter over time, which is a good thing. This is one of the last puns left in the standard library. The reason ++ would be better is not because it would be easier to learn (depends on where you're coming from), but because the ++ operator in Julia would unequivocally mean sequence concatenation.

Note that the operator punning is not an entirely academic concern. The Char corner case shows where the punning can cause problems: people might reasonably expect 'x' * 'y' to produce either "xy" or 241. Because of this, we just make both of these operations no method errors, but it would be perfectly reasonable to allow 'x' ++ 'y' to produce "xy". There's a lot less of a case for having 'x' * 'y' produce 241 or 'ñ', but the sequence concatenation operation does actually make sense.

@ssfrr
Contributor

ssfrr commented Apr 27, 2015


I for one agree that ++ as a general sequence concat operator is clear and explicit, and agree that the Char example brought up by Stefan is a good example of where this simplifies things by disambiguating the user's intent.

@pao
Member

pao commented Apr 27, 2015

I didn't see this before unleashing a rant on the unsuspecting latest derailed mailing list thread, where I suggested julia-stringconcat in lieu of the (better) julia-infix-operator-debates. +INTMAX. Kill the infix operators.

@ScottPJones
Contributor

I really think you should avoid using anything that already has a meaning (other than string concatenation) in other major languages in the world.
I spent too much time seeing bugs because of developers going back and forth between multiple languages that overused simple operators, which meant different things in different languages or
in different contexts.

  1. You need something that does not have another meaning for vectors, because people who
    do string processing for a living expect to be able to use strings as vectors of characters
    (and vice-versa). That would rule out +, *, ^, &, and ||.

  2. You need something that is not confusable to most programmers (not just the numerical
    computing world). That rules out [] (empty array), <> (SQL and other languages).
    I think ++ would be a little confusable, but as it is a unary operator in C/C++/Java/etc.,
    and this would be a binary operator, I think that it would be fine.

  3. You need a simple infix operator, at least for concatenate, otherwise you'll get pasted with
    tons of virtual tomatoes by all of us who are doing string processing.

I'd vote for ++, it is used for concatenation in a reasonably popular language, i.e. Haskell,
it does evoke the idea of adding two strings together, i.e. concatenating them, and it does
not have any other meaning for vectors/arrays, and could be used as a general
vector/array concatenation operator, which is also good (per point 1 above)

@simonster
Member

I don't think ++ as a general sequence concatenation operator is particularly clear. Does "abc"++[1, 2, 3] return:

  • "abc[1,2,3]"
  • "abc\x01\x02\x03"
  • ['a', 'b', 'c', 1, 2, 3]
  • ["abc", [1, 2, 3]]
  • A MethodError

If we're going to have a string concatenation operator, I'd rather it just be a string concatenation operator and nothing else. (Has anyone complained about the lack of an infix operator for other sequence concatenation operations?)

I'm also fine with not having a string concatenation operator, but the presence of such an operator in most other languages makes me wonder if I'd miss it if I were doing more string-heavy projects like web stuff. I'm fine with not having an infix operator if we decide we don't need it because interpolation tends to be more useful than concatenation, but if it's because numerical workflows don't do too much concatenation, I'd think twice.

@pao
Member

pao commented Apr 27, 2015

Whether there should be a replacement is a decision that can be deferred. For once, can we keep a string concatenation-related issue narrowly defined?

@simonster
Member

If we're going to introduce a replacement, I think it makes the most sense to deprecate * and introduce the replacement at the same time, so that people can actually use the replacement when they update their code.

@ScottPJones
Contributor

@StefanKarpinski you'd also get the nice behavior of "mystring" ++ '\u2000', which very annoyingly doesn't work now with "mystring" * '\u2000'.

@simonster, it makes sense to me, as somebody who spends most of their time with string processing...

a = UInt8[1,2,3]
"abc" ++ a
[97, 98, 99, 1, 2, 3]

(if you combine a Vector with a string, (which is immutable), you'd much rather get back another mutable vector, you can always convert it to an immutable string with UTF8String later)

@pao
Member

pao commented Apr 27, 2015

Then this issue will devolve into every other discussion about this ever. The community has already established that it is unable to handle the topic. It's the ultimate bikeshed and there are a lot of colors to choose from.

If I sound irritated by this, it's because I am. Here's my experience. "Hey, you can't glue strings together with +?" "Yeah, that's because we use *." "Oh, okay then." At which point I moved on with my life.

So no, I don't think we should discuss alternative infix operators in this issue, because we'll never make progress if we do.

@simonbyrne
Contributor

> Does "abc"++[1, 2, 3] return?

Obviously a NaN:

https://www.destroyallsoftware.com/talks/wat

@Keno
Member

Keno commented Apr 27, 2015

To voice my opinion on the matter, I have used languages whose string concatenation operator was ., +, (space), _ and ++. When I started julia and learned that * was the concat operator, my first thought was "cool, that makes sense", because I never really liked +. The one argument in favor of not using * I like is the one given by @StefanKarpinski about the ambiguity between Char as an integer and Char as a 1 character string. As such, it seems ++ as a concat operator is reasonable, though in that case we should give it clear semantics. The three options for generic ++ (what it should do if the type is equal seems clear) that seem reasonable to me are:

++(x,y) = ++(string(x),string(y))
++(x,y) = #MethodError
++(x,y) = ++(promote(x,y)...)

Where promote promotes an appropriate container type. The last option would imply

x = Uint8[1,2,3]
"abc"++x == Uint8['a','b','c',1,2,3]
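
The three options above can be sketched as ordinary functions, assuming current Julia 1.x syntax. The names `concat_stringify`, `concat_strict`, and `concat_promote` are hypothetical stand-ins for the proposed `++`, and the promotion rule shown (view the string as its bytes) is only one possible choice:

```julia
# Option 1: stringify both arguments, then concatenate.
concat_stringify(x, y) = string(x, y)

# Option 2: only defined for strings; any other combination is a MethodError.
concat_strict(x::AbstractString, y::AbstractString) = string(x, y)

# Option 3: promote to a common container. Here a string is viewed as its
# code units (bytes), which match the characters only for ASCII input.
concat_promote(s::AbstractString, v::Vector{UInt8}) =
    vcat(collect(codeunits(s)), v)

x = UInt8[1, 2, 3]
bytes = concat_promote("abc", x)   # a Vector{UInt8}
```

Under option 3 the result stays a mutable byte vector; under option 1 the vector is rendered through its printed form, which is rarely what a caller wants.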

@ScottPJones
Contributor

@Keno, that's not correct, because 'a' is Char, a 32-bit type.
So, the answer would need to be either: UInt8[97, 98, 99, 1, 2, 3], or Char['a','b','c','\x01','\x02','\x03']

@FrancoisFayard

I vote for ++

@ScottPJones
Contributor

Actually, if you have an ASCIIString, it could promote to just UInt8[], but a UTF8String (as well as UTF16String and UTF32String) would need to promote to Char[].

@ScottPJones
Contributor

(and that sort of promotion would be very useful for my string processing...)

@jiahao
Member

jiahao commented Apr 27, 2015

This issue could be titled "Taking string concatenation seriously".

@carlobaldassi
Member

> the ambiguity between Char as an integer and Char as a 1 character string.

I'll just note that:

julia-0.4> Char <: Integer
false

julia-0.4> 'a' * 'b'
ERROR: MethodError: `*` has no method matching *(::Char, ::Char)
Closest candidates are:
  *(::Any, ::Any, ::Any)
  *(::Any, ::Any, ::Any, ::Any...)

so no, Char is not an integer, and hasn't been for a while in the 0.4 series, and therefore there's no ambiguity whatsoever. String * Char could perfectly well return the concatenated string, etc. That argument is just obsolete.

@IainNZ IainNZ changed the title Deprecate *, ^ for string concatenation, repetition Taking string concatenation seriously, or, a proposal to deprecate *, ^ for string concatenation and repetition Apr 27, 2015
@mbauman
Sponsor Member

mbauman commented Apr 27, 2015

Please let's not subject ourselves to 200+ comments before we feel like it's been taken seriously enough.

Can someone just make a PR? I think everyone is in favor of deprecating *, ^ (if only to remove the mailing list bug). The ++ operator seems to be getting decent traction, but it's obviously tricky and not obvious to make it general. There are tricky semantics (similar to push! vs. append!), poor algorithmic complexity, and there's not a clear need for other iterables. So let's just make it work well for strings (and maybe chars) and call it a day.

@Keno
Member

Keno commented Apr 27, 2015

@ScottPJones Sure, I was writing it that way for illustrative purposes, since Chars can convert to Uint8s if they are in range. Agreed on the UTF8String promotion problem.

@StefanKarpinski
Sponsor Member

> @jiahao: This issue could be titled "Taking string concatenation seriously".

LOL.

@carnaval
Contributor

Anyone in for a batch order ?

@staticfloat
Sponsor Member

I think I'd want one, but can I get it with ++ instead of *?

Okay, sorry. Continuing the injokes is fun, but let's stay focused. Let's try to come up with a bare minimum set of features that a PR could reasonably implement:

  • Deprecation of * and ^ for strings
  • Implementation of ++ for strings on strings

Anything that generalizes to other containers I think we can hash out inside the PR.
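
A minimal sketch of the second bullet, assuming current Julia 1.x, where `++` parses as an infix operator but has no methods by default. The alias `StrOrChar` and the decision to accept characters as well as strings are assumptions of this sketch, not something the thread settled; the deprecation of `*` and `^` is not shown:

```julia
# Strings and characters are the only accepted operands in this sketch.
const StrOrChar = Union{AbstractString, AbstractChar}

# Varargs so that a chain such as a ++ b ++ c works whether the parser
# produces nested two-argument calls or a single n-ary call.
++(a::StrOrChar, rest::StrOrChar...) = string(a, rest...)

greeting = "Hello, " ++ "world" ++ '!'
```

Anything that is neither a string nor a character still raises a MethodError, which keeps the operator narrowly about string concatenation.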

@ScottPJones
Contributor

I want one with ++! 😀

@ScottPJones
Contributor

@staticfloat 💯 👍

@ScottPJones
Contributor

@PallHaraldsson The problem there is that both print and string do a lot more than just concatenation, they "stringify" their arguments... I'm not sure that that should be happening implicitly with a general concatenation operator. It doesn't happen with * currently when used as a string concatenation operator either.
BTW, you need to learn how to quote things here with Markdown... people here kindly showed me how to use triple-back quotes followed by julia around Julia code snippets, and put a blank line after quoting somebody with > and your comment.
i.e. something like:

@scottjones:

I never said that I wanted that operator to be used for scalars

is it a good (or bad) idea to exclude the numbers - that are already handled:

julia> print("Páll", 1.0, 1)
Páll1.01
julia> string("Páll", 1.0, 1)
"Páll1.01"

@kmsquire
Member

@ScottPJones, unfortunately quoting doesn't seem to work when responding by
email, even if you edit after the fact (testing this here).

print("Hi GitHub!  Is this quoted")

@PallHaraldsson, do be careful how you write someone's name, though. You
actually pinged a different Scott Jones in your message, who probably was
confused to get a notification from you about julia.

In either case, using the GitHub interface, rather than replying by email,
does help with both of these things.


@ScottPJones
Contributor

Ugh... I didn't know that @PallHaraldsson was responding via e-mail, nor that e-mail had that problem (I use the CodeHub app when I'm not at my laptop... it has its own problems, but not that).
Yep, that's a very different Scott Jones... not even the Scott A. Jones who was an MIT grad student when I was an undergrad, who also lived in Arlington afterwards!

@scottjones

LOL, yes, this was confusing! I’ve met “myself" a few times over the years, so this is not the first time that this kind of confusion has happened. :)

Cheers,

-Scott


@ararslan
Member

ararslan commented Sep 15, 2016

For what it's worth I actually really like * as string concatenation. For one, it matches the notation used in Computability, Complexity, and Languages by Davis et al. It also gives you juxtaposition concatenation for free (not that I've ever seen that used, it's just neat). I find myself using * all the time, and I've seen it used in a lot of other places, so I think the scale of the code churn for this deprecation would be massive, with (at least IMHO) little benefit.

@StefanKarpinski StefanKarpinski added needs decision A decision on this change is needed design Design of APIs or of the language itself labels Sep 15, 2016
@tkelman tkelman modified the milestones: 1.0, 0.6.0 Dec 29, 2016
@JeffBezanson
Sponsor Member

I think we should just keep * for strings, but possibly add ++ later as a generic concatenation operator (which would support strings as well as other things).

@StefanKarpinski
Sponsor Member

We may add ++ as a generic sequence concatenation operator in the future, but it seems like getting rid of * and ^ for strings isn't going to happen. I'll say that I'm no longer particularly concerned about "punning" on *, nor do I even actually think this is punning anymore – in abstract algebra, multiplication (represented as * or juxtaposition) is often used as a non-commutative group operation on things that aren't numbers. The main issues here were from the fact that previously Char <: Number but the * operation for Char was incompatible with * for Number. Now that Char is not a subtype of Number, that's no longer a problem.
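
For reference, a short sketch of how this resolution looks in practice, assuming a Julia 1.x session (the variable names are only for illustration):

```julia
# Char is not a numeric type, so there is no integer reading to clash with.
char_is_number = Char <: Number

# Concatenation with * accepts strings and characters alike.
xy = 'x' * 'y'
padded = "mystring" * '\u2000'

# Repetition with ^ agrees with the named function repeat.
hi3 = "hi"^3
```

Here `char_is_number` is `false`, and the Char corner case raised earlier in the thread concatenates instead of raising a MethodError.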

@smithb32

smithb32 commented Sep 2, 2017

I would keep * for string concatenation for the original reason.

This is what Wikipedia says about regular expressions as algebraic operations:

Given regular expressions R and S, the following operations over them are defined to produce regular expressions:
(concatenation) RS denotes the set of strings that can be obtained by concatenating a string in R and a string in S. For example, {"ab", "c"}{"d", "ef"} = {"abd", "abef", "cd", "cef"}.
(alternation) R | S denotes the set union of sets described by R and S. For example, if R describes {"ab", "c"} and S describes {"ab", "d", "ef"}, expression R | S describes {"ab", "c", "d", "ef"}.
(Kleene star) R* denotes the smallest superset of set described by R that contains ε and is closed under string concatenation. This is the set of all strings that can be made by concatenating any finite number (including zero) of strings from set described by R. For example, {"0","1"}* is the set of all finite binary strings (including the empty string), and {"ab", "c"}* = {ε, "ab", "c", "abab", "abc", "cab", "cc", "ababab", "abcab", … }.

In linear algebra, there is a unary operator, the adjoint operator, often denoted by *. In Julia as well as Matlab, the adjoint operator is given by a single quote ('), since * is generally used for multiplication. So I would propose string operators in Julia to be (*,+,') for concatenation, alternation, and Kleene star respectively.

@morningkyle

morningkyle commented Nov 3, 2017

As @stevengj pointed out, the argument between + and its competitors is about convention, not correctness. And the data @stevengj provided above clearly shows that + as a string concatenation operator is the most widely accepted convention in the programming world (C++/C#/Java/Python/JavaScript and many others). All the other choices are apparently much less common, whether some people like them more or not.

Then the main reason I can think of for keeping * is that deprecating it would break existing code like "abc" * "efg". Could anyone explain what else + would break if used as a string concatenation operator in Julia, to help me understand the background better? (I understand string concatenation is not a commutative operation.)

@StefanKarpinski
Sponsor Member

StefanKarpinski commented Nov 3, 2017

Nothing would break and if you really want it you can define + to do this. It is, however, just bad math. In algebra, + is always a commutative operation – and string concatenation is not commutative. You can see some of the confusion this causes since + does not lead to a natural way to repeat strings. Do you write "hi" * 5 or 5 * "hi"? Neither one really makes much sense. Compare this with * for concatenation where it's obvious that you should write "hi"^5. In any case, while we may introduce ++ for concatenation (including strings), we are not going to use + for string concatenation no matter how many languages may have chosen this syntax.
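
The algebraic point can be checked directly. A small sketch, assuming Julia 1.x and using arbitrary example strings:

```julia
a, b = "ab", "cd"

# Concatenation is associative but not commutative.
assoc = (a * b) * "ef" == a * (b * "ef")
commutes = a * b == b * a

# The empty string is the identity, and powers follow the usual exponent laws.
identity_ok = ("" * a == a) && (a * "" == a)
power_ok = a^(2 + 3) == a^2 * a^3
zero_power = a^0
```

`commutes` is `false` while the other checks hold, which is the behavior expected of a multiplication-like operation rather than an addition-like one.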

@Keno
Member

Keno commented Nov 3, 2017

I propose using . for string concat to match PHP. With overloadable getfield, it'd even be trivially implementable:

getfield(x::String, y::String) = string(x, y)
"a"."b"."c"

We could use .. for repetition " "..5

@morningkyle

morningkyle commented Nov 4, 2017

@StefanKarpinski
Thanks for the explanation! I understand the background better now.

(1) If * is used for string concatenation, then ^ would be a logical and natural operation of string repetition.
(2) If + is used for string concatenation, then * would be the logical result of string repetition.
For option (1), I agree ^ is an intuitive operator for string repetition. For (2), even though * is the logical result, it might still not be intuitive enough (e.g. "hi" * 3 == "hihihi").

> Do you write "hi" * 5 or 5 * "hi"?

No, string repetition (whatever operator it is) is not a frequent operation to me. But string concatenation is. If this is a general case, it seems replacing repetition operator with a named function (eg. repeat, already supported in Julia) makes sense.

And I realized creating new operators is really a disputable/dangerous thing, while supporting more APIs is usually welcome:)

Edit: Found another thread that introduced / and \ for strings. And this helps me understand better why * was chosen for string concatenation.

tpapp added a commit to tpapp/julia that referenced this issue Nov 6, 2017
Rationale: even though JuliaLang#11030
was closed, there is discussion of the issue from time to time on
Github and the forum; so if any question is frequently asked, this is.
@chadagreene

What? This long discussion, and nobody suggests the obvious 🐈 operator?
