← Back to index | Original Bugzilla link

Bug 3827 – Warn against and then deprecate implicit concatenation of adjacent string literals

Status: RESOLVED
Resolution: FIXED
Severity: enhancement
Priority: P2
Component: dmd
Product: D
Version: D2
Platform: All
OS: Windows
Creation time: 2010-02-18T12:40:00Z
Last change time: 2017-01-07T03:02:11Z
Assigned to: nobody
Creator: bearophile_hugs

Attachments

ID	Filename	Summary	Content-Type	Size
571	strings.diff	patch for parse.c	text/plain	1033

Comments

Comment #0 by bearophile_hugs — 2010-02-18T12:40:31Z

import std.stdio; void main() { string[] a = ["foo", "bar" "baz", "spam"]; writeln(a); } This code prints: foo barbaz spam But probably the programmer meant to create an array with 4 strings. D has the ~ concat operator, so to prevent possible programming bugs it's better to remove the implicit concat of strings separated by whitespace. Everywhere the programmer wants to concat strings the explicit concat operator can be used: string s = "this is a very long string that doesn't fit in" ~ " a line"; The "Python Zen" has a rule that says: Explicit is better than implicit. The compiler can optimize the concat away at compile time. C code ported to D that doesn't put a ~ just raises a compile time error that's easy to understand and fix.

Comment #1 by aifgi90 — 2010-02-18T14:35:32Z

Created attachment 571 patch for parse.c Vote++ and patch

Comment #2 by bearophile_hugs — 2010-02-18T14:55:40Z

(In reply to comment #1) > Created an attachment (id=571) [details] > patch for parse.c > > Vote++ and patch Thank you. But is DMD doing the joining with ~ at compile time? If not, then you can add that optimization to your patch (if you are able to).

Comment #3 by bearophile_hugs — 2010-02-18T15:03:33Z

> Thank you. But is DMD doing the joining with ~ at compile time? If not, then > you can add that optimization to your patch (if you are able to). And if you think it's needed, you can add the clear error message I was talking about :-)

Comment #4 by aifgi90 — 2010-02-28T09:36:58Z

> Thank you. But is DMD doing the joining with ~ at compile time? If not, then > you can add that optimization to your patch (if you are able to). I think DMD is doing joining at compile time (constfold.c, from line 1387)

Comment #5 by bearophile_hugs — 2010-06-20T16:19:15Z

The error message for the missing ~ can be something like this (adapted from the "'l' suffix is deprecated, use 'L' instead" error message generated by the usage of a 10l long literal): adjacent string literals concatenation is deprecated, add ~ between them instead.

Comment #6 by ellery-newcomer — 2010-06-20T16:29:07Z

(In reply to comment #0) > The "Python Zen" has a rule that says: > > Explicit is better than implicit. > the python compiler has a rule that says do the exact same thing as what d is doing. Your serve.

Comment #7 by bearophile_hugs — 2010-06-20T16:51:06Z

I know Python, but I hope D will become better than Python on this syntax detail.

Comment #8 by bearophile_hugs — 2010-08-21T13:38:59Z

A particularly nice example of why untidy syntax easily leads to bugs (this comes from two different sources of sloppiness of the D2 language): enum string[5] data = ["green", "magenta", "blue" "red", "yellow"]; static assert(data[4] == "yellow"); // asserts void main() {}

Comment #9 by bearophile_hugs — 2010-11-10T18:17:17Z

Another bug caused in my code by that anti-feature: unittest { auto tests = [["", "0000"], ["12346", "0000"], ["he", "H000"], ["soundex", "S532"], ["example", "E251"], ["ciondecks", "C532"], ["ekzampul", "E251"], ["resume", "R250"], ["Robert", "R163"], ["Rupert", "R163"], ["Rubin" "R150"], ["Ashcraft", "A226"], ["Ashcroft", "A226"]]; foreach (pair; tests) assert(processit(pair[0]) == pair[1]); } That code compiles with no errors with DMD 2.050, and then causes a Range violation at runtime because one of those arrays isn't a pair.

Comment #10 by bearophile_hugs — 2010-11-10T18:21:06Z

The C# language, that has a very refined design, refuses this code, showing that it doesn't perform automatic joining of adjacent strings: public class Test { public static void Main() { string s = "hello " "world"; } } prog.cs(3,35): error CS1525: Unexpected symbol `world' Compilation failed: 1 error(s), 0 warnings

Comment #11 by bearophile_hugs — 2010-11-12T04:24:57Z

Walter agrees: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=121830

Comment #12 by bearophile_hugs — 2010-11-12T04:32:16Z

A comment from Andrei Alexandrescu: Walter, please don't forget to tweak the associativity rules: var ~ " literal " ~ " literal " concatenates literals first.

Comment #13 by bearophile_hugs — 2010-11-13T19:19:03Z

(In reply to comment #12) A comment from Stewart Gordon: > You mean make ~ right-associative? I think this'll break more code than > it fixes. > > But implementing a compiler optimisation so that var ~ ctc ~ ctc is > processed as var ~ (ctc ~ ctc), _in those cases where they're > equivalent_, would be sensible. > > ctc = compile-time constant

Comment #14 by ellery-newcomer — 2010-11-13T19:26:18Z

you don't need to mess with associativity rules, you just need to be able to handle two or three ast cases: 1. (~ str str) ie str ~ str 2. (~ (~ x str) str) ie x ~ str ~ str 3. (~ str (~ str x)) ie str ~ (str ~ x)

Comment #15 by clugdbug — 2010-11-13T23:51:58Z

(In reply to comment #14) > you don't need to mess with associativity rules, you just need to be able to > handle two or three ast cases: > > 1. (~ str str) ie str ~ str > 2. (~ (~ x str) str) ie x ~ str ~ str > 3. (~ str (~ str x)) ie str ~ (str ~ x) Like this (optimize.c, line 1023): Expression *CatExp::optimize(int result) { Expression *e; //printf("CatExp::optimize(%d) %s\n", result, toChars()); e1 = e1->optimize(result); e2 = e2->optimize(result); + if (e1->op == TOKcat && (e2->op == TOKstring || e2->op == TOKnull) + && (((CatExp *)e1)->e2->op == TOKstring || ((CatExp *)e1)->e2->op == TOKnull)) + { + // Convert (e ~ str) ~ str into e ~ (str ~ str) + CatExp *ce = ((CatExp *)e1); + e1 = ce->e1; + ce->e1 = ce->e2; + ce->e2 = e2; + e2 = ce; + } e = Cat(type, e1, e2); if (e == EXP_CANT_INTERPRET) e = this; return e; }

Comment #16 by clugdbug — 2010-11-13T23:58:35Z

Sorry, missed out a line: if (e1->op == TOKcat && (e2->op == TOKstring || e2->op == TOKnull) && (((CatExp *)e1)->e2->op == TOKstring || ((CatExp *)e1)->e2->op == TOKnull)) { // Convert (e ~ str) ~ str into e ~ (str ~ str) CatExp *ce = ((CatExp *)e1); e1 = ce->e1; ce->e1 = ce->e2; ce->e2 = e2; e2 = ce; + e2 = e2->optimize(result); }

Comment #17 by smjg — 2010-11-16T17:09:41Z

(In reply to comment #5) > The error message for the missing ~ can be something like this (adapted from > the "'l' suffix is deprecated, use 'L' instead" error message generated by the > usage of a 10l long literal): > > adjacent string literals concatenation is deprecated, add ~ between them > instead. Better watch out for cases where just adding ~ changes the behaviour. For example, if a is a string[], then a ~ "this" "that" and a ~ "this" ~ "that" evaluate to different strings. Not that there's any real use case for "this" "that" anyway. And those rare use cases it does have in D can be fixed by inserting the ~, though there may be easier-to-miss cases of the above of which to be wary.

Comment #18 by smjg — 2010-11-16T17:15:03Z

(In reply to comment #17) > For example, if a is a string[], then a ~ "this" "that" and a ~ "this" ~ "that" > evaluate to different strings. Different string arrays even.

Comment #19 by nfxjfg — 2010-11-16T19:01:06Z

(In reply to comment #17) > Not that there's any real use case for "this" "that" anyway. And those rare > use cases I use automatic joining all the time for long string literals. I want them to span multiple source lines without containing line breaks. No, not a rarely used feature.

Comment #20 by bearophile_hugs — 2010-11-16T19:38:56Z

(In reply to comment #19) > (In reply to comment #17) > > Not that there's any real use case for "this" "that" anyway. And those rare > > use cases > > I use automatic joining all the time for long string literals. I want them to > span multiple source lines without containing line breaks. > No, not a rarely used feature. Stewart Gordon was just talking about code like: a ~ "this" "that" where a is a string[]. To join multiple lines you may add a ~ at their end: string text = "I use automatic joining all the time for long string literals. I want them to " ~ "span multiple source lines without containing line breaks. " ~ "No, not a rarely used feature.";

Comment #21 by schveiguy — 2010-11-16T21:33:05Z

(In reply to comment #17) > (In reply to comment #5) > > The error message for the missing ~ can be something like this (adapted from > > the "'l' suffix is deprecated, use 'L' instead" error message generated by the > > usage of a 10l long literal): > > > > adjacent string literals concatenation is deprecated, add ~ between them > > instead. > > Better watch out for cases where just adding ~ changes the behaviour. > > For example, if a is a string[], then a ~ "this" "that" and a ~ "this" ~ "that" > evaluate to different strings. doesn't this solve that problem? a ~ ("this" ~ "that") BTW, I don't expect very many cases like this (in fact, I bet there are none).

Comment #22 by smjg — 2010-11-17T03:58:08Z

(In reply to comment #21) > doesn't this solve that problem? a ~ ("this" ~ "that") It does. My point was that somebody might accidentally not add the brackets.

Comment #23 by dfj1esp02 — 2010-11-17T12:04:03Z

If constfold can access a's type, it can make the right decision.

Comment #24 by bearophile_hugs — 2010-11-22T12:01:40Z

A recent note by Walter: > Andrei's right. This is not about making it right-associative. It is about > defining in the language that: > > ((a ~ b) ~ c) > > is guaranteed to produce the same result as: > > (a ~ (b ~ c)) > > Unfortunately, the language cannot make such a guarantee in the face of operator > overloading. But it can do it for cases where operator overloading is not in play.

Comment #25 by bearophile_hugs — 2011-03-20T04:29:08Z

See also: http://stackoverflow.com/questions/2504536/why-allow-concatenation-of-string-literals

Comment #26 by bearophile_hugs — 2012-03-10T17:31:34Z

An example of the problems this avoids:http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.announce&article_id=22649 Andrej Mitrovic: > I see you are not the only one who started writing string array > literals like this: > > enum PEGCode = grammarCode!( > "Grammar <- S Definition+ EOI" > ,"Definition <- RuleName Arrow Expression" > ,"RuleName <- Identifier>(ParamList?)" > ,"Expression <- Sequence (OR Sequence)*" > ); > > IOW comma on the left side. I know it's not a style preference but > actually a (unfortunate but needed) technique for avoiding bugs. :)

Comment #27 by andrej.mitrovich — 2012-03-10T17:56:16Z

(In reply to comment #26) > > enum PEGCode = grammarCode!( > > "Grammar <- S Definition+ EOI" > > ,"Definition <- RuleName Arrow Expression" > > ,"RuleName <- Identifier>(ParamList?)" > > ,"Expression <- Sequence (OR Sequence)*" > > ); Note that this is Philippe Sigaud's code. So you can him, and me to the list of people affected by this. I'm doing string processing in D on a day-to-day basis, and whenever I have a list of strings I eventually end up shooting myself in the foot because of a missing comma. It's very easy (at least for clumsy me) to make the mistake. E.g. writing some headers to ignore: string[] ignoredHeaders = [ "foo.bar" // todo: have to fix this later "foo.do", // todo: later ]; When I have comments next to the strings it makes it easy to miss the missing comma, especially if the strings are of a different length.

Comment #28 by monarchdodra — 2014-01-19T02:08:29Z

(In reply to comment #27) > Note that this is Philippe Sigaud's code. So you can him, and me to the list of > people affected by this. Add me too.

Comment #29 by yebblies — 2014-01-19T19:56:07Z

*** This issue has been marked as a duplicate of issue 2685 ***

Comment #30 by bearophile_hugs — 2014-01-20T03:31:20Z

*** Issue 2685 has been marked as a duplicate of this issue. ***

Comment #31 by smjg — 2014-01-20T16:39:32Z

(In reply to comment #24) > A recent note by Walter: > > > Andrei's right. This is not about making it right-associative. It is about > > defining in the language that: > > > > ((a ~ b) ~ c) > > > > is guaranteed to produce the same result as: > > > > (a ~ (b ~ c)) You mean by doing away with concatenation of an array type with its element type altogether? That would be overly restrictive.

Comment #32 by bearophile_hugs — 2014-04-27T10:32:38Z

A new bug of mine caused by the implicit concatenation of strings. The point of this little program is to show the Phobos isNumeric function: void main() { import std.stdio, std.string, std.array; foreach (const s; ["12", " 12\t", "hello12", "-12", "02" "0-12", "+12", "1.5", "1,000", "1_000", "0x10", "0b10101111_11110000_11110000_00110011", "-0b10101", "0x10.5"]) writefln(`isNumeric("%s"): %s`, s, s.strip().isNumeric(true)); } As you see the last example "02" of the first row of the array literal lacks a comma. I have found this bug with the DScanner tool.

Comment #33 by bearophile_hugs — 2014-04-30T12:21:45Z

A problem with @nogc: void main() @nogc { string s1 = "AB" "CD"; // OK string s2 = "AB" ~ "CD"; // Error } temp.d(3,17): Error: cannot use operator ~ in @nogc function main Some workarounds: void main() @nogc { static immutable string s3 = "AB" ~ "CD"; // OK string s4 = ctEval!("AB" ~ "CD"); // OK } Where ctEval is used to generate a compile-time value.

Comment #34 by smjg — 2014-04-30T12:31:03Z

(In reply to bearophile_hugs from comment #33) > A problem with @nogc: > > > void main() @nogc { > string s1 = "AB" "CD"; // OK > string s2 = "AB" ~ "CD"; // Error > } > > > temp.d(3,17): Error: cannot use operator ~ in @nogc function main That's a separate bug - please file it if it isn't already there.

Comment #35 by bearophile_hugs — 2014-04-30T13:02:43Z

(In reply to Stewart Gordon from comment #34) > > void main() @nogc { > > string s1 = "AB" "CD"; // OK > > string s2 = "AB" ~ "CD"; // Error > > } > > > > > > temp.d(3,17): Error: cannot use operator ~ in @nogc function main > > That's a separate bug - please file it if it isn't already there. I don't understand what you suggest me to file. A line of code "string s2 = "AB" ~ "CD";" performs a run-time array concatenation, that always performs a memory allocation, so it can't be @nogc.

Comment #36 by yebblies — 2014-04-30T13:10:14Z

(In reply to bearophile_hugs from comment #35) > > I don't understand what you suggest me to file. If the @nogc enforcement was done after constant folding, then that expression wouldn't use the gc. Same with things like `[1, 2, 3][0]`. > A line of code "string s2 = > "AB" ~ "CD";" performs a run-time array concatenation, that always performs > a memory allocation, so it can't be @nogc. Try it, look at the asm. No memory allocation.

Comment #37 by bearophile_hugs — 2014-04-30T13:18:40Z

(In reply to yebblies from comment #36) > Try it, look at the asm. No memory allocation. The asm is not enough. @nogc is a purely front-end thing, to avoid different optimizations in different compilers to lead to @nogc-annotated code to compile or not compile in different compilers. So this problem is not a bug, it's an enhancement request, to add this optimization in the front-end.

Comment #38 by bearophile_hugs — 2014-04-30T13:38:22Z

(In reply to bearophile_hugs from comment #33) > Where ctEval is used to generate a compile-time value. A simple implementation: alias ctEval(alias expr) = expr;

Comment #39 by yebblies — 2014-04-30T13:50:34Z

(In reply to bearophile_hugs from comment #37) > (In reply to yebblies from comment #36) > > > Try it, look at the asm. No memory allocation. > > The asm is not enough. @nogc is a purely front-end thing, to avoid different > optimizations in different compilers to lead to @nogc-annotated code to > compile or not compile in different compilers. So this problem is not a bug, > it's an enhancement request, to add this optimization in the front-end. As nice as that all sounds, the reality is that the dmd frontend currently defines which optimizations are guaranteed by the frontend. There are many places where const-folding already affects semantics (especially implicit conversions). Tying improving the usability of @nogc to a mythical optimization spec will likely only achieve preventing improvement to the usability of @nogc. > to add this optimization in the front-end. That is exactly where const-folding is done.

Comment #40 by bearophile_hugs — 2014-04-30T14:13:20Z

(In reply to yebblies from comment #39) > That is exactly where const-folding is done. You know the compiler much better than me, while I don't understand this situation enough. So I leave to you or Stewart to open the bug/ER. See also Issue 12642 for a probably related problem.

Comment #41 by github-bugzilla — 2016-09-28T05:11:10Z

Commits pushed to master at https://github.com/dlang/dmd https://github.com/dlang/dmd/commit/31f070f607398bcef72110f40c125a7f2ea7761b Fix issue 3827 - Deprecate implicit string concatenation Implicit string concatenation is an old feature that can now be replaced by explicit concatenation. It was also a source of bug, even for experienced D programmer, and the deprecation has been agreed on in 2010: http://forum.dlang.org/post/[email protected] https://github.com/dlang/dmd/commit/4fa0ddfcfa7e9914a4ce128595f04c67cbb7b5df Merge pull request #6155 from mathias-lang-sociomantic/deprecate-implicit-concat Fix issue 3827 - Deprecate implicit string concatenation

Comment #42 by github-bugzilla — 2016-10-01T11:48:50Z

Commits pushed to stable at https://github.com/dlang/dmd https://github.com/dlang/dmd/commit/31f070f607398bcef72110f40c125a7f2ea7761b Fix issue 3827 - Deprecate implicit string concatenation https://github.com/dlang/dmd/commit/4fa0ddfcfa7e9914a4ce128595f04c67cbb7b5df Merge pull request #6155 from mathias-lang-sociomantic/deprecate-implicit-concat

Comment #43 by github-bugzilla — 2016-10-24T19:17:53Z

Commit pushed to master at https://github.com/dlang/dlang.org https://github.com/dlang/dlang.org/commit/aed85471943baba2b003df341dacc7a185a33d53 Issue 3827: Document deprecation of implicit string concatenation See dlang/dmd#6155

Comment #44 by github-bugzilla — 2016-11-04T09:05:13Z

Commits pushed to newCTFE at https://github.com/dlang/dmd https://github.com/dlang/dmd/commit/31f070f607398bcef72110f40c125a7f2ea7761b Fix issue 3827 - Deprecate implicit string concatenation https://github.com/dlang/dmd/commit/4fa0ddfcfa7e9914a4ce128595f04c67cbb7b5df Merge pull request #6155 from mathias-lang-sociomantic/deprecate-implicit-concat

Comment #45 by github-bugzilla — 2017-01-07T03:02:11Z

Commit pushed to stable at https://github.com/dlang/dlang.org https://github.com/dlang/dlang.org/commit/aed85471943baba2b003df341dacc7a185a33d53 Issue 3827: Document deprecation of implicit string concatenation