Bug 2418 – Same-value string (char[]) literals get overwritten (unlike array literals)

Status
RESOLVED
Resolution
INVALID
Severity
normal
Priority
P2
Component
dmd
Product
D
Version
D1 (retired)
Platform
x86
OS
Windows
Creation time
2008-10-14T17:50:00Z
Last change time
2014-03-01T00:36:06Z
Assigned to
bugzilla
Creator
bus_dbugzilla

Comments

Comment #0 by bus_dbugzilla — 2008-10-14T17:50:58Z
When a string is initialized with a string literal, and part of the string is then changed, all other string literals with the same value reflect the same change. For the following code:

------------------
module test;

import tango.io.Stdout;

class FooChar
{
    char[] str;
    this(char[] _str) { this.str = _str; }
}

class FooInt
{
    int[] ints;
    this(int[] _ints) { this.ints = _ints; }
}

void main()
{
    FooChar a;
    FooChar b;
    FooChar c;
    FooChar d;

    a = new FooChar("AAA");
    Stdout.formatln("a.str: {}", a.str);
    a.str[0] = '!';
    Stdout.formatln("a.str: {}", a.str);
    a = new FooChar("AAA");
    Stdout.formatln("a.str: {}", a.str);
    Stdout.formatln("");

    b = new FooChar("AAA");
    Stdout.formatln("b.str: {}", b.str);
    b.str[0] = '!';
    Stdout.formatln("b.str: {}", b.str);
    b = new FooChar("AAA");
    Stdout.formatln("b.str: {}", b.str);
    Stdout.formatln("");

    c = new FooChar("AA1");
    Stdout.formatln("c.str: {}", c.str);
    c.str[0] = '!';
    Stdout.formatln("c.str: {}", c.str);
    c = new FooChar("AA1");
    Stdout.formatln("c.str: {}", c.str);
    Stdout.formatln("");

    const char[] dInit = "AA2";
    d = new FooChar(dInit);
    Stdout.formatln("d.str: {}", d.str);
    d.str[0] = '!';
    Stdout.formatln("d.str: {}", d.str);
    d = new FooChar(dInit);
    Stdout.formatln("d.str: {}", d.str);
    Stdout.formatln("");

    FooInt i;
    i = new FooInt([1, 2]);
    Stdout.formatln("i.ints: {}", i.ints);
    i.ints[0] = 77;
    Stdout.formatln("i.ints: {}", i.ints);
    i = new FooInt([1, 2]);
    Stdout.formatln("i.ints: {}", i.ints);
}
------------------

Expected output:

a.str: AAA
a.str: !AA
a.str: AAA

b.str: AAA
b.str: !AA
b.str: AAA

c.str: AA1
c.str: !A1
c.str: AA1

d.str: AA2
d.str: !A2
d.str: AA2

i.ints: [1, 2]
i.ints: [77, 2]
i.ints: [1, 2]

Actual output:

a.str: AAA
a.str: !AA
a.str: !AA

b.str: !AA
b.str: !AA
b.str: !AA

c.str: AA1
c.str: !A1
c.str: !A1

d.str: AA2
d.str: !A2
d.str: !A2

i.ints: [1, 2]
i.ints: [77, 2]
i.ints: [1, 2]

This might be a side-effect of the fix for #817: http://d.puremagic.com/issues/show_bug.cgi?id=817
Comment #1 by bus_dbugzilla — 2008-10-14T17:59:19Z
Sorry, I guess those classes are not needed:

---------------
module test;

import tango.io.Stdout;

void main()
{
    char[] a;
    char[] b;
    char[] c;
    char[] d;

    a = "AAA";
    Stdout.formatln("a: {}", a);
    a[0] = '!';
    Stdout.formatln("a: {}", a);
    a = "AAA";
    Stdout.formatln("a: {}", a);
    Stdout.formatln("");

    b = "AAA";
    Stdout.formatln("b: {}", b);
    b[0] = '!';
    Stdout.formatln("b: {}", b);
    b = "AAA";
    Stdout.formatln("b: {}", b);
    Stdout.formatln("");

    c = "AA1";
    Stdout.formatln("c: {}", c);
    c[0] = '!';
    Stdout.formatln("c: {}", c);
    c = "AA1";
    Stdout.formatln("c: {}", c);
    Stdout.formatln("");

    const char[] dInit = "AA2";
    d = dInit;
    Stdout.formatln("d: {}", d);
    d[0] = '!';
    Stdout.formatln("d: {}", d);
    d = dInit;
    Stdout.formatln("d: {}", d);
    Stdout.formatln("");

    int[] i;
    i = [1, 2];
    Stdout.formatln("i: {}", i);
    i[0] = 77;
    Stdout.formatln("i: {}", i);
    i = [1, 2];
    Stdout.formatln("i: {}", i);
}
---------------

Expected output:

a: AAA
a: !AA
a: AAA

b: AAA
b: !AA
b: AAA

c: AA1
c: !A1
c: AA1

d: AA2
d: !A2
d: AA2

i: [1, 2]
i: [77, 2]
i: [1, 2]

Actual output:

a: AAA
a: !AA
a: !AA

b: !AA
b: !AA
b: !AA

c: AA1
c: !A1
c: !A1

d: AA2
d: !A2
d: !A2

i: [1, 2]
i: [77, 2]
i: [1, 2]
Comment #2 by 2korden — 2008-10-14T18:01:16Z
The bug is on your side. You don't make copies of the string, so you keep working on a single string.

    char[] a = "AAA";
    char[] b = a; // both point to the same location
    a[0] = '!';
    assert(b[0] == '!'); // should be true since they both point to the same location

Same example in C (this may help understand better):

    char* str = "AAA";
    char* ptr = str;
    str[0] = '!';
    assert(ptr[0] == '!');

The actual bug is that the following line should not compile:

    char[] a = "AAA";

The problem is that D1 doesn't have the invariant type modifier and the spec is frozen, so this will never be fixed. D2, however, doesn't allow you to do this. The solution is to make copies explicitly:

    a = new FooChar("AAA".dup);
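The difference between sharing one string and copying it with .dup can be sketched as follows. This is only a minimal illustration, written in D2 syntax with Phobos std.stdio (an assumption; the report itself uses D1 and Tango), and freshCopy is a hypothetical helper name, not something from the report.

```d
import std.stdio;

// Hypothetical helper: returns a mutable, independent copy of the text.
char[] freshCopy(string text)
{
    return text.dup; // .dup allocates a new array and copies the characters
}

void main()
{
    char[] a = freshCopy("AAA");
    char[] shared_ = a;            // no copy: refers to the same memory as a
    char[] b = freshCopy("AAA");   // separate copy of the same literal

    a[0] = '!';

    writeln(a);       // !AA
    writeln(shared_); // !AA, because it shares memory with a
    writeln(b);       // AAA, because it was copied before the change
}
```

Each call to freshCopy pays for one allocation, which is the cost comment #2 says should be explicit rather than hidden.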
Comment #3 by bus_dbugzilla — 2008-10-14T20:34:40Z
(In reply to comment #2)
> The bug is on your side. You don't make copies of the string, so you keep
> working on a single string.
>
> char[] a = "AAA";
> char[] b = a; // both point to the same location
> a[0] = '!';
> assert(b[0] == '!'); // should be true since they both point to the same location
>
> Same example in C (this may help understand better):
> char* str = "AAA";
> char* ptr = str;
> str[0] = '!';
> assert(ptr[0] == '!');

I don't think that's applicable (or at least, it shouldn't be). Here's a shortened example:

    char[] a = "AAA";
    a[0] = '!';
    char[] b = "AAA";

It's absurd that 'b' should become "!AA". It's clear that the compiler is "optimizing" both "AAA" string literals into the same reference:

    char[] _tmp1 = "AAA";
    char[] a = _tmp1;
    a[0] = '!';
    char[] b = _tmp1;

But I'm saying that transformation should not be occurring. The two "AAA" literals should not share the same reference. In D2, array literals are immutable, so yes, in D2 it works out fine. But in D1, array literals are *not* immutable, so the compiler shouldn't assume that array literals with an equal *value* can safely share the same *reference*.

> The actual bug is that the following line should not compile:
> char[] a = "AAA";

Again, this is for D1, so the idea of strings being immutable doesn't apply.

Besides, at the very least there's still an inconsistency:

    char[] a = "AAA";
    a[0] = '!';
    char[] b = "AAA"; // b is "!AA"

    char[] c = ['A', 'A', 'A'];
    c[0] = '!';
    char[] d = ['A', 'A', 'A']; // d is "AAA"

The above shows that two occurrences of "AAA" share the same reference, but two occurrences of ['A', 'A', 'A'] have separate references. The former leads to odd side-effects, the latter does not.
Comment #4 by 2korden — 2008-10-15T00:02:37Z
(In reply to comment #3)
> I don't think that's applicable (or at least, it shouldn't be). Here's a
> shortened example:
>
> char[] a = "AAA";
> a[0] = '!';
> char[] b = "AAA";
>
> It's absurd that 'b' should become "!AA".

Well, that's what I expect, at least :)

> It's clear that the compiler is
> "optimizing" both "AAA" string literals into the same reference:
>
> char[] _tmp1 = "AAA";
> char[] a = _tmp1;
> a[0] = '!';
> char[] b = _tmp1;

Of course it does! This string literal is put into the executable, and "AAA" is just a pointer to it. Please understand that char[] is nothing more than a pointer to a string and an associated length. When you change the string via the pointer, everyone who has that pointer sees the change.

Compare to an object:

    class A
    {
        int i = 15;
    }

    A aaa = new A(); // aaa is an analog of "AAA"
    A a = aaa;
    writefln(a.i); // prints 15
    a.i = 42;
    A b = aaa;
    writefln(b.i); // why the hell it prints 42????

This is because you don't make a copy of aaa.

Here is another example:

    while (true) {
        char[] a = "AAA";
        a[0] = '!'; // in the next iteration, should a be "AAA" again?
    }

If the answer is yes, then there would have to be a memory allocation in line 2 under the hood. You can't get a new string every time without a memory allocation. This, however, definitely should not occur. If you want to make an allocation, do it yourself explicitly. Code with the behaviour you expect should be as follows:

    while (true) {
        char[] a = "AAA".dup;
        writefln(a);
        a[0] = '!';
        writefln(a);
    }

This is by design; it works as it should. It can't be 'fixed' because it is not broken.

> Besides, at the very least there's still an inconsistency:
>
> char[] a = "AAA";
> a[0] = '!';
> char[] b = "AAA"; // b is "!AA"
>
> char[] c = ['A', 'A', 'A'];
> c[0] = '!';
> char[] d = ['A', 'A', 'A']; // d is "AAA"

I agree, this might be a bug, d really ought to be "!AA"! Do you want to create a new bug report so that the latter case would be 'fixed'? :)
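The "pointer plus length" view of char[] in comment #4 can be checked directly. The sketch below assumes D2 syntax and Phobos std.stdio rather than the D1/Tango setup of the report, and sameMemory is a hypothetical helper introduced only for illustration.

```d
import std.stdio;

// Hypothetical helper: true when both slices describe the same memory,
// i.e. they hold the same pointer and the same length.
bool sameMemory(const(char)[] x, const(char)[] y)
{
    return x.ptr is y.ptr && x.length == y.length;
}

void main()
{
    char[] original = "AAA".dup;   // mutable copy of the literal
    char[] view = original;        // copies pointer and length only
    char[] copy = original.dup;    // allocates and copies the characters

    writeln(sameMemory(original, view)); // true
    writeln(sameMemory(original, copy)); // false

    view[0] = '!';
    writeln(original); // !AA, changed through the shared pointer
    writeln(copy);     // AAA, unaffected
}
```

Assigning one slice to another never copies the characters, which is why a change made through one name is visible through the other.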
Comment #5 by kamm-removethis — 2008-10-15T02:36:57Z
The D spec says "String literals are immutable (read only)." even in D1. So code like this

    char[] a = "ABC";
    a[0] = '!';

is illegal (and will segfault on Linux!).

Array literals are allocated on the heap (the D spec explicitly says so) and will thus behave fundamentally differently from string literals.
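The contrast between the two kinds of literal can be sketched as below. The code assumes D2 and Phobos std.stdio, where the type system enforces the rule quoted from the spec (D1, the subject of this report, accepts the assignment and leaves the write undefined). makeArray is a hypothetical helper name.

```d
import std.stdio;

// Hypothetical helper: every call evaluates the array literal again,
// which allocates a new array on the GC heap.
char[] makeArray()
{
    return ['A', 'A', 'A'];
}

void main()
{
    // In D2 a string literal cannot be bound to a mutable char[].
    static assert(!__traits(compiles, { char[] a = "ABC"; }));

    char[] c = makeArray();
    c[0] = '!';
    char[] d = makeArray();
    writeln(c); // !AA
    writeln(d); // AAA, a separate heap allocation

    // A string literal has to be copied before it can be modified.
    char[] e = "AAA".dup;
    e[0] = '!';
    writeln(e); // !AA
}
```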
Comment #6 by bus_dbugzilla — 2008-10-15T05:13:03Z
(In reply to comment #5)
> The D spec says "String literals are immutable (read only)." even in D1. So
> code like this
>
> char[] a = "ABC";
> a[0] = '!';
>
> is illegal (and will segfault on Linux!).
>
> Array literals are allocated on the heap (the D spec explicitly says so) and
> will thus behave fundamentally differently from string literals.

Alright, I see now. It does seem odd though that array literals would carry a type that lacks any sort of const/read-only/immutable/etc. qualifier, but is still immutable anyway. Just another of D2's improvements, I guess.

Out of curiosity, any idea if the lack of a run-time error on Windows is due to Windows itself or just something DMD/OPTLINK does differently between the platforms?