Bug 5717 – 1.067 regression: appending Unicode char to string broken

Status
RESOLVED
Resolution
FIXED
Severity
regression
Priority
P2
Component
dmd
Product
D
Version
D1 (retired)
Platform
x86
OS
Windows
Creation time
2011-03-07T17:18:00Z
Last change time
2011-03-11T08:35:49Z
Keywords
patch, wrong-code
Assigned to
nobody
Creator
dlang-bugzilla

Comments

Comment #0 by dlang-bugzilla — 2011-03-07T17:18:53Z
    void main()
    {
        string s, s2;
        s = "Привет";
        foreach(c; s)
            s2 ~= c;
        assert(s == s2);
    }

DMD now seems to consider each individual char a whole code point (as if it was automatically promoted to dchar).
Comment #1 by sohgo — 2011-03-09T04:35:00Z
Same problem happens on FreeBSD 8.2 with DMD 1.067 too. But the problem does not happen with DMD 1.066.
Comment #2 by clugdbug — 2011-03-10T01:14:55Z
I think this is a foreach problem. Probably triggered by the fix to bug 4389.
Comment #3 by dlang-bugzilla — 2011-03-10T01:17:37Z
It doesn't look like a foreach problem. This fails too:

    void main()
    {
        string s, s2;
        s = "Привет";
        for (int i=0; i<s.length; i++)
            s2 ~= s[i];
        assert(s == s2);
    }
Comment #4 by clugdbug — 2011-03-10T04:26:27Z
(In reply to comment #3)
> It doesn't look like a foreach problem. This fails too:

Hmm. You're right. And yet it works fine on D2. It's inserting a call to _d_arrayappendcd, which means the append has been changed into char[] ~ dchar.
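To illustrate why turning the append into char[] ~ dchar corrupts the string, here is a minimal C++ sketch. It is not the D runtime's code: appendCodePoint, appendUnits and appendUnitsAsCodePoints are hypothetical names, and appendCodePoint only stands in for what a helper that appends a dchar to a char[] has to do (UTF-8 encode the value). The string literal is "Привет" written out as its UTF-8 bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for an "append dchar to char[]" helper:
// UTF-8 encode one code point and append the resulting bytes.
static void appendCodePoint(std::string &out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Expected behaviour of char[] ~= char: copy each code unit unchanged.
std::string appendUnits(const std::string &s)
{
    std::string r;
    for (char c : s)
        r += c;
    return r;
}

// Behaviour described in this report: each code unit is widened to a
// code point and then re-encoded, so every non-ASCII byte becomes two bytes.
std::string appendUnitsAsCodePoints(const std::string &s)
{
    std::string r;
    for (char c : s)
        appendCodePoint(r, static_cast<unsigned char>(c));
    return r;
}

// Small usage demo with the test string from comment #0.
void demo()
{
    const std::string s = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82";
    assert(appendUnits(s) == s);             // round-trips
    assert(appendUnitsAsCodePoints(s) != s); // mirrors the failing assert
}
```

Pure ASCII input is unaffected by the widening, which may be why the regression only shows up with text such as "Привет".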
Comment #5 by clugdbug — 2011-03-10T07:13:37Z
It was indeed caused by the fix to bug 4389, which wasn't tight enough. s ~= c shouldn't turn c into a dchar if c has the same type as the elements of s (i.e., char[] ~= char should go through unaltered). That leaves wchar[] ~ char, which I think is inevitably a mess if c is outside the ASCII range.

expression.c, line 8593. CatAssignExp::semantic()

     {   // Append array
         e2 = e2->castTo(sc, e1->type);
         type = e1->type;
         e = this;
     }
     else if (tb1->ty == Tarray &&
         (tb1next->ty == Tchar || tb1next->ty == Twchar) &&
+        e2->type->ty != tb1next->ty &&
         e2->implicitConvTo(Type::tdchar)
        )
     {   // Append dchar to char[] or wchar[]
         e2 = e2->castTo(sc, Type::tdchar);
         type = e1->type;
         e = this;

         /* Do not allow appending wchar to char[] because if wchar happens
          * to be a surrogate pair, nothing good can result.
          */
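The effect of the added condition can be sketched as a small truth table. This is a simplified model, not dmd's code: the Ty enum and both function names are hypothetical, and the result of e2->implicitConvTo(Type::tdchar) is passed in as a plain bool rather than computed.

```cpp
// Hypothetical, simplified stand-in for the element types involved.
enum class Ty { Char, Wchar, Dchar };

// Condition before the patch: any right-hand side that converts to dchar
// is promoted when appending to char[] or wchar[].
constexpr bool promoteBeforePatch(Ty arrayElem, Ty /*rhs*/, bool rhsConvertsToDchar)
{
    return (arrayElem == Ty::Char || arrayElem == Ty::Wchar) && rhsConvertsToDchar;
}

// Condition after the patch: same as above, except that a right-hand side
// whose type already matches the array's element type is left alone.
constexpr bool promoteAfterPatch(Ty arrayElem, Ty rhs, bool rhsConvertsToDchar)
{
    return (arrayElem == Ty::Char || arrayElem == Ty::Wchar)
        && rhs != arrayElem
        && rhsConvertsToDchar;
}
```

Under this model, char[] ~= char and wchar[] ~= wchar no longer take the promotion branch, while char[] ~= dchar still does. wchar[] ~= char also still does, which is the case the comment above calls a mess for non-ASCII values.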
Comment #6 by sohgo — 2011-03-10T19:27:38Z
(In reply to comment #5)
I've tried Don's patch; it works well in my environment. That's great. Thank you.
Comment #7 by bugzilla — 2011-03-11T03:07:29Z
Comment #8 by dlang-bugzilla — 2011-03-11T08:35:49Z
Thanks - not sure what the second commit has to do with it, though.