Bug 4389 – ICE(constfold.c, expression.c), or wrong code: string~=dchar in CTFE

Status
RESOLVED
Resolution
FIXED
Severity
critical
Priority
P2
Component
dmd
Product
D
Version
D2
Platform
All
OS
Windows
Creation time
2010-06-25T02:01:00Z
Last change time
2015-06-09T05:12:00Z
Keywords
ice-on-valid-code, patch, wrong-code
Assigned to
nobody
Creator
yebblies

Attachments

IDFilenameSummaryContent-TypeSize
827patch4389.patchPatch against D2 svn 767text/plain4530

Comments

Comment #0 by yebblies — 2010-06-25T02:01:40Z
With dmd2.047 the following code will create a string which is actually invalid utf-32. The problem goes away if you start with a non-empty string. This can occur easily when appending is used in conjunction with the new string front() and back() functions that return dchar. test case: import std.stdio; string get() { string s; dchar c = 'D'; s ~= c; s ~= c; return s; } void main() { enum s = get(); // Call 'get' with ctfe writeln('"', s, '"'); // outputs "D " auto p = (s.ptr); // take pointer to defeat bounds checking auto s2 = p[0..8]; writeln('"', s2, '"'); // outputs "D D " auto s3 = get(); // At compile time, this works as expected writeln('"', s3, '"'); // outputs "DD" }
Comment #1 by clugdbug — 2010-11-23T00:56:25Z
Actually constant folding of string ~= dchar is ALWAYS wrong, even if the string is non-empty. All these tests fail. It can also cause an ICE(constfold.c) if you attempt a further concatenation, or ICE(expression.c) if it is mixed in (bug 3490). It happens because string ~= dchar is treated specially in the backend, but not in constfold.c. --------- int bug4389() { string s; dchar c = '\u2348'; s ~= c; assert(s.length==3); dchar d = 'D'; s ~= d; assert(s.length==4); s = ""; s ~= c; assert(s.length==3); s ~= d; assert(s.length==4); string z; wchar w = '\u0300'; z ~= w; assert(z.length==2); z = ""; z ~= w; assert(z.length==2); return 1; } static assert(bug4389()); // ICE(constfold.c) int ice4389() { string s; dchar c = '\u2348'; s ~= c; s = s ~ "xxx"; return 1; } static assert(ice4389()); // ICE(expression.c) string ice4390() { string s; dchar c = '`'; s ~= c; s ~= c; return s; } static assert(mixin(ice4390()) == ``);
Comment #2 by clugdbug — 2010-11-23T00:57:51Z
*** Issue 4390 has been marked as a duplicate of this issue. ***
Comment #3 by clugdbug — 2010-11-23T12:33:23Z
Created attachment 827 Patch against D2 svn 767 Although the patch is relatively long, it is very mundane. The patch adds these functions to utf.c (they are copied from druntime). These convert single chars from UTF32 to UTF8 or UTF16. void utf_encodeChar(unsigned char *s, dchar_t c); void utf_encodeWchar(unsigned short *s, dchar_t c); int utf_codeLengthChar(dchar_t c); int utf_codeLengthWchar(dchar_t c); Two other wrapper functions provide the interface that is actually used: int utf_codeLength(int sz, dchar_t c); void utf_encode(int sz, void *s, dchar_t c); Then, these functions are used in Cat() in constfold.c Instead of assuming a len of 1, it uses utf_codeLength; and instead of a memcpy, it uses utf_encode. The cases which are patched in Cat() are: TOKnull ~ dchar, dchar ~ TOKnull, TOKstring ~ dchar.
Comment #4 by bugzilla — 2010-12-27T18:54:58Z
The patch should work in D1, but does not.
Comment #5 by bugzilla — 2010-12-27T19:27:54Z
http://www.dsource.org/projects/dmd/changeset/825 Leaving bug open until D1 is fixed.
Comment #6 by clugdbug — 2011-01-19T23:59:47Z
Fails on D1 because dchar, wchar implicitly convert to char, even if they don't fit. PATCH: expression.c, CatAssignExp::semantic(), line 8574. The two 'else' clauses need to be swapped, so that we check for an array~char before we check for implicit conversion. { // Append array e2 = e2->castTo(sc, e1->type); type = e1->type; e = this; } ----------- SWAP THIS SECTION WITH THE NEXT ONE else if ((tb1->ty == Tarray) && e2->implicitConvTo(tb1next) ) { // Append element e2 = e2->castTo(sc, tb1next); type = e1->type; e = this; } ------- else if (tb1->ty == Tarray && (tb1next->ty == Tchar || tb1next->ty == Twchar) && e2->implicitConvTo(Type::tdchar) ) { // Append dchar to char[] or wchar[] e2 = e2->castTo(sc, Type::tdchar); type = e1->type; e = this; /* Do not allow appending wchar to char[] because if wchar happens * to be a surrogate pair, nothing good can result. */ } ----------- else { if (tb1 != Type::terror && tb2 != Type::terror) error("cannot append type %s to type %s", tb2->toChars(), tb1->toChars()); e = new ErrorExp(); } ALTERNATE PATCH: Incidentally, if dchar->char is disallowed, all DMD tests still pass. But this would probably cause existing valid code to break, so probably the first solution is better. mtype.c, line 1570, MATCH TypeBasic::implicitConvTo(Type *to) + if ((ty == Tdchar || ty == Twchar) && to->ty == Tchar) + return MATCHnomatch; + if (ty == Tdchar && to->ty == Twchar) + return MATCHnomatch; } else if (flags & TFLAGSfloating) { // Disallow implicit conversion of floating point to integer if (tob->flags & TFLAGSintegral) return MATCHnomatch;
Comment #7 by clugdbug — 2011-02-06T13:53:08Z