Bug 8229 – string literals are not zero-terminated during CTFE

Status: NEW
Severity: major
Priority: P2
Component: dmd
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2012-06-11T15:56:58Z
Last change time: 2024-12-13T18:00:30Z
Keywords: CTFE
Assigned to: No Owner
Creator: timon.gehr
Moved to GitHub: dmd#18450

Comments

Comment #0 by timon.gehr — 2012-06-11T15:56:58Z
DMD 2.059:

static assert(!(x){return *x;}("".ptr)); // error

The static assertion should pass.
Comment #1 by clugdbug — 2012-06-12T09:48:41Z
This behaviour is intentional. Pointer operations are strictly checked in CTFE. It's the same as doing

int n = 0;
char c = ""[n];

which generates an array bounds error at runtime.

Is the terminating null character still in the spec? A long time ago it was in there, but now I can only find two references to it in the current spec (in 'arrays' and in 'interfacing to C'), and they both relate to printf.

The most detailed is in 'interface to C', which states:
"string literals, when they are not part of an initializer to a larger data structure, have a '\0' character helpfully stored after the end of them."

which is pretty weird. These funky semantics would be difficult to implement in CTFE, and I doubt they are desirable. Here's an example:

const(char)[] foo(char[] s) { return "abc" ~ s; }

immutable bar = foo("xyz"); // becomes a string literal when it leaves CTFE

bool baz()
{
    immutable bar2 = foo("xyz"); // local variable, so isn't a string literal.

    return true;
}
static assert(baz());

---> bar is zero-terminated, bar2 is not, even though they had the same assignment. When does this magical trailing zero get added?

I think you could reasonably interpret the spec as meaning that a trailing zero is added to the end of string literals by the linker, not by the compiler. It's only in CTFE that you can tell the difference.
Comment #2 by timon.gehr — 2012-06-12T10:55:45Z
(In reply to comment #1)
> This behaviour is intentional. Pointer operations are strictly checked in CTFE.
> It's the same as doing
>
> int n = 0;
> char c = ""[n];
>
> which generates an array bounds error at runtime.
>

I think that would be stretching it too far. It is more like:

auto s = ['\0'];
auto q = s[0..0];
char c = *q.ptr;

Which works fine at runtime and during CTFE.

> Is the terminating null character still in the spec? A long time ago it was in
> there, but now I can only find two references to it in the current spec (in
> 'arrays' and in 'interfacing to C'), and they both relate to printf.
>
> The most detailed is in 'interface to C', which states:
> "string literals, when they are not part of an initializer to a larger data
> structure, have a '\0' character helpfully stored after the end of them."
>
> which is pretty weird. These funky semantics would be difficult to implement in
> CTFE,

I guess this is from D1 times, when string literals were static arrays, and doesn't apply anymore.

> and I doubt they are desirable. Here's an example:
>
> const(char)[] foo(char[] s) { return "abc" ~ s; }
>
> immutable bar = foo("xyz"); // becomes a string literal when it leaves CTFE
>

Well, this is not specified afaics.

> bool baz()
> {
>     immutable bar2 = foo("xyz"); // local variable, so isn't a string literal.
>
>     return true;
> }
> static assert(baz());
>
> ---> bar is zero-terminated, bar2 is not, even though they had the same
> assignment. When does this magical trailing zero get added?
>

This is exactly the behavior that is observed at runtime. If it is undesirable, then that is a distinct issue that should be investigated. It would certainly be desirable to have consistent behavior at compile time and at runtime, but this is not a top-priority issue.

> I think you could reasonably interpret the spec as meaning that a trailing zero
> is added to the end of string literals by the linker, not by the compiler. It's
> only in CTFE that you can tell the difference.

In this case, the spec should definitely be fixed.
Comment #3 by clugdbug — 2012-06-13T01:44:42Z
(In reply to comment #2)
> (In reply to comment #1)
> > This behaviour is intentional. Pointer operations are strictly checked in CTFE.
> > It's the same as doing
> >
> > int n = 0;
> > char c = ""[n];
> >
> > which generates an array bounds error at runtime.
> >
>
> I think that would be stretching it too far. It is more like:
>
> auto s = ['\0'];
> auto q = s[0..0];
> char c = *q.ptr;

That's an interesting interpretation. It can't be true for D1, where string literals are fixed length arrays, but it could work for D2.
In D1 it's more like:

struct S
{
    static char[3] s = ['a', 'b', 'c'];
    static char terminator = '\0';
}

And every mention of it in the spec dates from D1.

> > Is the terminating null character still in the spec? A long time ago it was in
> > there, but now I can only find two references to it in the current spec (in
> > 'arrays' and in 'interfacing to C'), and they both relate to printf.
> >
> > The most detailed is in 'interface to C', which states:
> > "string literals, when they are not part of an initializer to a larger data
> > structure, have a '\0' character helpfully stored after the end of them."
> >
> > which is pretty weird. These funky semantics would be difficult to implement in
> > CTFE,
>
> I guess this is from D1 times, when string literals were static arrays, and
> doesn't apply anymore.

Could be. So the few parts of the spec that mention it are horribly out-of-date. Though it also applies to assigning to fixed length arrays.

immutable(char)[3] s = "abc"; // Does this have a trailing zero?

> > and I doubt they are desirable. Here's an example:
> >
> > const(char)[] foo(char[] s) { return "abc" ~ s; }
> >
> > immutable bar = foo("xyz"); // becomes a string literal when it leaves CTFE
> >
>
> Well, this is not specified afaics.

Hmm, maybe it isn't. The spec says almost nothing about the whole thing. What I do know is that there is a lot of existing code that relies on this behaviour (especially, "abc" ~ "def" having a trailing zero). Pretty much the only thing the spec says is that you can use string literals with printf. Does TDPL mention it?

The spec definitely needs to be improved.
Comment #4 by code — 2013-09-27T15:58:28Z
---
string bug(string a)
{
    char[] buf;
    buf.length = a.length;
    buf[0 .. a.length] = a[];
    return cast(string)buf[];
}

static const var = bug("foo");
---

I have a much bigger problem related to this. String literals resulting from CTFE are missing the terminating zero in the data segment. Whether or not the bug bites depends on the object layout and the virtual memory mapping, so this is pretty annoying because it works too often.

The underlying issue is that var is emitted to the object file from ArrayLiteralExp::toDt which doesn't perform the zero termination. Not sure if and at which stage this should be converted to a StringLiteralExp.
Comment #5 by code — 2013-09-28T04:20:53Z
It is also a huge performance issue to use ArrayLiteralExp instead of StringLiteralExp during object emission because the compiler creates a list of 1-byte elements. If for example you generate a 5kB string in CTFE this induces a huge overhead.
Comment #6 by k.hara.pg — 2015-01-21T02:16:28Z
I'd just like to add some sample code. From the comment in issue 7570:

bool not_end(const char *s, const int n) {
    return s && s[n];
}
bool str_prefix(const char *s, const char *t, const int ns, const int nt) {
    return (s == t) || !*(t + nt) || (*(s + ns) == *(t + nt) && (str_prefix(s, t, ns+1, nt+1)));
}
bool contains(const char *s, const char *needle, const int n=0) {
    return not_end(s, n) && (str_prefix(s, needle, n, 0) || contains(s, needle, n+1));
}
enum int x = contains("froogler", "oogle");

Today this code fails in CTFE because it reads the string's zero terminator. Supporting that in CTFE may be useful for C string operations.
Comment #7 by robert.schadek — 2024-12-13T18:00:30Z
THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/dmd/issues/18450 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB