Bug 15136 – If we want toStringz to be fully correct, it needs to stop checking for '\0'
Status
RESOLVED
Resolution
FIXED
Severity
normal
Priority
P1
Component
phobos
Product
D
Version
D2
Platform
All
OS
All
Creation time
2015-10-02T05:24:48Z
Last change time
2021-02-22T15:12:12Z
Keywords
pull
Assigned to
No Owner
Creator
Jonathan M Davis
Comments
Comment #0 by issues.dlang — 2015-10-02T05:24:48Z
I believe that this has been brought before (by Andrej Mitrovic IIRC), but toStringz has never been changed to fix the problem. This code succeeds:
import std.string;
void main()
{
auto s = S("01234");
auto str = s.str.toStringz;
assert(str == s.str.ptr);
assert(*(str + 5) == 0); // Null terminated.
s.foo = 42;
assert(*(str + 5) == 42); // No longer null terminated.
}
struct S
{
immutable char[5] str;
ubyte foo;
this(char[5] str)
{
this.str = str;
}
}
We have managed to create a null-terminated string which stops being null-terminated after its creation, because the null terminator that toStringz found was actually in the member variable after the static array, not in the static array itself. Now, fixing it so that toStringz had a separate overload for static arrays would fix the code as it stands, but simply slicing the static array before passing it to toStringz would get around that, making this a problem again.
What we _really_ want for toStringz to do is to detect string literals, and if we could do that, then we'd be home free, because toStringz could return the ptr value for a string literal and append '\0' to everything else, but AFAIK, we have no way to detect string literals.
Realistically, this problem is going to be extremely rare, but it _is_ possible, and it does potentially result in buffer overflows, which could mean that a program using toStringz could end up with a security problem in spite of D's array bounds checking and supposed guarantee from toStringz that the string is null-terminated. So, unless we want to take the approach that this example is so unlikely that we don't want to worry about it and just assume that no one is going to have buffer overflows because of this, I think that we're forced to make it so that toStringz always appends even for immutable(char)[] - either that, or we find a way to detect string literals (which would be the ideal solution), but I have no idea how that could be done.
Comment #1 by andrej.mitrovich — 2015-10-02T14:11:31Z
+1. If we really need to avoid memory allocations let's just add an overload or something like assumeZeroTerminated or something..
Comment #2 by john.loughran.colvin — 2015-10-03T11:50:19Z
It seem to me like this could happen with static arrays directly on the stack as well, depending on what variables the compiler happens to put directly after.
Comment #3 by schveiguy — 2017-10-30T13:48:02Z
(In reply to John Colvin from comment #2)
> It seem to me like this could happen with static arrays directly on the
> stack as well, depending on what variables the compiler happens to put
> directly after.
It can be done with heap-allocated arrays as well:
auto s = "hello".idup;
auto str = s.toStringz;
assert(s.ptr == str);
s ~= 'b';
assert(s[5] == 'b');
The assumption that a coincidental null character after the string must mean it is part of a literal is completely unsound. Not to mention, if the null happens to be on a 4-byte boundary, then it doesn't even look at it, potentially wasting an allocation.
What we really need, as Jonathan says, is a way to tell if a string pointer points at a string literal or not.
This should be possible I would think, as the linker puts all the literals into the same section, no? All we have to do is check whether the array points in that section, and the null character pointer is in that section, and the null character is null. Otherwise, allocate.
Comment #4 by schveiguy — 2017-10-30T15:00:23Z
(In reply to Steven Schveighoffer from comment #3)
> assert(s[5] == 'b');
That should have read str[5], obviously :)
Comment #5 by dlang-bot — 2021-02-20T16:36:03Z
@aG0aep6G created dlang/phobos pull request #7806 "fix issue 15136 - If we want toStringz to be fully correct, it needs …" fixing this issue:
- fix issue 15136 - If we want toStringz to be fully correct, it needs to stop checking for '\0'
https://github.com/dlang/phobos/pull/7806
Comment #6 by dlang-bot — 2021-02-22T15:12:12Z
dlang/phobos pull request #7806 "fix issue 15136 - If we want toStringz to be fully correct, it needs …" was merged into master:
- c55ce44705ddac208d5d7340b60ad2f8eacd0552 by aG0aep6G:
fix issue 15136 - If we want toStringz to be fully correct, it needs to stop checking for '\0'
https://github.com/dlang/phobos/pull/7806