Bug 7942 – Appending different string types corrupts memory
Status
RESOLVED
Resolution
FIXED
Severity
critical
Priority
P2
Component
dmd
Product
D
Version
D2
Platform
All
OS
All
Creation time
2012-04-19T03:45:00Z
Last change time
2015-02-18T03:37:58Z
Keywords
accepts-invalid, pull, wrong-code
Assigned to
yebblies
Creator
james
Comments
Comment #0 by james — 2012-04-19T03:45:23Z
Appending a regular `string` to a `dstring` does not cause a compile time error.
This case:
string a = "abc";
dstring b = "abc"d;
b ~= a;
Causes a runtime error: array cast misalignment.
This case:
string a = "abcd";
dstring b = "abcd"d;
b ~= a;
writeln(b);
causes a segmentation fault. Given that
string a = "abcd";
dstring b = "abcd"d;
a ~= b;
causes a compile time error, and many other operations do not allow implicit string -> dstring casting, it should be picked up as a type error.
Comment #1 by yebblies — 2013-11-26T19:56:27Z
I've hit this before, can't find the report. The compiler casts the rhs to the type of the lhs, then tried to append. Obviously this doesn't work.
Comment #2 by smjg — 2013-11-27T10:31:41Z
This sounds like a case of a more general bug, whereby x ~= y silently casts y to the type of x before appending. Given that array casts are a matter of reinterpreting the sequence of bytes, this doesn't make sense. It should either error at compiletime or convert the string from UTF-8 to UTF-32.
(In reply to yebblies from comment #3)
> I'm not so sure about this being accepts-invalid. D allows implicit string
> decoding/encoding in the single-char case, and it seems reasonable to
> support it here.
In contrast to the one character transcoding on foreach iteration, the implicit transcoding cost on appending will be bigger when appended string is very long, and it would be sometimes difficult to find it.
I think accepting it is not reasonable.
Comment #6 by yebblies — 2014-07-30T15:14:04Z
(In reply to Kenji Hara from comment #5)
>
> In contrast to the one character transcoding on foreach iteration, the
> implicit transcoding cost on appending will be bigger when appended string
> is very long, and it would be sometimes difficult to find it.
>
> I think accepting it is not reasonable.
Appending an array of values to an array will take longer than appending a single value. Even with implicit re-encoding it will still be O(N), same as other arrays.
Appending an array of structs with postblits could easily cause the same sort of performance problems.
Comment #7 by code — 2014-07-30T16:02:12Z
It's only a performance problem if it was easy to do inadvertently.
Where would this be the case?
When done deliberately the cost implications are fairly obvious.
Comment #8 by hsteoh — 2014-08-06T01:07:27Z
I see two solutions to this bug: (1) transcode the string to be appended (or, more generally, convert each value to be appended into the target type), which is costly (but still only O(n)); (2) reject this kind of array appending at compile-time.
I'm leaning towards (1) as being more user-friendly, going along D's motto of "correct by default, efficient if you ask for it". (2) is a bit too restrictive IMO, given the amount of implicit castings that we already have.
One possible compromise is to have (2) suggest std.array.appender for concatenating strings of different element widths.
Comment #9 by dlang-bugzilla — 2014-08-06T13:56:21Z
Another option would be to fix but deprecate the functionality.
> going along D's motto of "correct by default, efficient if you ask for it"
I don't think that means that hidden costs are OK.
Comment #10 by yebblies — 2014-08-06T14:41:02Z
(In reply to Vladimir Panteleev from comment #9)
> Another option would be to fix but deprecate the functionality.
>
> > going along D's motto of "correct by default, efficient if you ask for it"
>
> I don't think that means that hidden costs are OK.
O(n) append is not a hidden cost - it's the standard. If we're ok with the single-char case, we should be fine with the string case too.
Comment #11 by hsteoh — 2014-08-12T04:59:57Z
Looks like we're at an impasse. I think the best thing we can do right now is to make appending strings of different widths illegal for now. That still leaves open the option that in the future, we could support auto-conversion. But in any case, the current situation should not continue, since it introduces memory corruption.
Comment #12 by dlang-bugzilla — 2014-08-12T05:41:03Z
Even though I don't think all O(n)s are equal, I don't disagree with yebblies to the point where I'd object to accepting the code silently.
Comment #13 by bugzilla — 2014-08-15T18:35:29Z
I agree with Kenji. Make it an error for the corruption case, and we'll see how that goes. Better than enabling a feature that we may regret later.