Bug 9131 – Invalid UTF-8 when using std.algorithm.equal with dstring and string

Status: RESOLVED
Resolution: WONTFIX
Severity: normal
Priority: P2
Component: dmd
Product: D
Version: D2
Platform: x86_64
OS: Mac OS X
Creation time: 2012-12-09T21:57:00Z
Last change time: 2014-08-17T06:20:21Z
Assigned to: nobody
Creator: hsteoh

Comments

Comment #0 by hsteoh — 2012-12-09T21:57:11Z
This was found by the autotester with https://github.com/D-Programming-Language/phobos/pull/987. As I have no access to OSX/64, I have no way to reduce the failing case:

    auto joiner(RoR)(RoR r)
    if (isInputRange!RoR && isInputRange!(ElementType!RoR))
    {
        static struct Result
        {
        private:
            RoR _items;
            ElementType!RoR _current;

            void prepare()
            {
                // Skip over empty subranges.
                if (_items.empty) return;
                while (_items.front.empty)
                {
                    _items.popFront();
                    if (_items.empty) return;
                }
                _current = _items.front;
            }

        public:
            this(RoR r)
            {
                _items = r;
                prepare();
            }

            static if (isInfinite!RoR)
            {
                enum bool empty = false;
            }
            else
            {
                @property auto empty()
                {
                    return _items.empty;
                }
            }

            @property auto ref front()
            {
                assert(!empty);
                return _current.front;
            }

            void popFront()
            {
                assert(!_current.empty);
                _current.popFront();
                if (_current.empty)
                {
                    assert(!_items.empty);
                    _items.popFront();
                    prepare();
                }
            }

            static if (isForwardRange!RoR && isForwardRange!(ElementType!RoR))
            {
                @property auto save()
                {
                    Result copy;
                    copy._items = _items.save;
                    copy._current = _current.save;
                    return copy;
                }
            }
        }
        return Result(r);
    }

    unittest
    {
        struct TransientRange
        {
            dchar[128] _buf;
            dstring[] _values;

            this(dstring[] values)
            {
                _values = values;
            }

            @property bool empty()
            {
                return _values.length == 0;
            }

            @property auto front()
            {
                foreach (i; 0 .. _values.front.length)
                {
                    _buf[i] = _values[0][i];
                }
                return _buf[0 .. _values.front.length];
            }

            void popFront()
            {
                _values = _values[1 .. $];
            }
        }

        auto rr = TransientRange(["abc"d, "12"d, "def"d, "34"d]);

        // Can't use array() or equal() directly because they fail with transient
        // .front.
        dchar[] result;
        foreach (c; rr.joiner())
        {
            result ~= c;
        }
        assert(equal(result, "abc12def34"), "Unexpected result: '%s'".format(result));
    }
Comment #1 by hsteoh — 2012-12-09T21:58:31Z
P.S. The autotester error is: Invalid UTF sequence: 8ac37b8 - Encoding an invalid code point in UTF-8
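For context, the reported value 0x8ac37b8 is far above U+10FFFF, the largest Unicode code point, so no UTF-8 encoding exists for it. The sketch below, an illustration rather than part of the original report, shows that Phobos's `std.utf.encode` rejects such a `dchar` with a `UTFException`, which is the kind of error the autotester printed; the only assumption is that the garbage `dchar` came from corrupted memory, as later comments suggest.

```d
import std.exception : assertThrown;
import std.stdio : writeln;
import std.utf : encode, isValidDchar, UTFException;

void main()
{
    // The value from the autotester message, reinterpreted as a dchar.
    // It lies well outside the valid range (max code point is U+10FFFF).
    dchar bad = cast(dchar) 0x8ac37b8;
    assert(!isValidDchar(bad));

    // Encoding it as UTF-8 fails with a UTFException.
    char[4] buf;
    assertThrown!UTFException(encode(buf, bad));

    // A valid code point encodes normally: U+00E9 takes 2 bytes in UTF-8.
    assert(encode(buf, cast(dchar) 0xE9) == 2);
    writeln("ok");
}
```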
Comment #2 by github-bugzilla — 2012-12-17T09:08:56Z
Commit pushed to master at https://github.com/D-Programming-Language/phobos https://github.com/D-Programming-Language/phobos/commit/3a2377ccd3318ad42999c2657589c0bbd21c58ff Replace joiner unittest with one that doesn't suffer from issue 9131.
Comment #3 by hsteoh — 2012-12-17T12:54:46Z
Note that the commit does NOT fix this issue, it's for the pull request that was affected by this bug.
Comment #4 by hsteoh — 2012-12-20T14:48:48Z
Found the cause of this bug. It's a stack overflow caused by the transient range struct using a large static array as buffer: it just takes a few copies of this struct (since structs are passed by value) to overflow the stack, upon which memory corruption starts happening. The solution is to make TransientRange._buf a dynamic array whose length is initialized in this(), instead of a static array.
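The fix described in comment #4 can be sketched as follows. This is a minimal illustration, not the committed test: it assumes the same 128-element capacity as the original `dchar[128]` buffer and uses Phobos's own `std.algorithm` `joiner` in place of the version from the pull request. Because `_buf` is now a slice of a GC-allocated array, each copy of the struct carries only a pointer and a length instead of 512 bytes of buffer, and all copies share the same storage, which is acceptable for a transient range.

```d
import std.algorithm : equal, joiner;
import std.stdio : writeln;

struct TransientRange
{
    dchar[] _buf;       // heap-allocated in the constructor; copies share it
    dstring[] _values;

    this(dstring[] values)
    {
        _values = values;
        _buf = new dchar[128];   // capacity assumed to match the original test
    }

    @property bool empty() { return _values.length == 0; }

    // Transient front: the returned slice aliases _buf and is overwritten
    // by the next call to front.
    @property dchar[] front()
    {
        auto n = _values[0].length;
        assert(n <= _buf.length);
        _buf[0 .. n] = _values[0][];
        return _buf[0 .. n];
    }

    void popFront() { _values = _values[1 .. $]; }
}

void main()
{
    // The struct now holds two slices, far less than a dchar[128] payload.
    static assert(TransientRange.sizeof < 128 * dchar.sizeof);

    auto rr = TransientRange(["abc"d, "12"d, "def"d, "34"d]);
    dchar[] result;
    foreach (c; rr.joiner())
        result ~= c;
    assert(equal(result, "abc12def34"));
    writeln(result);
}
```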
Comment #5 by verylonglogin.reg — 2014-08-17T06:20:21Z
(In reply to hsteoh from comment #4)
> Found the cause of this bug. It's a stack overflow caused by the transient
> range struct using a large static array as buffer: it just takes a few
> copies of this struct (since structs are passed by value) to overflow the
> stack, upon which memory corruption starts happening.

Why memory corruption instead of a "Stack overflow" message? Looks like a serious issue.