Bug 12923 – UTF exception in stride even though passes validate.

Status
RESOLVED
Resolution
FIXED
Severity
blocker
Priority
P1
Component
phobos
Product
D
Version
D2
Platform
All
OS
All
Creation time
2014-06-14T20:59:00Z
Last change time
2014-08-21T18:22:11Z
Assigned to
dmitry.olsh
Creator
timothee.cour2

Comments

Comment #0 by timothee.cour2 — 2014-06-14T20:59:17Z
import std.utf;
void main(){
    char[3] a = [167, 133, 175];
    validate(a);
    //passes

    auto k = stride(a, 0);
    /+
    std.utf.UTFException@std/utf.d(199): Invalid UTF-8 sequence (at index 0)
    pure @safe uint std.utf.stride!(char[3]).stride(ref char[3], ulong) + 141
    +/
}

This happens even after applying the fix
https://github.com/D-Programming-Language/phobos/pull/2038
Comment #1 by timothee.cour2 — 2014-06-14T21:06:18Z
(In reply to Timothee Cour from comment #0)

Additionally, an error is thrown by either of these loops over the same array:

foreach (i, dchar c; a){} //src/rt/util/utf.d:290 Invalid UTF-8 sequence
foreach_reverse (i, dchar c; a){} //src/rt/aApplyR.d:511 Invalid UTF-8 sequence

So perhaps std.utf.validate accepts some invalid UTF sequences.
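To see why stride and foreach reject this input, it may help to look at the bit patterns of the three code units. The sketch below is illustrative only: `classify` is a hypothetical helper, not part of Phobos, and it only inspects the high bits of a single byte (it does not check for overlong or out-of-range sequences).

```d
import std.stdio : writefln;

// Hypothetical helper (not a Phobos function): classify one UTF-8 code unit
// purely by its high-bit pattern.
string classify(ubyte b) pure nothrow @safe @nogc
{
    if (b < 0x80)           return "ASCII (single-unit sequence)";
    if ((b & 0xC0) == 0x80) return "continuation (10xxxxxx)";
    if ((b & 0xE0) == 0xC0) return "lead pattern of a 2-unit sequence";
    if ((b & 0xF0) == 0xE0) return "lead pattern of a 3-unit sequence";
    if ((b & 0xF8) == 0xF0) return "lead pattern of a 4-unit sequence";
    return "not a valid lead pattern in UTF-8";
}

void main()
{
    // The bytes from the report: 0xA7, 0x85, 0xAF.
    immutable ubyte[3] a = [167, 133, 175];
    foreach (b; a)
        writefln("%3d = 0x%02X = %08b -> %s", b, b, b, classify(b));
}
```

All three bytes match the continuation pattern, so none of them can start a sequence, which is consistent with the "at index 0" error reported by stride.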
Comment #2 by timothee.cour2 — 2014-06-14T21:57:02Z
(In reply to Timothee Cour from comment #1)

Here's one possible fix, in decodeImpl:
----
UTFException invalidUTF(){...}

//insert this
import core.bitop;
immutable msbs = 7 - bsr(~fst);
if (msbs < 2 || msbs > 6) throw invalidUTF();

UTFException outOfBounds() {...}
----

This would give decodeImpl the same behavior as strideImpl.
But is that correct, or was the behavior in strideImpl wrong itself?
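The expression `7 - bsr(~fst)` counts the leading 1 bits of the first code unit. A standalone sketch of that computation follows; `leadingOnes` and `plausibleLeadUnit` are hypothetical names, not Phobos functions. It widens to `uint` and masks to 8 bits explicitly, which is an assumption made here so the result does not depend on how the compiler promotes `~` on a narrow type, and it special-cases 0xFF because `bsr(0)` is undefined.

```d
import core.bitop : bsr;

// Hypothetical helper: number of leading 1 bits in a code unit.
// 0 for ASCII, 1 for a continuation unit, 2..7 for lead-like patterns, 8 for 0xFF.
int leadingOnes(ubyte c) pure nothrow @safe @nogc
{
    // Widen first, invert, then keep only the low 8 bits.
    immutable uint inverted = (~uint(c)) & 0xFF;
    if (inverted == 0)
        return 8; // c == 0xFF; bsr(0) is undefined, so handle it separately
    return 7 - bsr(inverted);
}

// Mirrors the range check quoted above: msbs < 2 || msbs > 6 is rejected.
bool plausibleLeadUnit(ubyte c) pure nothrow @safe @nogc
{
    immutable msbs = leadingOnes(c);
    return msbs >= 2 && msbs <= 6;
}

void main()
{
    assert(leadingOnes(0x41) == 0); // 01000001: ASCII
    assert(leadingOnes(0xA7) == 1); // 10100111: continuation, first byte of the report
    assert(leadingOnes(0xC3) == 2); // 11000011: lead of a 2-unit sequence
    assert(leadingOnes(0xE2) == 3); // 11100010: lead of a 3-unit sequence
    assert(leadingOnes(0xFF) == 8);

    assert(!plausibleLeadUnit(0xA7)); // rejected, as stride does for the reported input
    assert(plausibleLeadUnit(0xE2));
}
```

With this check, 0xA7 yields a count of 1 and is rejected as a first code unit. The upper bound of 6 is taken from the snippet quoted in the comment; UTF-8 as currently standardized uses sequences of at most 4 code units.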
Comment #3 by dmitry.olsh — 2014-07-27T13:30:55Z
(In reply to Timothee Cour from comment #2)
Comment #4 by github-bugzilla — 2014-07-27T20:43:17Z
Commits pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/9afbbaf056641edf139e2f94a7d0c4a8f86bb5b3
Fix issue 12923 UTF exception in stride even though passes validate.
The root cause is that decode has very lax checking of the first code unit.

https://github.com/D-Programming-Language/phobos/commit/cdd26e309d9b8ade1082330c8b06868523ec1a90
Merge pull request #2376 from DmitryOlshansky/issue-12923
Fix issue 12923
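A regression check along the following lines could confirm the fixed behavior. This is a sketch, not the test from the pull request: it assumes a Phobos release that contains the fix (2.066 or later, per this thread), and it passes slices (`a[]`) rather than the static array, on the assumption that newer Phobos versions may not accept `char[3]` directly.

```d
import std.exception : assertThrown;
import std.utf : UTFException, stride, validate;

void main()
{
    // The input from the report: three continuation bytes, no lead byte.
    char[3] a = [167, 133, 175];

    // With the stricter first-code-unit check, validate should now reject it...
    assertThrown!UTFException(validate(a[]));

    // ...in agreement with stride, which already threw before the fix.
    assertThrown!UTFException(stride(a[], 0));
}
```

The point of the check is that validate and stride agree on the same input, which is the inconsistency the issue title describes.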
Comment #5 by github-bugzilla — 2014-07-31T05:22:32Z
Commit pushed to 2.066 at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/888897030c1587c36adbd19542a1e431f965e480
Merge pull request #2376 from DmitryOlshansky/issue-12923
Fix issue 12923
Comment #6 by github-bugzilla — 2014-08-21T18:22:11Z