Bug 12923 – UTF exception in stride even though passes validate.
Status: RESOLVED
Resolution: FIXED
Severity: blocker
Priority: P1
Component: phobos
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2014-06-14T20:59:00Z
Last change time: 2014-08-21T18:22:11Z
Assigned to: dmitry.olsh
Creator: timothee.cour2
Comments
Comment #0 by timothee.cour2 — 2014-06-14T20:59:17Z
import std.utf;

void main()
{
    char[3] a = [167, 133, 175];
    validate(a); // passes

    auto k = stride(a, 0);
    /+
    std.utf.UTFException@std/utf.d(199): Invalid UTF-8 sequence (at index 0)
    pure @safe uint std.utf.stride!(char[3]).stride(ref char[3], ulong) + 141
    +/
}
This happens even after applying the fix https://github.com/D-Programming-Language/phobos/pull/2038
Comment #1 by timothee.cour2 — 2014-06-14T21:06:18Z
(In reply to Timothee Cour from comment #0)
Additionally, each of the following loops also throws:
foreach (i, dchar c; a){} //src/rt/util/utf.d:290 Invalid UTF-8 sequence
foreach_reverse (i, dchar c; a){} //src/rt/aApplyR.d:511 Invalid UTF-8 sequence
So perhaps std.utf.validate accepts some invalid UTF sequences.
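For context, all three bytes in the test case have the bit pattern 10xxxxxx, which UTF-8 reserves for continuation bytes, so none of them can begin a sequence. That is consistent with stride and both foreach loops rejecting index 0. A small sketch to check the bit patterns; isContinuationByte is a hypothetical helper written for this illustration, not a Phobos function.

```d
import std.stdio : writefln;

/// Hypothetical helper (not part of Phobos):
/// true if `b` has the form 10xxxxxx, i.e. is a UTF-8 continuation byte.
bool isContinuationByte(ubyte b) pure nothrow @nogc @safe
{
    return (b & 0xC0) == 0x80;
}

void main()
{
    // The bytes from the report: 167, 133, 175 (0xA7, 0x85, 0xAF).
    immutable ubyte[3] bytes = [167, 133, 175];
    foreach (b; bytes)
        writefln("0x%02X = %08b, continuation byte: %s",
                 b, b, isContinuationByte(b));
}
```

If every byte is reported as a continuation byte, the array contains no valid lead byte at all, so a validator that accepts it is the component that looks suspect.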
Comment #2 by timothee.cour2 — 2014-06-14T21:57:02Z
(In reply to Timothee Cour from comment #1)
Here's one possible fix, in decodeImpl:
----
UTFException invalidUTF(){...}
//insert this
import core.bitop;
immutable msbs = 7 - bsr(~fst);
if (msbs < 2 || msbs > 6) throw invalidUTF();
UTFException outOfBounds() {...}
----
This gives decodeImpl the same behavior as strideImpl.
But is that correct, or was the behavior in strideImpl itself wrong?
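To make the proposed check easier to try in isolation, here is a standalone sketch of the same idea: count the leading one bits of the first byte and reject counts outside 2..6. The function name leadByteStride and its signature are assumptions made for this example; this is not the actual decodeImpl or strideImpl code. Two details differ from the snippet above on purpose: the inverted value is masked to 8 bits, because current compilers promote the operand of `~` to int, and 0xFF is rejected up front because bsr(0) is undefined.

```d
import core.bitop : bsr;
import std.exception : assertThrown;
import std.utf : UTFException;

/// Hypothetical standalone version of the lead-byte check (not Phobos code).
/// Expects a non-ASCII first byte; returns the sequence length it implies,
/// or throws UTFException if the byte cannot start a sequence.
uint leadByteStride(ubyte fst, size_t index) pure @safe
{
    // Invert and keep only the low 8 bits of the byte.
    immutable uint inverted = ~cast(uint) fst & 0xFF;

    // 0xFF has no zero bit; bsr(0) is undefined, so reject it explicitly.
    if (inverted == 0)
        throw new UTFException("Invalid UTF-8 sequence", index);

    // Position of the highest zero bit gives the number of leading ones.
    immutable msbs = 7 - bsr(inverted);

    // 1 leading one means a continuation byte; 7 is never valid.
    // Like the check quoted above, this still lets 5 and 6 through.
    if (msbs < 2 || msbs > 6)
        throw new UTFException("Invalid UTF-8 sequence", index);

    return msbs;
}

void main()
{
    assert(leadByteStride(0xC3, 0) == 2); // 110xxxxx
    assert(leadByteStride(0xE2, 0) == 3); // 1110xxxx
    assert(leadByteStride(0xF0, 0) == 4); // 11110xxx
    // 167 (0xA7) is 10100111, a continuation byte, so it is rejected.
    assertThrown!UTFException(leadByteStride(167, 0));
}
```

With a check like this on the decode path, the first byte of the test case would be rejected there as well, which is the consistency the comment is asking about.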
Comment #3 by dmitry.olsh — 2014-07-27T13:30:55Z
(In reply to Timothee Cour from comment #2)
Comment #4 by github-bugzilla — 2014-07-27T20:43:17Z