Bug 18462 – std.regex.matchFirst doesn't work well with characters from extended ASCII

Status: RESOLVED
Resolution: INVALID
Severity: enhancement
Priority: P1
Component: phobos
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2018-02-19T01:38:27Z
Last change time: 2018-02-19T02:17:09Z
Assigned to: No Owner
Creator: Seb

Comments

Comment #0 by greensunny12 — 2018-02-19T01:38:27Z
---
void main(string[] args)
{
    import std.string, std.stdio, std.regex;

    static ctr = regex(`^`);

    // Unicode works
    string line = "µ";
    line.representation.writeln; // [194, 181]

    // but not extended ASCII
    line = "\xB5"; // [181]
    line.writeln;  // printing the raw byte works
    auto m = line.matchFirst(ctr); // throws UTFException
}
---

The error message is:

```
std.utf.UTFException@/usr/include/dlang/dmd/std/utf.d(1380): Invalid UTF-8 sequence (at index 1)
----------------
??:? pure dchar std.utf.decodeImpl!(true, 0, const(char)[]).decodeImpl(ref const(char)[], ref ulong) [0x8884beda]
??:? pure @trusted dchar std.utf.decode!(0, const(char)[]).decode(ref const(char)[], ref ulong) [0x8884be5d]
??:? pure @safe bool std.regex.internal.ir.Input!(char).Input.nextChar(ref dchar, ref ulong) [0x8885e318]
```
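The stack trace shows that std.regex decodes its input as UTF-8 (through `std.utf.decode`), so a malformed string only fails once matching starts. A minimal sketch of checking input up front with `std.utf.validate`, which throws `UTFException` on malformed UTF-8, might look like this; `isValidUtf8` is a hypothetical helper, not part of Phobos.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;
import std.string : representation;
import std.utf : UTFException, validate;

/// Hypothetical helper (not a Phobos API): true when `s` is well-formed UTF-8.
bool isValidUtf8(string s)
{
    try
    {
        validate(s); // throws UTFException on malformed input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    auto ctr = regex(`^`);
    foreach (line; ["µ", "\xB5"])
    {
        if (!isValidUtf8(line))
        {
            // Report the raw bytes instead of letting matchFirst throw.
            writeln("not UTF-8: ", line.representation);
            continue;
        }
        writeln("matched: ", cast(bool) line.matchFirst(ctr));
    }
}
```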
Comment #1 by greensunny12 — 2018-02-19T02:17:09Z
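The problem is that a lone "extended ASCII" byte such as 0xB5 is not valid UTF-8: in UTF-8, bytes 0x80 to 0xBF may only appear as continuation bytes inside a multi-byte sequence, so std.regex's UTF-8 decoder rightly rejects it. Sorry for the noise.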
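If the bytes really come from a Latin-1 (ISO-8859-1) source, one option is to transcode them to UTF-8 before handing them to std.regex. The sketch below assumes the input is Latin-1 and uses `std.encoding.transcode` with `Latin1String`; the `latin1ToUtf8` helper is hypothetical, not a Phobos function.

```d
import std.encoding : Latin1String, transcode;
import std.regex : matchFirst, regex;
import std.stdio : writeln;
import std.string : representation;

/// Hypothetical helper (not a Phobos API): treats `raw` as ISO-8859-1
/// (an assumption about the input) and returns the equivalent UTF-8 string.
string latin1ToUtf8(immutable(ubyte)[] raw)
{
    // Latin1String is immutable(Latin1Char)[] and Latin1Char is ubyte-based,
    // so this cast only relabels the bytes; it does not change them.
    auto latin1 = cast(Latin1String) raw;
    string utf8;
    transcode(latin1, utf8);
    return utf8;
}

void main()
{
    immutable(ubyte)[] raw = [0xB5]; // 'µ' in Latin-1, invalid on its own as UTF-8
    auto line = latin1ToUtf8(raw);
    writeln(line.representation); // [194, 181]
    writeln(line.matchFirst(regex(`^µ`)).hit);
}
```

Converting once where the data enters the program keeps everything downstream, including std.regex, working on valid UTF-8. If the source is actually Windows-1252 rather than Latin-1, std.encoding's `Windows1252String` would be the type to cast to instead.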