Bug 1347 – invalid UTF-8 strings cause access violations and inconsistent behavior in std.regexp

Status
RESOLVED
Resolution
WONTFIX
Severity
minor
Priority
P3
Component
phobos
Product
D
Version
D1 (retired)
Platform
x86
OS
Windows
Creation time
2007-07-18T16:54:00Z
Last change time
2014-02-16T15:22:06Z
Assigned to
bugzilla
Creator
dlang-bugzilla

Comments

Comment #0 by dlang-bugzilla — 2007-07-18T16:54:47Z
import std.regexp;

void main()
{
    ubyte[] data = [0xFF];
    RegExp re = new RegExp(`.*`);
    re.test(cast(char[])data);
}
---
Caused me some headache when I tried to process some non-Unicode files and forgot to convert the data.
Comment #1 by bugzilla — 2007-09-03T15:07:39Z
std.regexp is designed to work only with valid UTF strings. To validate UTF strings, which should be done for input coming from an untrusted source, use the function std.utf.validate().
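
The validation step suggested above could look roughly like the sketch below. It assumes current D2 Phobos, where std.utf.validate() throws UTFException on malformed input (D1 Phobos used a differently named exception class). The helper name isValidUtf8 is hypothetical and not part of Phobos.

```d
import std.utf : validate, UTFException;

/// Hypothetical helper (not part of Phobos): returns true if `data`
/// is well-formed UTF-8, so it is safe to hand to string-based APIs.
bool isValidUtf8(const(ubyte)[] data)
{
    try
    {
        // validate() throws UTFException if the input is not valid UTF-8.
        validate(cast(const(char)[]) data);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    ubyte[] bad = [0xFF];        // the byte from the report; never valid in UTF-8
    ubyte[] good = [0x68, 0x69]; // ASCII "hi"

    assert(!isValidUtf8(bad));
    assert(isValidUtf8(good));
}
```

Running a check like this on raw file contents before the cast to char[] lets non-Unicode input be rejected or converted before it reaches the regular expression engine.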