Bug 1347 – invalid UTF-8 strings cause access violations and inconsistent behavior in std.regexp
Status
RESOLVED
Resolution
WONTFIX
Severity
minor
Priority
P3
Component
phobos
Product
D
Version
D1 (retired)
Platform
x86
OS
Windows
Creation time
2007-07-18T16:54:00Z
Last change time
2014-02-16T15:22:06Z
Assigned to
bugzilla
Creator
dlang-bugzilla
Comments
Comment #0 by dlang-bugzilla — 2007-07-18T16:54:47Z
import std.regexp;

void main()
{
    ubyte[] data = [0xFF];
    RegExp re = new RegExp(`.*`);
    re.test(cast(char[])data);
}
---
This caused me some headache when I tried to process some non-Unicode files and forgot to convert the data first.
Comment #1 by bugzilla — 2007-09-03T15:07:39Z
std.regexp is designed to work only with valid UTF strings. To validate UTF strings, which should be done for input coming from an untrusted source, use the function std.utf.validate().
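A minimal sketch of that advice applied to the test case from comment #0, assuming D1 Phobos, where std.utf.validate() throws on malformed input. The helper name isValidUtf is hypothetical and not part of Phobos. The catch clause uses the base Exception class on purpose, since the exact exception type thrown by std.utf has differed between Phobos releases.

```d
import std.regexp;
import std.utf;

// Hypothetical helper (not in Phobos): reports whether s is well-formed UTF-8.
bool isValidUtf(char[] s)
{
    try
    {
        std.utf.validate(s);   // throws if s contains a malformed sequence
        return true;
    }
    catch (Exception e)
    {
        return false;
    }
}

void main()
{
    ubyte[] data = [0xFF];             // same invalid byte as in the report
    char[] text = cast(char[]) data;

    // Only hand the string to std.regexp once it is known to be valid UTF-8.
    if (isValidUtf(text))
    {
        RegExp re = new RegExp(`.*`);
        re.test(text);
    }
    // Otherwise, convert from the source encoding first or reject the input.
}
```

With the input above, the validation check fails and the regular expression is never run, so the invalid string does not reach std.regexp.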