Bug 1750 – RegExp: lack of support for wchar, dchar; lack of lookingAt() method
Status: RESOLVED
Resolution: FIXED
Severity: enhancement
Priority: P2
Component: phobos
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2007-12-26T10:26:00Z
Last change time: 2015-06-09T01:14:23Z
Assigned to: dmitry.olsh
Creator: aarti
Comments
Comment #0 by aarti — 2007-12-26T10:26:08Z
1. RegExp should work with at least wchar and dchar strings, and possibly also with integral array types (e.g. int[]).
2. There is no bool lookingAt() method, which tries to match the pattern at the beginning of the string and returns false immediately if it does not match there. For reference: http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Matcher.html
Currently it is very inefficient to match a pattern against an incoming stream of data.
A solution with lookingAt() would be much faster.
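For readers who don't know the Java API linked above, here is a minimal sketch of how `lookingAt()` differs from `matches()` and `find()` on `java.util.regex.Matcher`. This is Java, not D's std.regex. The class name `LookingAtDemo`, the helper `modes`, and the sample request line are all illustrative.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class LookingAtDemo {
    // Returns {matches(), lookingAt(), find()} for one pattern/input pair.
    static boolean[] modes(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        return new boolean[] {
            p.matcher(input).matches(),   // the whole input must match
            p.matcher(input).lookingAt(), // a prefix must match; no scanning forward
            p.matcher(input).find()       // a match may start anywhere
        };
    }

    public static void main(String[] args) {
        String req = "GET /index.html HTTP/1.0";
        // Prefix matches, whole input does not: [false, true, true]
        System.out.println(Arrays.toString(modes("GET /\\S*", req)));
        // Leading junk defeats lookingAt() but not find(): [false, false, true]
        System.out.println(Arrays.toString(modes("GET /\\S*", "xx " + req)));
    }
}
```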
Comment #1 by andrei — 2010-09-26T11:37:42Z
The new RegEx supports wchar and dchar. Regarding lookingAt(), I'm unclear: how is it different from searching for a pattern starting with the anchor "^"?
Comment #2 by aarti — 2010-09-27T11:02:35Z
lookingAt() can be used on streams without needing to get the whole string from the stream. Also, ^ cannot be used to match a specific pattern in a stream: you simply cannot assume that your input starts right after a line end. The input might not even be split into lines.
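The line-end concern can be shown concretely in Java. With `Pattern.MULTILINE`, `^` also matches right after a line terminator, so `find()` on `^pattern` can succeed in the middle of the input. `\A`, or `^` without `MULTILINE`, anchors only to the start of the input, just as `lookingAt()` does. This sketch uses only `java.util.regex`; the class `AnchorDemo`, its helper `findWith`, and the inputs are illustrative, and it says nothing about std.regex internals.

```java
import java.util.regex.Pattern;

public class AnchorDemo {
    // Runs find() for the given pattern and compile flags.
    static boolean findWith(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "noise\nHEADER:1";
        // MULTILINE: ^ matches after "\n", so HEADER is found mid-input (true)
        System.out.println(findWith("^HEADER", Pattern.MULTILINE, input));
        // Without MULTILINE, ^ anchors to the start of input only (false)
        System.out.println(findWith("^HEADER", 0, input));
        // \A anchors to the start of input regardless of flags (false)
        System.out.println(findWith("\\AHEADER", Pattern.MULTILINE, input));
        // lookingAt() ignores line structure entirely (false)
        System.out.println(Pattern.compile("HEADER").matcher(input).lookingAt());
    }
}
```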
Comment #3 by andrei — 2011-06-04T17:45:52Z
Reassigning to GSoC student Dmitry. Dmitry, please close when you think the issue has been addressed. Thanks!
Comment #4 by dmitry.olsh — 2012-03-12T01:45:41Z
OK. I've meant to do this for ages.
The second point raised in this bug report has no supporting evidence and is, in fact, invalid.
The truth of the matter is that, having looked through all of Java's regex documentation, I observe the following:
1. There is no such thing as regex on a stream in Java; all the objects it works on are three variants of character buffers, i.e. wrapped arrays and the like.
2. lookingAt is indeed equivalent to prepending '^' to a regex pattern, and as far as performance is concerned, both versions should use the same optimization, namely the "no search" optimization. At least the current std.regex does optimize for '^' _somewhere_ at the start, e.g. silly things like "(^...)..." still get optimized.
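Point 2 can be spot-checked in Java. The sketch compares `lookingAt()` for a pattern `p` with `find()` for `^(?:p)`, compiled without `MULTILINE`, by checking the end offset of the match. The non-capturing group matters: without it, `^a|b` would anchor only the first alternative. The class `AnchorEquivalence` and both helper names are hypothetical. A handful of inputs is a sanity check, not a proof.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorEquivalence {
    // End offset of a prefix match via lookingAt(), or -1 if there is none.
    static int viaLookingAt(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.lookingAt() ? m.end() : -1;
    }

    // The same question answered by find() on the '^'-anchored pattern.
    // The non-capturing group makes '^' apply to every alternative.
    static int viaAnchoredFind(String regex, String input) {
        Matcher m = Pattern.compile("^(?:" + regex + ")").matcher(input);
        return m.find() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String[] patterns = {"a|b", "ab*", "(x+)y"};
        String[] inputs = {"abbbc", "bab", "xxyz", "zzab", ""};
        for (String p : patterns) {
            for (String s : inputs) {
                int a = viaLookingAt(p, s);
                int b = viaAnchoredFind(p, s);
                System.out.printf("%-6s %-8s %3d %3d%n", p, "\"" + s + "\"", a, b);
            }
        }
    }
}
```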
3. Due to implementation details of Java-style regex, there is no way it could work directly on a stream and keep all its syntax features, even if one tried; this problem is common to all backtracking engines. And yes, in some cases it has to walk the entire input to make sure it matched what it should match.
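Java's `Matcher.hitEnd()` makes the last remark observable. Its Javadoc says it returns true when the end of input was hit during the last match operation, in which case more input could have changed the result. Below is a small sketch that applies `lookingAt()` to a truncated chunk and to a complete line. The class `HitEndDemo`, the helper `probe`, and both buffers are illustrative. A streaming matcher facing the first case could not commit to its answer without waiting for more data.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HitEndDemo {
    static final Pattern REQUEST = Pattern.compile("GET /\\S*");

    // Returns {lookingAt(), hitEnd()} for one buffer.
    static boolean[] probe(String buffer) {
        Matcher m = REQUEST.matcher(buffer);
        boolean matched = m.lookingAt();
        // hitEnd() is true when the engine read up to the end of the buffer,
        // i.e. more input could still change the result of this match.
        return new boolean[] {matched, m.hitEnd()};
    }

    public static void main(String[] args) {
        // A chunk cut off mid-path: \S* runs into the end of the buffer.
        System.out.println(Arrays.toString(probe("GET /ind")));
        // The full line: \S* stops at the space before "HTTP/1.0".
        System.out.println(Arrays.toString(probe("GET /index.html HTTP/1.0")));
    }
}
```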
Marking as fixed: the first point of the report was solved long ago, and the second is invalid as stated. It does, however, raise a good point, and that has already been accounted for.