Bug 13268 – Implement longest match mode in std.regex

Status: NEW
Severity: enhancement
Priority: P4
Component: phobos
Product: D
Version: D2
Platform: x86
OS: Linux
Creation time: 2014-08-07T18:15:14Z
Last change time: 2024-12-01T16:22:09Z
Assigned to: Dmitry Olshansky
Creator: hsteoh
Moved to GitHub: phobos#9638

Comments

Comment #0 by hsteoh — 2014-08-07T18:15:14Z
Currently, the | operator works on a first-match basis, so a pattern like (ab)|(abcd) will never match the second alternative: (ab) always matches first. It would be nice if there were a way to do greedy matching between alternatives, such that an alternation a|b|c|... always prefers the longest match.

This will probably have performance implications, so perhaps a "greedy alternation" operator distinct from | should be used. Something like |* might be a possible syntax: (ab)|*(abcd) would capture (abcd) if the input contains "abcd", and fall back to (ab) only if the input contains "ab" but not "abcd".

Precedents for greedy alternation include lex / flex, which take a list of input regexen and always perform longest-match on them. In essence, given a list of patterns P1, P2, ..., they perform the equivalent of P1 |* P2 |* ....
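
To make the difference concrete, here is a minimal sketch of first-match versus longest-match alternation. It uses Python's `re` module purely as a stand-in, on the assumption that its ordered alternation behaves like the first-match behaviour described above; it is not std.regex code, and the helper names `first_match` and `longest_of` are hypothetical. The longest-match helper imitates the lex / flex approach by trying each pattern separately at the start of the input and keeping the longest hit.

```python
import re


def first_match(pattern, text):
    """Match `pattern` at the start of `text` using ordinary (ordered) alternation."""
    m = re.match(pattern, text)
    return m.group(0) if m else None


def longest_of(patterns, text):
    """Emulate P1 |* P2 |* ...: try every pattern at the start of `text`
    and return the longest match. Ties go to the earliest pattern, as in lex."""
    best = None
    for p in patterns:
        m = re.match(p, text)
        if m and (best is None or len(m.group(0)) > len(best)):
            best = m.group(0)
    return best


print(first_match(r"(ab)|(abcd)", "abcd"))    # ab
print(longest_of([r"ab", r"abcd"], "abcd"))   # abcd
print(longest_of([r"ab", r"abcd"], "abxx"))   # ab
```

Running every alternative costs one match attempt per pattern, which hints at the performance concern raised above; a real implementation would presumably do this inside the matching engine rather than by repeated matching.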
Comment #1 by dmitry.olsh — 2016-04-06T10:12:12Z
This is possible to achieve, but at the whole-regex level rather than at the level of each alternation.
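
One possible reading of the whole-regex versus per-alternation distinction is sketched below. It again uses Python's `re` as a stand-in and is not how std.regex works or would implement this. The hypothetical helper `longest_match_at` finds the longest overall match from a given start position by testing ever-shorter candidate spans with `fullmatch`. With the pattern (ab|a)(c|bcd), choosing the longest alternative locally takes "ab" and can then only reach "abc", while the longest overall match, "abcd", requires the shorter alternative "a" in the first group.

```python
import re


def longest_match_at(pattern, text, start=0):
    """Return the longest match of `pattern` beginning at `start`, or None.

    Tries the longest candidate span first and shrinks it; `fullmatch`
    backtracks over all alternatives, so any span that can match will match.
    """
    compiled = re.compile(pattern)
    for end in range(len(text), start - 1, -1):
        m = compiled.fullmatch(text, start, end)
        if m:
            return m
    return None


pattern = r"(ab|a)(c|bcd)"

ordered = re.match(pattern, "abcd")
print(ordered.group(0))                       # abc

longest = longest_match_at(pattern, "abcd")
print(longest.group(0))                       # abcd
print(longest.group(1), longest.group(2))     # a bcd
```

The shrinking loop is quadratic in the worst case and is only meant to illustrate the semantics.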
Comment #2 by robert.schadek — 2024-12-01T16:22:09Z
THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/phobos/issues/9638 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB