I want to test for the pattern "two consecutive characters, arbitrary sequence, the same two consecutive characters". My pattern is r"(..).*\1". It works with bmatch but not with match/matchAll. A complete example follows.
import std.regex, std.stdio;
void main ()
{
writeln (bmatch ("abab", r"(..).*\1")); // [["abab", "ab"]]
writeln (match ("abab", r"(..).*\1")); // [["abab", "ab"]]
writeln (matchAll ("abab", r"(..).*\1")); // [["abab", "ab"]]
writeln (bmatch ("xabab", r"(..).*\1")); // [["abab", "ab"]]
writeln (match ("xabab", r"(..).*\1")); // []
writeln (matchAll ("xabab", r"(..).*\1")); // []
}
As you can see, bmatch (usage discouraged in the docs) gives me the result I want, but match (also discouraged) and matchAll (way to go) don't.
Original thread in D.learn: http://forum.dlang.org/thread/[email protected]. No one replied so far, so I assume this is a bug.
Comment #1 by schuetzm — 2016-01-01T12:39:15Z
Digger blames this PR:
dmd: Merge pull request #2768 from dawgfoto/fix11406
https://github.com/D-Programming-Language/dmd/pull/2768
fix Issue 11406 - ld.gold breaks switch table jumps
Does the regex implementation use a jump table? Looks plausible.
Comment #2 by schuetzm — 2016-01-01T12:42:09Z
bisect.ini:
bad = master
good = 2.064
tester = cd /tmp && dmd xx.d && ! ./xx | grep '\[]'
@ag0aep6g: If digger is to be believed, dmd is actually the right component.
Comment #3 by gassa — 2016-01-01T16:02:32Z
(In reply to Marc Schütz from comment #1)
> Digger blames this PR:
>
> dmd: Merge pull request #2768 from dawgfoto/fix11406
>
> https://github.com/D-Programming-Language/dmd/pull/2768
>
> fix Issue 11406 - ld.gold breaks switch table jumps
>
> Does the regex implementation use a jump table? Looks plausible.
Strange. I tested with 2.069.2 but also with 2.063 and now with 2.063.2 (Jun 18, 2013 by this post: http://forum.dlang.org/post/[email protected]). There is no matchAll there yet, so I had to comment two lines, but with 2.063 and 2.063.2, match and bmatch still differ in output. And the pull is merged on Nov 17, 2013 (later). So perhaps the bug disappeared and reappeared multiple times.
Comment #4 by dmitry.olsh — 2016-04-06T12:53:49Z
Ivan Kazmenko (In reply to Ivan Kazmenko from comment #0)
> I want to test for the pattern "two consecutive characters, arbitrary
> sequence, the same two consecutive characters". My pattern is r"(..).*\1".
> It works with bmatch but not with match/matchAll. A complete example
> follows.
>
> import std.regex, std.stdio;
> void main ()
> {
> writeln (bmatch ("abab", r"(..).*\1")); // [["abab", "ab"]]
> writeln (match ("abab", r"(..).*\1")); // [["abab", "ab"]]
> writeln (matchAll ("abab", r"(..).*\1")); // [["abab", "ab"]]
> writeln (bmatch ("xabab", r"(..).*\1")); // [["abab", "ab"]]
> writeln (match ("xabab", r"(..).*\1")); // []
> writeln (matchAll ("xabab", r"(..).*\1")); // []
> }
>
> As you can see, bmatch (usage discouraged in the docs) gives me the result I
> want, but match (also discouraged) and matchAll (way to go) don't.
>
Well this boils down to :
a) Use what works for now.
b) You might have found a case where Thompson engine simply cannot produce the right result.
Finally I should detect these patterns and switch between simple backtracking and thompson engine.
Comment #5 by dmitry.olsh — 2017-08-25T10:10:55Z
*** Issue 17520 has been marked as a duplicate of this issue. ***
Comment #6 by dmitry.olsh — 2017-08-25T10:13:16Z
*** Issue 16251 has been marked as a duplicate of this issue. ***
Comment #7 by mipri — 2019-11-24T11:16:53Z
> a) Use what works for now.
Is this an option anymore? bmatch() no longer matches this result, and appears to have been deprecated in late 2017 as the source is now identical to that of match()
Comment #8 by bugzilla — 2019-12-01T15:58:02Z
Seems to work meanwhile...
Comment #9 by oddp — 2020-06-19T10:41:34Z
This backreference issue isn't fixed entirely:
void main () {
import std.regex, std.stdio;
auto r = r"(..).*\1";
foreach (s; ["abab", "abcab", "xababx", "xabcabx"])
writefln("%s %s %s %s", s, s.matchFirst(r), s.match(r), s.bmatch(r));
// abab ["abab", "ab"] [["abab", "ab"]] [["abab", "ab"]]
// abcab ["abcab", "ab"] [["abcab", "ab"]] [["abcab", "ab"]]
// xababx ["abab", "ab"] [["abab", "ab"]] [["abab", "ab"]]
// xabcabx [] [] []
}
Here are a handful ids that std.regex fails to match:
vrnyrnbfcx
jrqrkgdrqm
meflbefata
svdyayyedy
cwdpoivoie
Another hundred on the playground: https://run.dlang.io/is/O8slu9
// 100 out of 100 failed.
Comment #10 by robert.schadek — 2024-12-01T16:25:35Z