Bug 17161 – [REG 2.072.2] Massive Regex Slowdown

Status: RESOLVED
Resolution: FIXED
Severity: regression
Priority: P1
Component: phobos
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2017-02-09T00:36:00Z
Last change time: 2017-02-24T18:15:50Z
Assigned to: nobody
Creator: jack

Attachments

ID    Filename  Summary          Content-Type  Size
1636  slow.d    The sample code  text/plain    1261

Comments

Comment #0 by jack — 2017-02-09T00:36:56Z
Created attachment 1636
The sample code

The sample code runs about 8x slower when built with 2.073.0 than with 2.072.2 (18.23s vs. 2.19s user time below). The slowdown is not noticeable on smaller files.

The input file needed to reproduce this is too large to post here. You can generate it with the Python program at https://benchmarksgame.alioth.debian.org/u64q/program.php?test=fasta&lang=python3&id=3:

$ python3 fasta.py 5000000 > input5000000.txt

# 2.072.2
$ /Users/Jack/digger/result/bin/dmd -O -inline -release -boundscheck=off slow.d
$ cat input5000000.txt | time ./slow
./slow  2.19s user 0.09s system 97% cpu 2.330 total

# 2.073.0
$ dmd -O -inline -release -boundscheck=off slow.d
$ cat input5000000.txt | time ./slow
./slow  18.23s user 0.16s system 98% cpu 18.616 total
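The attachment itself is not reproduced here, but comment #3 treats this report as the compile-time (C-T) case, i.e. code built on std.regex's ctRegex. As a hedged illustration only (not the contents of slow.d), the sketch below counts one of the variant patterns from comment #2 with both ctRegex and the run-time regex. The helper names countWithCtRegex and countWithRegex are hypothetical. Both helpers return the same count; the regression concerns only how long they take to get there.

```d
import std.range : walkLength;
import std.regex : ctRegex, matchAll, regex;

// One of the variant patterns listed in comment #2.
enum variantPattern = "agggtaaa|tttaccct";

// Hypothetical helper: ctRegex! generates a specialized matcher
// while the program is being compiled.
size_t countWithCtRegex(string seq)
{
    auto re = ctRegex!variantPattern;
    return seq.matchAll(re).walkLength;
}

// Hypothetical helper: regex() parses the pattern when the program runs.
size_t countWithRegex(string seq)
{
    auto re = regex(variantPattern);
    return seq.matchAll(re).walkLength;
}
```

For example, both helpers return 2 for "agggtaaactttaccct". On the 5,000,000-line input, the number worth comparing between compiler versions is the elapsed time of each helper, not the count.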
Comment #1 by jack — 2017-02-09T15:43:15Z
Comment #2 by jack — 2017-02-09T20:32:26Z
Bad news: I see a similar performance decrease for run-time regex as well.

# 2.073.0
$ dmd -O -inline -release test2.d && cat input5000000.txt | time ./test2
./test2  4.44s user 0.09s system 98% cpu 4.591 total

# 2.072.2
$ ~/digger/result/bin/dmd -O -inline -release test2.d && cat input5000000.txt | time ./test2
./test2  3.20s user 0.09s system 98% cpu 3.344 total

I consistently get a run time about a second and a half longer with 2.073.0.

Code:

    import std.algorithm;
    import std.array;
    import std.range;
    import std.regex;
    import std.stdio;
    import std.typecons;
    import std.utf;

    static variants = [
        "agggtaaa|tttaccct",
        "[cgt]gggtaaa|tttaccc[acg]",
        "a[act]ggtaaa|tttacc[agt]t",
        "ag[act]gtaaa|tttac[agt]ct",
        "agg[act]taaa|ttta[agt]cct",
        "aggg[acg]aaa|ttt[cgt]ccct",
        "agggt[cgt]aa|tt[acg]accct",
        "agggta[cgt]a|t[acg]taccct",
        "agggtaa[cgt]|[acg]ttaccct",
    ];

    void main()
    {
        // Read all of stdin into one string, keeping line terminators.
        auto app = appender!string;
        app.reserve(5_000_000);
        app.put(stdin
            .byLineCopy(KeepTerminator.yes)
            .joiner
            .byChar);
        auto seq = app.data;

        // Remove FASTA header lines and all line feeds.
        auto regexLineFeeds = regex(">.*\n|\n");
        seq = seq.replaceAll(regexLineFeeds, "");

        // Count the matches of each variant pattern.
        foreach (pattern; variants)
        {
            writeln(pattern, " ", seq.matchAll(pattern).walkLength);
        }
    }
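The program above does two regex passes: it deletes FASTA header lines and line feeds, then counts the non-overlapping matches of each variant pattern. The sketch below isolates that logic in a hypothetical countVariants helper (not part of the original test2.d) so it can be checked on a tiny input instead of the 5,000,000-line file. It assumes std.regex's default behavior that "." does not match a newline, which is what keeps ">.*\n" confined to a single header line.

```d
import std.range : walkLength;
import std.regex : matchAll, regex, replaceAll;

// Hypothetical helper mirroring the two steps of the program above:
// 1. delete FASTA header lines (">...\n") and all remaining line feeds;
// 2. count the non-overlapping matches of each variant pattern.
size_t[string] countVariants(string fasta, string[] patterns)
{
    auto seq = fasta.replaceAll(regex(">.*\n|\n"), "");
    size_t[string] counts;
    foreach (pattern; patterns)
        counts[pattern] = seq.matchAll(regex(pattern)).walkLength;
    return counts;
}
```

For the input ">ONE test\nagggtaaac\ntttaccct\n", the stripped sequence is "agggtaaactttaccct". In that sequence, "agggtaaa|tttaccct" matches twice and "[cgt]gggtaaa|tttaccc[acg]" does not match at all.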
Comment #3 by dmitry.olsh — 2017-02-09T22:39:30Z
(In reply to Jack Stouffer from comment #2)
> Bad news: I see a similar performance decrease for run-time regex as well.
>
> # 2.073.0
> $ dmd -O -inline -release test2.d && cat input5000000.txt | time ./test2
> ./test2  4.44s user 0.09s system 98% cpu 4.591 total
>
> # 2.072.2
> ~/digger/result/bin/dmd -O -inline -release test2.d && cat input5000000.txt | time ./test2
> ./test2  3.20s user 0.09s system 98% cpu 3.344 total
>
> I consistently get around a second and a half longer run time with 2.073.

This is an interesting find, thanks for sharing!

Will investigate the run-time (R-T) issue; the compile-time (C-T) slowdown is (sadly) to be expected.
Comment #4 by dmitry.olsh — 2017-02-09T23:03:35Z
(In reply to Dmitry Olshansky from comment #3)
> Will investigate the R-T issue, C-T is (sadly) to be expected.

Mystery solved: in the R-T version, the regex is parsed at compile time (because of static), so the disabling of Kickstart affects it too.
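The explanation above rests on a D language rule: the initializer of a module-level or static variable must be known at compile time, so any function it calls is evaluated by the compiler (CTFE) rather than at run time. The sketch below shows that rule in isolation with a hypothetical alternatives helper. It does not reproduce std.regex's internals or the Kickstart optimization; it only demonstrates why code that looks run-time can end up executing during compilation.

```d
// Hypothetical helper: count the '|'-separated alternatives in a pattern.
size_t alternatives(string pattern)
{
    size_t n = 1;
    foreach (c; pattern)
        if (c == '|')
            ++n;
    return n;
}

// A module-level (static) initializer must be a compile-time constant,
// so the compiler runs alternatives() via CTFE while building the program.
static immutable size_t branches = alternatives("agggtaaa|tttaccct");

// static assert is checked during compilation, which confirms that the
// value above was computed by the compiler rather than at run time.
static assert(branches == 2);
```

The same function can still be called normally at run time. Whether it runs in CTFE depends only on the context that requests its result.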
Comment #5 by jack — 2017-02-09T23:28:06Z
(In reply to Dmitry Olshansky from comment #3)
> Will investigate the R-T issue, C-T is (sadly) to be expected.

Is there any way to revert the C-T regex to the 2.072 behavior? It would be great if a performance regression of this size, in one of D's selling points, could be fixed immediately.
Comment #6 by github-bugzilla — 2017-02-12T18:07:33Z
Commits pushed to stable at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/5a2491a847beb035b37ee2a270029499065b1919
Fix Issue 17161 - Revert all changes to std.regex from 2.072.2 onwards

https://github.com/dlang/phobos/commit/c4f4cfeda6ba60e2df6eef05bc1f8946982e9a99
Merge pull request #5113 from JackStouffer/revert-regex

Issue 17161 - [REG 2.072.2] Massive Regex Slowdown
Comment #7 by github-bugzilla — 2017-02-16T17:26:01Z
Commits pushed to master at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/5a2491a847beb035b37ee2a270029499065b1919
Fix Issue 17161 - Revert all changes to std.regex from 2.072.2 onwards

https://github.com/dlang/phobos/commit/c4f4cfeda6ba60e2df6eef05bc1f8946982e9a99
Merge pull request #5113 from JackStouffer/revert-regex
Comment #8 by github-bugzilla — 2017-02-24T18:15:50Z
Commits pushed to newCTFE at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/5a2491a847beb035b37ee2a270029499065b1919
Fix Issue 17161 - Revert all changes to std.regex from 2.072.2 onwards

https://github.com/dlang/phobos/commit/c4f4cfeda6ba60e2df6eef05bc1f8946982e9a99
Merge pull request #5113 from JackStouffer/revert-regex