Created attachment 1636
The sample code
The sample code runs about 8x slower when compiled with 2.073.0 than with 2.072.2.
The slowdown is not noticeable on smaller files. The input file needed to reproduce it is too large to post here, but you can generate it by running the Python code here: https://benchmarksgame.alioth.debian.org/u64q/program.php?test=fasta&lang=python3&id=3
$ python3 fasta.py 5000000 > input5000000.txt
# 2.072.2
$ /Users/Jack/digger/result/bin/dmd -O -inline -release -boundscheck=off slow.d
$ cat input5000000.txt | time ./slow
./slow 2.19s user 0.09s system 97% cpu 2.330 total
# 2.073.0
$ dmd -O -inline -release -boundscheck=off slow.d
$ cat input5000000.txt | time ./slow
./slow 18.23s user 0.16s system 98% cpu 18.616 total
Bad news: I see a similar performance decrease for run-time regex as well.
# 2.073.0
$ dmd -O -inline -release test2.d && cat input5000000.txt | time ./test2
./test2 4.44s user 0.09s system 98% cpu 4.591 total
# 2.072.2
$ ~/digger/result/bin/dmd -O -inline -release test2.d && cat input5000000.txt | time ./test2
./test2 3.20s user 0.09s system 98% cpu 3.344 total
I consistently get around a second and a half longer run time with 2.073.
Code
import std.algorithm;
import std.array;
import std.range;
import std.regex;
import std.stdio;
import std.typecons;
import std.utf;
static variants = [
    "agggtaaa|tttaccct",
    "[cgt]gggtaaa|tttaccc[acg]",
    "a[act]ggtaaa|tttacc[agt]t",
    "ag[act]gtaaa|tttac[agt]ct",
    "agg[act]taaa|ttta[agt]cct",
    "aggg[acg]aaa|ttt[cgt]ccct",
    "agggt[cgt]aa|tt[acg]accct",
    "agggta[cgt]a|t[acg]taccct",
    "agggtaa[cgt]|[acg]ttaccct",
];

void main()
{
    auto app = appender!string;
    app.reserve(5_000_000);
    app.put(stdin
        .byLineCopy(KeepTerminator.yes)
        .joiner
        .byChar);

    auto seq = app.data;
    auto regexLineFeeds = regex(">.*\n|\n");
    seq = seq.replaceAll(regexLineFeeds, "");

    foreach (pattern; variants)
    {
        writeln(pattern, " ", seq.matchAll(pattern).walkLength);
    }
}
Comment #3 by dmitry.olsh — 2017-02-09T22:39:30Z
(In reply to Jack Stouffer from comment #2)
> Bad news: I see a similar performance decrease for run-time regex as well.
>
> # 2.073.0
> $ dmd -O -inline -release test2.d && cat input5000000.txt | time ./test2
> ./test2 4.44s user 0.09s system 98% cpu 4.591 total
>
> # 2.072.2
> ~/digger/result/bin/dmd -O -inline -release test2.d && cat input5000000.txt | time ./test2
> ./test2 3.20s user 0.09s system 98% cpu 3.344 total
>
> I consistently get around a second and a half longer run time with 2.073.
>
This is an interesting find, thanks for sharing!
I will investigate the run-time (R-T) issue; the compile-time (C-T) slowdown is (sadly) to be expected.
Comment #4 by dmitry.olsh — 2017-02-09T23:03:35Z
(In reply to Dmitry Olshansky from comment #3)
> This is interesting find, thanks for sharing!
>
> Will investigate the R-T issue, C-T is (sadly) to be expected.
Mystery solved: in the R-T version the regex is parsed at C-T (because of static), so the disabling of Kickstart affects it too.
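To illustrate the point about static, here is a minimal sketch. It assumes the pattern is stored in a module-level (static) variable initialized with regex(...), which may not be exactly how test2.d is written. The names staticPattern, countStatic and countLocal are made up for this example. D has to evaluate a module-level initializer at compile time (CTFE), so the pattern is parsed by the compiler even though matching still uses the run-time engine. A local initialized inside a function is parsed when the function runs.

```d
import std.range : walkLength;
import std.regex : matchAll, regex;
import std.stdio : writeln;

// Module-level initializer: evaluated through CTFE, so regex() parses the
// pattern at compile time. (Hypothetical name, for illustration only.)
static staticPattern = regex("agggtaaa|tttaccct");

// Counts matches using the pattern that was parsed at compile time.
size_t countStatic(string seq)
{
    return seq.matchAll(staticPattern).walkLength;
}

// Counts matches using a pattern parsed at run time, on every call.
size_t countLocal(string seq)
{
    auto localPattern = regex("agggtaaa|tttaccct");
    return seq.matchAll(localPattern).walkLength;
}

void main()
{
    auto seq = "ccagggtaaacctttaccctgg";
    writeln("static: ", countStatic(seq));
    writeln("local:  ", countLocal(seq));
}
```

Both functions should report the same count. The difference is only in when the pattern is parsed, which would explain why a compile-time-only change such as disabling Kickstart can show up in a program that never uses ctRegex. Timing the two variants on the large input would be one way to check this on a given compiler version.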
Comment #5 by jack — 2017-02-09T23:28:06Z
(In reply to Dmitry Olshansky from comment #3)
> Will investigate the R-T issue, C-T is (sadly) to be expected.
Is there any way to revert the CT regex to the 2.072 behavior? It would be great if a performance regression of this size in one of the selling points of D could be fixed immediately.
Comment #6 by github-bugzilla — 2017-02-12T18:07:33Z