Comment #0 by jrdemail2000-dlang — 2017-12-22T20:27:55Z
My standard performance benchmarks show a regex regression moving from DMD 2.077.1 to DMD 2.078.0-beta1. The benchmark is the one described here: https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md#regular-expression-filter-benchmark. No other performance regressions were seen, leading me to believe it is likely specific to regex, rather than some other element of the program.
This program takes a user supplied regex as a command line argument. It creates a Regex!char instance from the command line arg. Then it reads a file line-by-line, searching each line for presence of the regex using matchFirst. It's a fielded-search, so the matchFirst call is limited to a portion of the line.
The benchmark times (MacBook Pro, OS X, 16GB ram, SSD drives):
- DMD 2.077.1: 11.40 seconds
- DMD 2.078.0-beta1: 15.10 seconds
Approximately a 30% increase. I ran it about 10 times, with very consistent results.
If it would be helpful, I could produce a small standalone program to validate the behavior.
Comment #1 by jrdemail2000-dlang — 2017-12-22T20:34:21Z
Regex in the test: '[RD].*(ION[0-2])'
Compilation flags: dmd -release -O -boundscheck=off -inline
Comment #2 by jrdemail2000-dlang — 2017-12-23T20:16:25Z
Created attachment 1669
find_regex.d: Reduced program, it counts regex matches in input lines
Comment #3 by jrdemail2000-dlang — 2017-12-23T20:18:32Z
Created attachment 1670
gen_strings.d: Generates input for find_regex.d
Comment #4 by jrdemail2000-dlang — 2017-12-23T20:51:10Z
The two programs attached can be used to compare regex match performance. Compile find_regex.d with both DMD 2.077.1 and DMD 2.078.0-beta1. eg.
$ dmd2.078.0-beta1 -release -O -inline -boundscheck=off find_regex.d -of=find_regex.dmd2.078.0-beta1
$dmd2.077.1 -release -O -inline -boundscheck=off find_regex.d -of=find_regex.dmd2.077.1
Build gen_strings also. Then run like:
$ ./gen_strings 10000000 | ./find_regex.dmd2.077.1 'abc'
106438 matches; 3780.769 milliseconds
$ ./gen_strings 10000000 | ./find_regex.dmd2.078.0-beta1 'abc'
106438 matches; 6315.638 milliseconds
Times on my MacBook Pro for a couple of regexes (replace 'abc' in the commands above):
| Regex | 2.077.1 | 2.078.0-beta1 |
| 'abc' | 3780.769 ms | 6315.638 ms |
| '(aa)+(cc)+g' | 5306.169 ms | 7753.435 ms |
| '(aa|ax).+[gxb][cyw]' | 9888.508 ms | 12324.285 ms |
Comment #8 by jrdemail2000-dlang — 2018-03-04T01:03:52Z
I re-ran the benchmark with the new DMD 2.079.0 release (OS X). Numbers below are fastest of several runs, but were generally consistent:
| Regex | 2.077.1 | 2.078.0-beta1 | 2.079.0 |
|-----------------------+--------------+---------------+--------------|
| 'abc' | 3819.314 ms | 6202.892 ms | 5077.937 ms |
| '(aa)+(cc)+g' | 5457.678 ms | 8209.269 ms | 6672.057 ms |
| '(aa|ax).+[gxb][cyw]' | 10121.181 ms | 12448.443 ms | 11254.978 ms |
These results are a material improvement over 2.078.0-beta1, but still a fair bit off 2.077.1, by 10-25% depending on the test.
Perhaps no longer a "performance regression", but there's still an opportunity for improvement. Would it be more appropriate to leave this open as an enhancement request?
Comment #9 by jrdemail2000-dlang — 2018-05-14T14:31:23Z
The final performance fix was included in LDC 1.10.0-beta1. For this release the standard benchmark I used for the TSV Utilities improved as follows:
LDC 1.7.0 (before regression): 8.37 seconds
LDC 1.8.0 (after regression): 10.01 seconds
LDC 1.9.0 (first fixes): 9.44 seconds
LDC 1.10.0-beta1 (second fix): 5.85 seconds
First fixes: Phobos PR 5981, DMD PR 7599
Second fix: Phobos PR 6268
The benchmark test used reads a TSV file line-by-line and checks individual fields for regex matches. A significant amount of processing time is IO, so the percentage gain on the regex portion is higher than the overall gain. The overall gain from LDC 1.7.0 is 30%.
Test was run on MacOS, MacMini with 16GB RAM, SSD drives. The file used was 2.7GB, 14 million lines. Test info can be found here: https://github.com/eBay/tsv-utils-dlang/blob/master/docs/ComparativeBenchmarks2018.md
Great result!