Bug 17820 – std.regex performance: .matchFirst allocates frequently; causes thread contention

Status
NEW
Severity
enhancement
Priority
P4
Component
phobos
Product
D
Version
D2
Platform
x86_64
OS
Windows
Creation time
2017-09-09T22:16:19Z
Last change time
2024-12-01T16:30:50Z
Assigned to
No Owner
Creator
Chad Joan
Moved to GitHub: phobos#9720 →

Attachments

IDFilenameSummaryContent-TypeSize
1657main.dRegex .matchFirst benchmarktext/x-dsrc26499

Comments

Comment #0 by chadjoan — 2017-09-09T22:16:19Z
Created attachment 1657 Regex .matchFirst benchmark When a sufficiently complicated regular expression is used to analyze a large number of small strings, the memory allocation done by .matchFirst (and possibly other similar regex functions) causes punishing slowdown and thread contention. This happens even though the same Regex object (per thread) is reused for .matchFirst(). I have attached a test program which is also available on pastebin: https://pastebin.com/Am7BgYkL I strongly suspect this is caused by the call to malloc in .matchOnce in std.regex.package, as well as the call to calloc in Capture.newMatches also in std.regex.package. I was able to work around it by modifying Phobos and statically allocating the memory used by .matchOnce, which provided something like a 100x speedup on that project (with serial processing, nevermind that it made threading possible too). I was never going to call .matchFirst (or anything that generates captures) while still iterating over previous captures, so the workaround was fine for that project, but not acceptable for general use of course. Given that this slowdown was related to malloc, the exact ramifications and severity will change significantly depending on the host system's malloc implementation. It was really bad on Windows. Linux seems to fare much better. I suspect that Windows' malloc does a thread-global lock during allocation and then takes its sweet time, which results in some parallelized code actually running *slower* than the serial equivalent. And since this doesn't seem to happen to ALL regexes where the Captures (n > smallString) condition is met, I have to wonder if the allocation size, or some other yet undetermined factor, might change the behavior of Windows' malloc/calloc. This could make further testing difficult. Here is my run of the test program on a Windows 7 machine with dmd v2.076.0: -------------------------------------------------------- C:\dprojects\regex\benchmark>dmd main.d -release -O -m64 C:\dprojects\regex\benchmark>main Benchmarking non-regex parsing, parallel: processed 100000 elements in 0.061 seconds. Benchmarking regex parsing, no captures, parallel: processed 100000 elements in 202.893 seconds. Benchmarking regex parsing, many captures, parallel: processed 100000 elements in 977.339 seconds. Benchmarking non-regex parsing, serial: processed 100000 elements in 0.031 seconds. Benchmarking regex parsing, no captures, serial: processed 100000 elements in 202.395 seconds. Benchmarking regex parsing, many captures, serial: processed 100000 elements in 837.401 seconds. -------------------------------------------------------- Here it is again on a slightly slower (hardware-wise) Linux (Gentoo) machine with dmd v2.074.1: -------------------------------------------------------- $ dmd main.d -release -O -m64 $ ./main Benchmarking non-regex parsing, parallel: processed 100000 elements in 0.062 seconds. Benchmarking regex parsing, no captures, parallel: processed 100000 elements in 88.055 seconds. Benchmarking regex parsing, many captures, parallel: processed 100000 elements in 93.546 seconds. Benchmarking non-regex parsing, serial: processed 100000 elements in 0.043 seconds. Benchmarking regex parsing, no captures, serial: processed 100000 elements in 135.063 seconds. Benchmarking regex parsing, many captures, serial: processed 100000 elements in 135.250 seconds. -------------------------------------------------------- I noticed that my benchmark's non-regex numbers don't parallelize well. I'm not sure what's going on there. Maybe there are some hidden allocations happening? Either way, they only exist to provide a sense of scale. The more important numbers are the regex parsing benchmarks. When I used a different regular expression earlier, it *did* parallelize somewhat, so I believe this benchmark does represent what I experienced in production (in other words: to my knowledge the benchmark itself isn't causing any slowdowns, allocations, or thread contention). I have already begun working on a fix for this issue. I am not sure what to do with it now that Dmitry is kicking butt on regex bugs (https://forum.dlang.org/post/[email protected]), but I'm at least glad I was able to make this bug report and keep others informed. If Dmitry doesn't feel like taking this one on, then I suppose I will have to reintegrate my WIP changes later and hopefully benefit from the cleaner future code. I am worried that my WIP code uses std.experimental.allocator: I feel like freelists are a very elegant solution to this problem; I'm not sure if std.experimental is unused in the rest of Phobos by chance or by decree; and reimplementing a freelist just to avoid using the experimental modules seems like a bad idea to me. I'm not sure how others feel. You can find my WIP fix here: https://github.com/chadjoan/phobos/tree/regex_allocator_optimization
Comment #1 by chadjoan — 2017-09-09T22:19:52Z
The benchmark output seems like it might wrap poorly in bugzilla's page, so here is an edited version that should fit better: Windows 7 machine with dmd v2.076.0: -------------------------------------------------------- C:\dprojects\regex\benchmark>dmd main.d -release -O -m64 C:\dprojects\regex\benchmark>main non-regex parsing, parallel: 100000 elements in 0.061s. regex parsing, no captures, parallel: 100000 elements in 202.893 s. regex parsing, many captures, parallel: 100000 elements in 977.339 s. non-regex parsing, serial: 100000 elements in 0.031 s. regex parsing, no captures, serial: 100000 elements in 202.395 s. regex parsing, many captures, serial: 100000 elements in 837.401 s. -------------------------------------------------------- Slightly slower (hardware-wise) Linux (Gentoo) machine with dmd v2.074.1: -------------------------------------------------------- $ dmd main.d -release -O -m64 $ ./main non-regex parsing, parallel: 100000 elements in 0.062 s. regex parsing, no captures, parallel: 100000 elements in 88.055 s. regex parsing, many captures, parallel: 100000 elements in 93.546 s. non-regex parsing, serial: 100000 elements in 0.043 s. regex parsing, no captures, serial: 100000 elements in 135.063 s. regex parsing, many captures, serial: 100000 elements in 135.250 s. --------------------------------------------------------
Comment #2 by robert.schadek — 2024-12-01T16:30:50Z
THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/phobos/issues/9720 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB