Bug 12844 – Absurd RAM Required for ctRegex

Status: NEW
Severity: major
Priority: P2
Component: phobos
Product: D
Version: D2
Platform: All
OS: Windows
Creation time: 2014-06-02T20:57:04Z
Last change time: 2024-12-01T16:21:22Z
Keywords: CTFE, performance
Assigned to: Dmitry Olshansky
Creator: Orvid King
Moved to GitHub: phobos#10059

Comments

Comment #0 by blah38621 — 2014-06-02T20:57:04Z
Currently both git head and 2.065 take an utterly absurd (I killed DMD's process at 9.5gb of ram usage, I only have 16gb of RAM so I can't test further than that) amount of RAM to compile the following code:

module main;

import std.file : readText, write;
import std.math : round;
import std.regex;

enum planePattern = ctRegex!(`"plane" "\(([-]?[0-9]+(?:\.[0-9]+)? [-]?[0-9]+(?:\.[0-9]+)? [-]?[0-9]+(?:\.[0-9]+)?)\) \(([-]?[0-9]+(?:\.[0-9]+)? [-]?[0-9]+(?:\.[0-9]+)? [-]?[0-9]+(?:\.[0-9]+)?)\) \(([-]?[0-9]+(?:\.[0-9]+)? [-]?[0-9]+(?:\.[0-9]+)? [-]?[0-9]+(?:\.[0-9]+)?)\)"`, "g");

void main(string[] args)
{
    import std.stdio;
    string dat = readText(args[0]);
    foreach (m; planePattern.matchAll(dat))
    {
        writeln(m);
    }
}
Comment #1 by blah38621 — 2014-06-02T20:58:31Z
Just let it run to 12.4gb with no end in sight.
Comment #2 by blah38621 — 2014-06-02T21:08:17Z
Also, dat and planePattern in the foreach loop should actually be swapped, but it never gets to the point where it actually reports this error.
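For reference, a minimal sketch of the corrected call order (input first, pattern second). It builds the pattern at run time with `regex` instead of `ctRegex`, which is an assumption of this sketch to sidestep the compile-time cost, not something proposed in the thread. The helper names `num`, `triple`, `planeSource`, and `countPlanes` are hypothetical; the pattern text is the one from the report, assembled from its repeated parts.

```d
import std.range : walkLength;
import std.regex : matchAll, regex;

// One signed decimal number, as written in the report's pattern.
enum num = `[-]?[0-9]+(?:\.[0-9]+)?`;
// One parenthesized, captured triple of numbers.
enum triple = `\((` ~ num ~ ` ` ~ num ~ ` ` ~ num ~ `)\)`;
// The full pattern from the report: "plane" followed by three quoted triples.
enum planeSource = `"plane" "` ~ triple ~ ` ` ~ triple ~ ` ` ~ triple ~ `"`;

// Counts the plane entries in `dat`.
size_t countPlanes(string dat)
{
    // Built at run time (assumption of this sketch), so no CTFE is involved.
    auto planePattern = regex(planeSource, "g");
    // Corrected argument order: the input comes first, the pattern second.
    return dat.matchAll(planePattern).walkLength;
}
```

With UFCS, `dat.matchAll(planePattern)` is the same call as `matchAll(dat, planePattern)`, which is the order comment #2 refers to.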
Comment #3 by justin — 2014-06-02T21:22:29Z
I tested it out to 174GB before killing it. Perf says the top offenders are `Dsymbol::isTemplateMixin()`, `Dsymbol::pastMixin()`, and `CompoundStatement::interpret(InterState*)`.
Comment #4 by blah38621 — 2014-06-02T22:01:50Z
Alright, had someone else test on a machine with 256gb of memory, they ended up killing it after it hit 174gb of ram.
Comment #5 by dmitry.olsh — 2014-06-06T17:00:57Z
Matters are made much worse by the fact that at CTFE all of the [0-9] character sets get rebuilt for each occurrence, because the charset cache is thread-local (no luck at CTFE). The memory leak itself, though, is a compiler issue. Hm, I could try to have an extra cache local to the scope of each regular expression compilation.
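The idea of a cache scoped to one pattern compilation could look roughly like the sketch below. This is not std.regex code: `CharSet`, `buildSet`, `ScopedSetCache`, and `buildsNeeded` are hypothetical names, and the set representation is a deliberately simple ASCII-only table. The point is only that a cache held in a local value, rather than in thread-local storage, is usable during CTFE. For scale, the pattern in comment #0 contains `[0-9]` eighteen times.

```d
// Hypothetical sketch, not std.regex internals.
struct CharSet
{
    bool[128] member; // ASCII-only membership table, for illustration
}

// Builds a set from a spec made of "lo-hi" ranges, e.g. "0-9" or "a-zA-Z".
CharSet buildSet(string spec)
{
    CharSet s;
    for (size_t i = 0; i + 2 < spec.length; i += 3)
    {
        assert(spec[i + 1] == '-', "expected a lo-hi range");
        for (int c = spec[i]; c <= spec[i + 2]; ++c)
            s.member[c] = true;
    }
    return s;
}

// Lives only as long as one pattern compilation: no thread-local or global
// state, so it can be used inside a function evaluated at compile time.
struct ScopedSetCache
{
    CharSet[string] built; // spec -> set built earlier in this compilation
    size_t builds;         // how many sets were actually constructed

    CharSet get(string spec)
    {
        if (auto hit = spec in built)
            return *hit;
        auto s = buildSet(spec);
        built[spec] = s;
        ++builds;
        return s;
    }
}

// Simulates one compilation that meets the given sets in order and returns
// how many of them had to be built from scratch.
size_t buildsNeeded(string[] specs)
{
    ScopedSetCache cache;
    foreach (spec; specs)
        cache.get(spec);
    return cache.builds;
}

// Forced through CTFE: three occurrences of "0-9" cost a single build.
static assert(buildsNeeded(["0-9", "0-9", "a-z", "0-9"]) == 2);
```

Whether this would reduce the compiler's memory use in practice is not something the sketch can show; it only illustrates the caching structure.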
Comment #6 by blah38621 — 2014-06-06T19:52:18Z
Perhaps it would be possible to simulate a character set cache by using a template to represent the set of characters included? For the runtime version, is there any real reason to force the charset cache to be thread local? Could it perhaps be global with dirty read and a possible re-calculation? From my experience, most naively implemented caching mechanisms fail to understand that a best-effort cache is usually more than enough.
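A rough sketch of the template idea, under the assumption that the set description is available as a template argument at the point of use (inside a CTFE parser working on an ordinary string value it would not be). The names `CharSet`, `buildSet`, `cachedSet`, and `inDigits` are hypothetical and the table is ASCII-only. It relies on the compiler reusing a template instance for identical arguments, so each distinct set is built once.

```d
// Hypothetical sketch, not std.regex internals.
struct CharSet
{
    bool[128] member; // ASCII-only membership table, for illustration
}

// Builds a set from a spec made of "lo-hi" ranges, e.g. "0-9" or "a-zA-Z".
CharSet buildSet(string spec)
{
    CharSet s;
    for (size_t i = 0; i + 2 < spec.length; i += 3)
    {
        assert(spec[i + 1] == '-', "expected a lo-hi range");
        for (int c = spec[i]; c <= spec[i + 2]; ++c)
            s.member[c] = true;
    }
    return s;
}

// Eponymous template: each distinct `spec` is one instantiation, so
// buildSet(spec) is evaluated once at compile time and every use of
// cachedSet!spec refers to the same immutable table.
template cachedSet(string spec)
{
    static immutable CharSet cachedSet = buildSet(spec);
}

// Example lookup against the shared "0-9" table.
bool inDigits(char c)
{
    return c < 128 && cachedSet!"0-9".member[c];
}
```

Every mention of `cachedSet!"0-9"` names the same instantiation, which is what stands in for the cache here.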
Comment #7 by dmitry.olsh — 2014-06-07T10:48:52Z
Might help: https://github.com/D-Programming-Language/phobos/pull/2234 It still runs out of 4Gb on my machine over here though...
Comment #8 by blah38621 — 2014-06-07T16:49:31Z
Just let it run to 7gb even with that PR.
Comment #9 by dmitry.olsh — 2016-04-06T12:49:11Z
*** Issue 7442 has been marked as a duplicate of this issue. ***
Comment #10 by greensunny12 — 2018-03-31T13:43:48Z
From https://github.com/dlang/phobos/pull/6164#issuecomment-365175755:
> Another idea is to deprecate ctRegex and stop testing it. It's 100% obvious that it doesn't work on anything significant in practice due to our compiler limitations. Then we enable it again once CTFE works with std.regex.
Comment #11 by bugzilla — 2019-12-03T09:41:04Z
On Linux I get: 631MB, compiled in 2.6 seconds. It's still a lot compared to 74MB when replaced by a single string comparison, but much less than the gigabytes above. I don't know whether this is the same on Windows, or whether 631MB would be accepted as a fix.
Comment #12 by robert.schadek — 2024-12-01T16:21:22Z
THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/phobos/issues/10059 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB