Bug 8203 – Use of std.regex.match() generates "not enough preallocated memory" error

Status: RESOLVED
Resolution: FIXED
Severity: normal
Priority: P2
Component: phobos
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2012-06-06T05:16:00Z
Last change time: 2014-01-07T08:41:05Z
Keywords: pull
Assigned to: nobody
Creator: phshaffer

Attachments

ID    Filename       Summary                                Content-Type              Size
1110  fold.txt       File to Compare                        text/plain                675713
1111  fnew.txt       File to Compare                        text/plain                675713
1112  icomp2.d       Source File                            application/octet-stream  1483
1113  Capture.JPG    Console Screenshot with Error Showing  image/jpeg                74849
1310  hugeregex.txt  regex example                          text/plain                954

Comments

Comment #0 by phshaffer — 2012-06-06T05:16:26Z
Created attachment 1110 File to Compare
Comment #1 by phshaffer — 2012-06-06T05:17:42Z
Created attachment 1111 File to Compare
Comment #2 by phshaffer — 2012-06-06T05:21:26Z
Created attachment 1112 Source File
Comment #3 by phshaffer — 2012-06-06T05:23:23Z
Created attachment 1113 Console Screenshot with Error Showing
Comment #4 by phshaffer — 2012-06-06T05:43:36Z
Dmitry Olshansky recommended I submit this as a bug.

The program is executed as:

    icomp2 fold.txt fnew.txt

It should search fold.txt for certain text patterns and then check whether all "found" text also appears in fnew.txt. fold.txt and fnew.txt are identical, so all "found" text should appear in fnew.txt as well.

I added some diagnostic loop counters for troubleshooting:

    writeln(cntOld," ",cntNew," ",matchOld.hit," ",matchNew.hit);

As the screenshot shows, after several iterations it crashes with:

    core.exception.AssertError@C:\D\dmd2\windows\bin\..\..\src\phobos\std\regex.d(6050): not enough preallocated memory
Comment #5 by dmitry.olsh — 2012-06-06T06:13:01Z
(In reply to comment #4)
> Dmitry Olshansky recommended I submit this as a bug.

Yup, 'cause I'm the only one to fix it, at least in the near future ;)

> The program is executed as : icomp2 fold.txt fnew.txt
>
> It should search fold.txt for certain text patterns and then see if all "found"
> text also appears in fnew.txt. Fold.txt and Fnew.txt are identical so all
> "found" text should appear in Fnew.txt as well.
>
> I added some diagnostic loops counters for troubleshooting:
> writeln(cntOld," ",cntNew," ",matchOld.hit," ",matchNew.hit);
>
> As the screenshot shows after several iterations, it crashes with ->
> core.exception.AssertError@C:\D\dmd2\windows\bin\..\..\src\phobos\std\regex.d(6050):
> not enough preallocated memory

Thanks, I'm on it. We'd better get it fixed in 2.060.
Comment #6 by dmitry.olsh — 2012-06-07T04:35:32Z
I've studied it a bit, and here are the details: it only happens when re-running the same match object many times:

    foreach(v; match(...)) // no bug

vs

    auto m = match(....);
    foreach(v; m) // does run out of memory

In your case I see from the comments that you try hard to do eager evaluation: first find all matches, then work through two arrays of them. Yet that's not what the program does. It still performs N*M regex searches, because

    auto uniCapturesNew = match(uniFileOld, regex(...));

just starts the engine and finds the 1st match. Then you copy the engine state on each iteration of the nested loop (this copy operation is apparently bogus) and run the engine until all matches are found. Next iteration of the loop, another copy.

So in your case I strongly suggest this magic recipe, which works for all lazy ranges:

    auto allMatches = array(match(....));

and work with arrays from then on.

Anyway, the root cause is now clear and I've reduced it to:

    import std.regex;

    // Line breaks in the literal are written as \n here.
    string data = "\nNAME = XPAW01_STA:STATION\nNAME = XPAW01_STA\n";

    // Main function
    void main(){
        auto uniFileOld = data;
        auto uniCapturesNew = match(uniFileOld, regex(r"^NAME = (?P<comp>[a-zA-Z0-9_]+):*(?P<blk>[a-zA-Z0-9_]*)","gm"));
        for(int i=0; i<20; i++)
        {
            // Re-iterating the same match object copies the engine state each time.
            foreach (matchNew; uniCapturesNew)
            {}
        }
    }
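Editorial sketch of the "array" recipe suggested above, not code from the attached icomp2.d. It assumes a Phobos version that provides `std.regex.matchAll` (added after this report was filed; with the 2012 API the equivalent would be `match` with the "g" flag). The helper name `collectHits` and the inline sample data are hypothetical. Each input is scanned once, the matched text is copied into a plain `string[]`, and the nested comparison then runs over arrays instead of re-driving a copied regex engine.

```d
import std.algorithm.iteration : map;
import std.algorithm.searching : canFind;
import std.array : array;
import std.regex : matchAll, regex;
import std.stdio : writeln;

// Hypothetical helper: run the regex over the text once and
// materialize the full text of every match into an array.
string[] collectHits(string text)
{
    auto re = regex(r"^NAME = (?P<comp>[a-zA-Z0-9_]+):*(?P<blk>[a-zA-Z0-9_]*)", "m");
    // matchAll yields a lazy range of captures; .array forces it eagerly.
    return matchAll(text, re).map!(m => m.hit).array;
}

void main()
{
    // Stand-in for the contents of fold.txt and fnew.txt (assumed sample data).
    string data = "\nNAME = XPAW01_STA:STATION\nNAME = XPAW01_STA\n";

    auto oldHits = collectHits(data);
    auto newHits = collectHits(data);

    // The N*M comparison now touches only arrays; no regex engine is copied.
    foreach (hit; oldHits)
        writeln(hit, newHits.canFind(hit) ? " : found" : " : missing");
}
```

Storing only `hit` keeps the arrays small; if the named groups are needed later, the captures themselves could be stored instead with `matchAll(text, re).array`.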
Comment #7 by dmitry.olsh — 2012-06-07T14:38:19Z
Comment #8 by github-bugzilla — 2012-06-08T01:07:21Z
Commits pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/0c35fcd694481753cebae9803906f6d857fe954f
fix Issue 8203 Change RegexMatch objects to follow proper COW semantics

https://github.com/D-Programming-Language/phobos/commit/245782bb6393b4a415c0e1e93b8a05f448e1457f
unittest for bug 8203

https://github.com/D-Programming-Language/phobos/commit/f1757b88fa2fda9f5db74493be762c058d3e0111
Merge pull request #623 from blackwhale/nested-regex
fix Issue 8203 std.regex.match() generates "not enough preallocated memory"
Comment #9 by github-bugzilla — 2012-06-08T01:17:51Z
Comment #10 by ilyayaroshenko — 2014-01-04T14:42:08Z
Created attachment 1310 regex example

This regexp fails with "аллея Театральная, д. 3, стр. 1". Works fine in SublimeText3.

core.exception.AssertError@/usr/include/dmd/phobos/std/regex.d(5393): not enough preallocated memory
----------------
/tmp/.rdmd-1000/rdmd-test.d-F2E4C955E1856CA0235A274413477A45/test(_d_assert_msg+0x45) [0x5055f1]
/tmp/.rdmd-1000/rdmd-test.d-F2E4C955E1856CA0235A274413477A45/test(pure nothrow @trusted std.regex.Thread!(ulong).Thread* std.regex.ThompsonMatcher!(char, std.regex.Input!(char).Input.BackLooper).ThompsonMatcher.allocate()+0x88) [0x4e80b0]
/tmp/.rdmd-1000/rdmd-test.d-F2E4C955E1856CA0235A274413477A45/test(pure nothrow @trusted std.regex.Thread!(ulong).Thread* std.regex.ThompsonMatcher!(char, std.regex.Input!(char).Input.BackLooper).ThompsonMatcher.createStart(ulong, uint)+0x59) [0x4e84d9]
/tmp/.rdmd-1000/rdmd-test.d-F2E4C955E1856CA0235A274413477A45/test(@trusted std.regex.ThompsonMatcher!(char, std.regex.Input!(char).Input.BackLooper).ThompsonMatcher.MatchResult std.regex.ThompsonMatcher!(char, std.regex.Input!(char).Input.BackLooper).ThompsonMatcher.matchOneShot(std.regex.Group!(ulong).Group[], uint)+0xf9) [0x4e7eb1]
/tmp/.rdmd-1000/rdmd-test.d-F2E4C955E1856CA0235A274413477A45/test(@trusted void std.regex.ThompsonMatcher!(char).ThompsonMatcher.eval!(true).eval(std.regex.Thread!(ulong).Thread*, std.regex.Group!(ulong).Group[])+0x1672) [0x4e646a]
/tmp/.rdmd-1000/rdmd-test.d-F2E4C955E1856CA0235A274413477A45/test(@trusted std.regex.ThompsonMatcher!(char).ThompsonMatcher.MatchResult std.regex.ThompsonMatcher!(char).ThompsonMatcher.matchOneShot(std.regex.Group!(ulong).Group[], uint)+0x150) [0x4e2f88]
/tmp/.rdmd-1000/rdmd-test.d-F2E4C955E1856CA0235A274413477A45/test(@trusted bool std.regex.ThompsonMatcher!(char).ThompsonMatcher.match(std.regex.Group!(ulong).Group[])+0x9d) [0x4e2aa5]
/tmp/.rdmd-1000/rdmd-test.d-F2E4C955E1856CA0235A274413477A45/test(ref @trusted std.regex.__T10RegexMatchTAyaS273std5regex15ThompsonMatcherZ.RegexMatch std.regex.__T10RegexMatchTAyaS273std5regex15ThompsonMatcherZ.RegexMatch.__ctor!(std.regex.Regex!(char).Regex).__ctor(immutable(char)[], std.regex.Regex!(char).Regex)+0x1ae) [0x4ee856]
/tmp/.rdmd-1000/rdmd-test.d-F2E4C955E1856CA0235A274413477A45/test(@safe std.regex.__T10RegexMatchTAyaS273std5regex15ThompsonMatcherZ.RegexMatch std.regex.match!(immutable(char)[], std.regex.Regex!(char).Regex).match(immutable(char)[], std.regex.Regex!(char).Regex)+0x63) [0x4fa423]
/tmp/.rdmd-1000/rdmd-test.d-F2E4C955E1856CA0235A274413477A45/test(_Dmain+0x78ff) [0x4bd29f]
/tmp/.rdmd-1000/rdmd-test.d-F2E4C955E1856CA0235A274413477A45/test(void rt.dmain2._d_run_main(int, char**, extern (C) int function(char[][])*).runAll().void __lambda1()+0x18) [0x507b3c]
/tmp/.rdmd-1000/rdmd-test.d-F2E4C955E1856CA0235A274413477A45/test(void rt.dmain2._d_run_main(int, char**, extern (C) int function(char[][])*).tryExec(scope void delegate())+0x2a) [0x507a96]
/tmp/.rdmd-1000/rdmd-test.d-F2E4C955E1856CA0235A274413477A45/test(void rt.dmain2._d_run_main(int, char**, extern (C) int function(char[][])*).runAll()+0x30) [0x507afc]
/tmp/.rdmd-1000/rdmd-test.d-F2E4C955E1856CA0235A274413477A45/test(void rt.dmain2._d_run_main(int, char**, extern (C) int function(char[][])*).tryExec(scope void delegate())+0x2a) [0x507a96]
/tmp/.rdmd-1000/rdmd-test.d-F2E4C955E1856CA0235A274413477A45/test(_d_run_main+0x1a3) [0x507a17]
/tmp/.rdmd-1000/rdmd-test.d-F2E4C955E1856CA0235A274413477A45/test(main+0x25) [0x502a7d]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f967ddb0de5]
Comment #11 by dmitry.olsh — 2014-01-06T08:38:28Z
(In reply to comment #10)
> Created an attachment (id=1310)
> regex example
>
> This regexp fails with
> "аллея Театральная, д. 3, стр. 1".

Somewhat reduced test case:

    void main(){
        import std.regex;
        auto r = regex(`([а-яА-Я\-_]+\s*)+(?<=[\s\.,\^])`);
        match("аллея Театральная", r);
    }

Investigation shows it's related to lookaround.

P.S. I suggest in future to post new bugs as new reports, even if the symptoms are similar to some older bug. REOPENED is for cases where the same issue happens again (regression, patch was reverted etc.).
Comment #12 by ilyayaroshenko — 2014-01-06T09:15:56Z
(In reply to comment #11)
> (In reply to comment #10)
> > Created an attachment (id=1310)
> > regex example
> >
> > This regexp fails with
> > "аллея Театральная, д. 3, стр. 1".
>
> Somewhat reduced test case:
> void main(){
>     import std.regex;
>     auto r = regex(`([а-яА-Я\-_]+\s*)+(?<=[\s\.,\^])`);
>     match("аллея Театральная", r);
> }
>
> Investigation shows it's related to lookaround.
>
> P.S. I suggest in future to post new bugs as new reports, even if the symptoms
> are similar to some older bug. REOPENED is for cases where the same issue
> happens again (regression, patch was reverted etc.).

Ok, Thanks!
Comment #13 by dmitry.olsh — 2014-01-07T06:57:28Z
Comment #14 by github-bugzilla — 2014-01-07T08:40:19Z
Commits pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/8eb57d628bbe07d37f5b110d1c7e921fac1ab6c8
fix issue 8203, similar issue with lookaround
When using a temporary engine as closure its generation counter should be tracked separately for each lookaround. For now just use built-in AA, later we could find better places to store this counter in.

https://github.com/D-Programming-Language/phobos/commit/39b88e3a625d69c68a4928457216f2138ba9dd2a
Merge pull request #1841 from blackwhale/issue-8203
fix issue 8203, similar issue with lookaround