Bug 6498 – [CTFE] copy-on-write is slow and causes huge memory usage

Status: NEW
Severity: critical
Priority: P2
Component: dmd
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2011-08-15T00:54:10Z
Last change time: 2024-12-13T17:56:02Z
Keywords: bounty, CTFE
Assigned to: No Owner
Creator: Don
Blocks: 7442
Moved to GitHub: dmd#17528

Comments

Comment #0 by clugdbug — 2011-08-15T00:54:10Z
This is the main reason why CTFE is so slow.

int bug6498(int x)
{
    int n = 0;
    while (n < x)
        ++n;
    return n;
}
static assert(bug6498(10_000_000)==10_000_000);

--> Fails with an 'out of memory' error.
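To illustrate why a copy-on-write strategy can be expensive for a loop like this, here is a toy model in D. It is only a sketch under a stated assumption: that every modification of a variable allocates a fresh value node instead of updating the existing one. `IntValue`, `countCopyOnWrite` and `countInPlace` are hypothetical names invented for this illustration and are not taken from dmd's interpreter.

```d
import std.stdio : writeln;

// Hypothetical stand-in for an interpreter's boxed integer value.
// This is NOT dmd's actual data structure.
final class IntValue
{
    int value;
    this(int value) { this.value = value; }
}

// Copy-on-write style: each ++n builds a new node, so the number of
// allocations grows linearly with the iteration count.
size_t countCopyOnWrite(int x)
{
    size_t allocations = 1;          // the initial node for n = 0
    auto n = new IntValue(0);
    while (n.value < x)
    {
        n = new IntValue(n.value + 1);
        ++allocations;
    }
    return allocations;
}

// In-place style: the single node is mutated, so only one allocation occurs.
size_t countInPlace(int x)
{
    size_t allocations = 1;
    auto n = new IntValue(0);
    while (n.value < x)
        ++n.value;
    return allocations;
}

void main()
{
    writeln(countCopyOnWrite(1_000)); // 1001
    writeln(countInPlace(1_000));     // 1
}
```

Under this assumption, ten million iterations mean roughly ten million allocations, which adds up quickly if none of that memory is released during compilation.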
Comment #1 by clugdbug — 2012-11-26T07:14:57Z
Upgrading severity. I've done several commits to move towards a solution but I still need to do more restructuring to properly fix this.
Comment #2 by camille — 2014-02-11T17:24:58Z
There is a $105 bounty on this issue at Bountysource: https://www.bountysource.com/issues/1325927.
Comment #3 by per.nordlow — 2014-06-28T13:43:38Z
Don: Is there a Github PR or branch for your changes or are these things normally kept secret because this issue has a bounty?
Comment #4 by ibuclaw — 2014-06-28T16:08:41Z
FYI, all PRs have been merged in. I won't bother listing them all (a lot was done over 2012/2013). There has been no work on this since June 2013 IIRC. https://github.com/D-Programming-Language/dmd/pull/1778#issuecomment-19964496 What should be focused on (thanks to Walter's idea of allocating but not freeing memory) is limiting just how much memory CTFE allocates, possibly by finding ways to re-use rather than re-allocate memory, or maybe by giving CTFE its own allocator (it is a backend in its own right, after all).
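One general way to re-use memory instead of re-allocating it is a region (bump) allocator with mark and release. The D sketch below shows the technique in isolation. `Region`, `allocate`, `mark` and `release` are hypothetical names chosen for this example; it is not dmd's allocator and makes no claim about how dmd implements this.

```d
// Hypothetical region allocator, for illustration only.
struct Region
{
    private ubyte[] buffer;   // backing storage, allocated once
    private size_t used;      // bytes handed out so far

    this(size_t capacity) { buffer = new ubyte[](capacity); }

    // Bump-allocate `size` bytes; returns null when the region is exhausted.
    void[] allocate(size_t size)
    {
        enum size_t alignment = 16;
        immutable size_t rounded = (size + alignment - 1) & ~(alignment - 1);
        if (rounded > buffer.length - used)
            return null;
        auto block = buffer[used .. used + size];
        used += rounded;
        return block;
    }

    // Remember the current position so it can be restored later.
    size_t mark() const { return used; }

    // Give back everything allocated since `m` was taken.
    void release(size_t m)
    {
        assert(m <= used);
        used = m;
    }
}

void main()
{
    auto region = Region(1024);
    foreach (i; 0 .. 1_000_000)
    {
        immutable m = region.mark();
        auto tmp = region.allocate(64);  // temporary for one "iteration"
        assert(tmp !is null);
        region.release(m);               // the same 64 bytes are reused next time
    }
    assert(region.mark() == 0);
}
```

In this sketch a million temporary allocations fit in a 1 KiB buffer because each iteration releases its memory before the next one starts.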
Comment #5 by razvan.nitu1305 — 2022-06-09T14:27:33Z
This seems to have been fixed. On my machine it takes 5 seconds to run this and it appears to use 2-3% of my 16 GB RAM. Should we close this?
Comment #6 by maxhaton — 2022-06-10T05:16:30Z
The memory usage has improved a lot but this is still ridiculously slow. Compare with the soon-to-be-upstreamed -preview=newCTFE: https://asciinema.org/a/zTHuVmXbsZ4ryWGfCd2bXoJG5 (roughly 10x faster). SDC does this in about 0.04 sec on my machine, so 50x to 80x faster.
Comment #7 by ibuclaw — 2022-06-10T11:45:45Z
Metrics of the code in this report ran by v2.080:
---
Command being timed: "./generated/linux/release/64/dmd issue6498.d -c"
User time (seconds): 6.44
System time (seconds): 0.29
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.75
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1104116
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 274715
Voluntary context switches: 1
Involuntary context switches: 256
Swaps: 0
File system inputs: 246
File system outputs: 6
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
---

As of v2.085.0 - when most of dinterpret had been converted over to returning UnionExp on the stack.
---
Command being timed: "./generated/linux/release/64/dmd issue6498.d -c"
User time (seconds): 6.64
System time (seconds): 0.19
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.84
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 636044
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1
Minor (reclaiming a frame) page faults: 157878
Voluntary context switches: 1
Involuntary context switches: 231
Swaps: 0
File system inputs: 386
File system outputs: 6
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
---

As of v2.089.0 - when a ctfeRegion allocator was introduced to free memory after exiting an interpret "scope".
---
Command being timed: "./generated/linux/release/64/dmd issue6498.d -c"
User time (seconds): 6.88
System time (seconds): 0.14
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.03
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 637204
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 158019
Voluntary context switches: 1
Involuntary context switches: 17
Swaps: 0
File system inputs: 474
File system outputs: 6
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
---

As of v2.100.0
---
Command being timed: "./generated/linux/release/64/dmd issue6498.d -c"
User time (seconds): 7.13
System time (seconds): 0.07
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.22
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 482504
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 119238
Voluntary context switches: 1
Involuntary context switches: 223
Swaps: 0
File system inputs: 833
File system outputs: 6
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
---

With -lowmem.
---
Command being timed: "./generated/linux/release/64/dmd issue6498.d -c -lowmem"
User time (seconds): 7.64
System time (seconds): 0.05
Percent of CPU this job got: 103%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.42
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 28760
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1
Minor (reclaiming a frame) page faults: 5679
Voluntary context switches: 2376
Involuntary context switches: 774
Swaps: 0
File system inputs: 833
File system outputs: 6
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
---
Comment #8 by ibuclaw — 2022-06-10T11:53:21Z
(In reply to Iain Buclaw from comment #7)
> v2.080:
> Maximum resident set size (kbytes): 1104116
> v2.085.0:
> Maximum resident set size (kbytes): 636044
> v2.089.0:
> Maximum resident set size (kbytes): 637204
> v2.100.0:
> Maximum resident set size (kbytes): 482504
> -lowmem (as of v2.090):
> Maximum resident set size (kbytes): 28760

It's still nearly 500MB, so only 2x better than where we were 4 years ago, and still a far cry from the possible 30MB we could instead be managing with. I also note that the compiler has slowed down by 1 second since v2.080 as well, so CTFE is not getting faster at all...
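For reference, the memory ratios behind that comparison can be recomputed from the maximum resident set sizes quoted above. This is a small D sketch; `ratio` is a helper name introduced here, and the only inputs are the figures reported in comment #7.

```d
import std.stdio : writefln;

// Hypothetical helper: how many times larger `before` is than `after`.
double ratio(double before, double after)
{
    return before / after;
}

void main()
{
    // Maximum resident set sizes in kbytes, copied from comment #7.
    immutable double v2080  = 1_104_116;
    immutable double v2100  = 482_504;
    immutable double lowmem = 28_760;

    writefln("v2.080 vs v2.100:  %.2fx", ratio(v2080, v2100));
    writefln("v2.100 vs -lowmem: %.2fx", ratio(v2100, lowmem));
}
```

The first ratio corresponds to the "only 2x better" remark, and the second shows how much further the -lowmem figure sits below the default build.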
Comment #9 by robert.schadek — 2024-12-13T17:56:02Z
THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/dmd/issues/17528 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB