Bug 14451 – static-foreach uses huge stack for no reason

Status
NEW
Severity
major
Priority
P2
Component
dmd
Product
D
Version
D2
Platform
All
OS
All
Creation time
2015-04-16T10:22:02Z
Last change time
2024-12-13T18:42:13Z
Keywords
industry
Assigned to
No Owner
Creator
Tomer Filiba (weka)
Moved to GitHub: dmd#18976 →

Comments

Comment #0 by tomer — 2015-04-16T10:22:02Z
Using static foreach (on a compile-time known range) basically expands the code block so many times. That's excellent, except that variables declared inside the foreach-body are not scoped properly, which causes needless, huge stack allocation. Consider the following snippet: ================================================ import std.traits; struct S {ubyte a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z;} string[] staticForeach1() { string[] arr; foreach(string name; __traits(allMembers, S)) { ulong[100] x; arr ~= name; } return arr; } string[] staticForeach2() { string[] arr; foreach(string name; __traits(allMembers, S)) { (){ ulong[100] x; arr ~= name; }(); } return arr; } ================================================ The first version compiles to Dump of assembler code for function staticForeach1: 0x0000000000426170 <+0>: push %rbp 0x0000000000426171 <+1>: mov %rsp,%rbp 0x0000000000426174 <+4>: sub $0x52f8,%rsp <<<<< 0x000000000042617b <+11>: push %rbx 0x000000000042617c <+12>: movq $0x0,-0x52f0(%rbp) 0x0000000000426187 <+23>: movq $0x0,-0x52e8(%rbp) requiring over 20K of stack space. The second version compiles to Dump of assembler code for function staticForeach2: 0x0000000000426c48 <+0>: push %rbp 0x0000000000426c49 <+1>: mov %rsp,%rbp 0x0000000000426c4c <+4>: sub $0x10,%rsp <<<<< 0x0000000000426c50 <+8>: movq $0x0,-0x10(%rbp) 0x0000000000426c58 <+16>: movq $0x0,-0x8(%rbp) 0x0000000000426c60 <+24>: mov %rbp,%rdi 0x0000000000426c63 <+27>: callq 0x426d40 <staticForeach2FZ9__lambda1MFNbNfZv> 0x0000000000426c68 <+32>: mov %rbp,%rdi 0x0000000000426c6b <+35>: callq 0x426db8 <staticForeach2FZ9__lambda2MFNbNfZv> 0x0000000000426c70 <+40>: mov %rbp,%rdi requiring only 16 bytes. I tried adding more scoping braces inside the foreach, e.g., foreach(string name; __traits(allMembers, S)) { { ulong[100] x; arr ~= name; } } but anything other than actually invoking a separate function does not work. Since I'm using fibers with small stacks, this is a real issue for me. I can usually find workarounds (as described here), but there's no reason why so much stack be wasted here.
Comment #1 by yebblies — 2015-04-16T16:01:44Z
Scoped properly or not, each variable is going to take up stack space. Space for all variables is always allocated at the start of the function, inner scopes just control when destructors are run. I can't really see this changing. This could be fixed by allowing the backend to re-use stack space for variables whose lifetimes do not overlap.
Comment #2 by tomer — 2015-04-16T16:29:15Z
you are correct: =================== void foo() { { ulong[32] x; writeln(x.ptr); } { ulong[32] x; writeln(x.ptr); } { ulong[32] x; writeln(x.ptr); } { ulong[32] x; writeln(x.ptr); } { ulong[32] x; writeln(x.ptr); } } =================== yields 7FFFB9A1E8A0 7FFFB9A1E9A0 7FFFB9A1EAA0 7FFFB9A1EBA0 7FFFB9A1ECA0 it seems a pity to waste so much stack... having the backend reuse dead parts of the stack is crucial for embedded applications or heavily-parallel ones that rely to thousands of small stacks. of course it could be re-written ==================== void foo() { (){ ulong[32] x; writeln(x.ptr); }(); (){ ulong[32] x; writeln(x.ptr); }(); (){ ulong[32] x; writeln(x.ptr); }(); (){ ulong[32] x; writeln(x.ptr); }(); (){ ulong[32] x; writeln(x.ptr); }(); } =================== which then works just fine: 7FFF4101C4A0 7FFF4101C4A0 7FFF4101C4A0 7FFF4101C4A0 7FFF4101C4A0 couldn't inner-scopes be rewritten like that, only making sure the lambda is fully-scoped so that external variables will be heap-allocated
Comment #3 by yebblies — 2015-04-17T01:01:48Z
(In reply to Tomer Filiba from comment #2) > couldn't inner-scopes be rewritten like that, only making sure the lambda is > fully-scoped so that external variables will be heap-allocated That is automatic outlining, and while possible, is something we're unlikely to see any time soon in dmd.
Comment #4 by robert.schadek — 2024-12-13T18:42:13Z
THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/dmd/issues/18976 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB