Bug 2356 – array literal as non static initializer generates horribly inefficient code.

Status
RESOLVED
Resolution
FIXED
Severity
major
Priority
P2
Component
dmd
Product
D
Version
D2
Platform
All
OS
All
Creation time
2008-09-11T13:46:00Z
Last change time
2015-06-09T05:11:51Z
Keywords
performance, pull
Assigned to
nobody
Creator
tomas
Blocks
6421, 1253

Comments

Comment #0 by tomas — 2008-09-11T13:46:29Z
this simple test: void main() { int[3] x = [1,2,3]; } should just initialize x with the values 1,2,3 what happens in reality is not as nice as this output of obj2asm shows: _Dmain: push EBP mov EBP,ESP sub ESP,0Ch push EBX lea EAX,-0Ch[EBP] push EAX push 3 push 3 push 2 push 1 push 3 mov ECX,offset FLAT:_D11TypeInfo_Ai6__initZ@SYM32 push ECX call near ptr _d_arrayliteralT@PC32 add ESP,014h mov EDX,EAX mov EBX,3 push EDX push EBX push 4 call near ptr _d_arraycopy@PC32 xor EAX,EAX add ESP,014h pop EBX leave ret I'm marking it wrong-code, even though the code does compile and link, it's just wrong ... I'm using 1.034 but I'd guess it's an older bug, so I'm not marking a specific version.
Comment #1 by tomas — 2008-09-15T08:52:08Z
I think this issue might have been introduced in DMD 1.034, as I've hit several issues in LLVMDC, that used to work, due to some initializers suddenly being ArrayLiterals instead of ArrayInitializers. Marking version 1.034.
Comment #2 by clugdbug — 2009-04-08T08:01:56Z
This is a performance issue, not wrong-code.
Comment #3 by bearophile_hugs — 2010-07-19T23:18:32Z
*** Issue 4488 has been marked as a duplicate of this issue. ***
Comment #4 by k.hara.pg — 2011-09-09T09:35:07Z
Comment #5 by k.hara.pg — 2011-09-09T09:36:49Z
After applying my patch, the sample code generates like follows: (output of ddbg in 64-bit Windows 7) c:\d\test.d:1 void main() 00402010: c80c0000 enter 0xc, 0x0 c:\d\test.d:3 int[3] x = [1,2,3]; 00402014: c745f401000000 mov dword [ebp-0xc], 0x1 0040201b: c745f802000000 mov dword [ebp-0x8], 0x2 00402022: c745fc03000000 mov dword [ebp-0x4], 0x3 00402029: 31c0 xor eax, eax test.obj 0040202b: c9 leave 0040202c: c3 ret
Comment #6 by code — 2011-10-07T19:09:44Z
This is also a huge issue when assigning .init to a static array. ubyte[1024] v; v = typeof(v).init; This will generate a dynamically allocated [0, 0, 0 ... ] Arrayliteral for the rhs expression before calling _d_arraycopy. Even worse the allocated ArrayLiteral is initialized using 1024 comma expressions. Using -O the compiler will subsequently almost starve from O(N^2) behavior during comsub eliminations.
Comment #7 by yebblies — 2012-02-01T19:27:49Z
*** Issue 4298 has been marked as a duplicate of this issue. ***
Comment #8 by yebblies — 2012-02-01T20:34:11Z
*** Issue 4881 has been marked as a duplicate of this issue. ***
Comment #9 by yebblies — 2012-02-02T01:49:34Z
*** Issue 2237 has been marked as a duplicate of this issue. ***
Comment #10 by k.hara.pg — 2012-05-05T00:42:06Z
Pull #375 was not sufficient, so removed 'patch' keyword.
Comment #11 by issues.dlang — 2012-10-14T11:36:34Z
*** Issue 8820 has been marked as a duplicate of this issue. ***
Comment #12 by yebblies — 2012-10-28T11:14:56Z
*** Issue 8903 has been marked as a duplicate of this issue. ***
Comment #13 by verylonglogin.reg — 2012-10-30T07:17:29Z
Workaround for those who like "a, b, c" initialization but need more performance (not: it still calls `_d_arraycopy`): --- T[n] makeStaticArray(T, size_t n)(T[n] data...) // { return data; } { T[n] res; res = data; return res; } // Issue 8914 workaround void setStaticArray(T, size_t n)(ref T[n] array, T[n] data...) { array = data; } void main() { auto x = makeStaticArray(1, 2, 3); static assert(is(typeof(x) == int[3])); assert(x == [1, 2, 3]); int[3] y; y.setStaticArray(1, 2, 3); assert(y == [1, 2, 3]); } ---
Comment #14 by k.hara.pg — 2013-04-10T10:31:22Z
Comment #15 by bearophile_hugs — 2013-04-10T19:20:51Z
(In reply to comment #14) > New D2 fix: > https://github.com/D-Programming-Language/dmd/pull/1883 From the pull request (dmd -O -inline -g test after): c:\d\test.d:18 int[3] y = [n, n, n]; 004020aa: 6a03 push 0x3 004020ac: 6a05 push 0x5 004020ae: 8d4c241c lea ecx, [esp+0x1c] 004020b2: 51 push ecx 004020b3: e880020000 call 0x402338 __memset32 Isn't calling memset for just 3 integers slower than inlining their assignments? I suggest to not call memset if the number of bytes to be copied is so small (I think LDC is already doing similar optimizations). Maybe a benchmark is also useful here. c:\d\test.d:20 S[3] z = [s2, s2, s2]; 004020b8: 8d542418 lea edx, [esp+0x18] 004020bc: 52 push edx 004020bd: 8d442430 lea eax, [esp+0x30] 004020c1: e86affffff call 0x402030 test.S.__cpctor c:\d\test.d:3 004020c6: 8d5c2418 lea ebx, [esp+0x18] 004020ca: 53 push ebx 004020cb: 8d442434 lea eax, [esp+0x34] 004020cf: e85cffffff call 0x402030 test.S.__cpctor c:\d\test.d:3 004020d4: 53 push ebx 004020d5: 8d442438 lea eax, [esp+0x38] 004020d9: e852ffffff call 0x402030 test.S.__cpctor c:\d\test.d:3 004020de: 83c40c add esp, 0xc 004020e1: 31c0 xor eax, eax If the s2 variable already contains the struct, then what's the purpose of those calls to 0x402030? In the "before" there are no calls to struct constructors: c:\d\test.d:20 S[3] z = [s2, s2, s2]; 00403913: 8d542474 lea edx, [esp+0x74] 00403917: b960014200 mov ecx, 0x420160 0040391c: 52 push edx 0040391d: 6a03 push 0x3 0040391f: 6a03 push 0x3 00403921: 51 push ecx 00403922: e8fd0a0000 call 0x404424 __d_arrayliteralTX 00403927: 83c408 add esp, 0x8 0040392a: 8d542470 lea edx, [esp+0x70] 0040392e: 52 push edx 0040392f: 89c6 mov esi, eax
Comment #16 by k.hara.pg — 2013-04-10T19:34:08Z
(In reply to comment #15) > (In reply to comment #14) > > New D2 fix: > > https://github.com/D-Programming-Language/dmd/pull/1883 > > From the pull request (dmd -O -inline -g test after): > > > c:\d\test.d:18 int[3] y = [n, n, n]; > 004020aa: 6a03 push 0x3 > 004020ac: 6a05 push 0x5 > 004020ae: 8d4c241c lea ecx, [esp+0x1c] > 004020b2: 51 push ecx > 004020b3: e880020000 call 0x402338 __memset32 > > Isn't calling memset for just 3 integers slower than inlining their > assignments? I suggest to not call memset if the number of bytes to be copied > is so small (I think LDC is already doing similar optimizations). Maybe a > benchmark is also useful here. It is lowered to: int[3] y = void; y[] = n; And currently dmd uses memset for `y[] = n;`. It is another optimization issue. > c:\d\test.d:20 S[3] z = [s2, s2, s2]; > 004020b8: 8d542418 lea edx, [esp+0x18] > 004020bc: 52 push edx > 004020bd: 8d442430 lea eax, [esp+0x30] > 004020c1: e86affffff call 0x402030 test.S.__cpctor c:\d\test.d:3 > 004020c6: 8d5c2418 lea ebx, [esp+0x18] > 004020ca: 53 push ebx > 004020cb: 8d442434 lea eax, [esp+0x34] > 004020cf: e85cffffff call 0x402030 test.S.__cpctor c:\d\test.d:3 > 004020d4: 53 push ebx > 004020d5: 8d442438 lea eax, [esp+0x38] > 004020d9: e852ffffff call 0x402030 test.S.__cpctor c:\d\test.d:3 > 004020de: 83c40c add esp, 0xc > 004020e1: 31c0 xor eax, eax > > If the s2 variable already contains the struct, then what's the purpose of > those calls to 0x402030? > > > In the "before" there are no calls to struct constructors: > > c:\d\test.d:20 S[3] z = [s2, s2, s2]; > 00403913: 8d542474 lea edx, [esp+0x74] > 00403917: b960014200 mov ecx, 0x420160 > 0040391c: 52 push edx > 0040391d: 6a03 push 0x3 > 0040391f: 6a03 push 0x3 > 00403921: 51 push ecx > 00403922: e8fd0a0000 call 0x404424 __d_arrayliteralTX > 00403927: 83c408 add esp, 0x8 > 0040392a: 8d542470 lea edx, [esp+0x70] > 0040392e: 52 push edx > 0040392f: 89c6 mov esi, eax Before, cpctor(==postblit) calls are done in __d_arrayliteralTX, so they are hidden. Now they are directly called on the stack memory z[0..3].
Comment #17 by bearophile_hugs — 2013-04-10T19:44:37Z
(In reply to comment #16) > It is lowered to: > int[3] y = void; > y[] = n; > > And currently dmd uses memset for `y[] = n;`. It is another optimization issue. OK. > Before, cpctor(==postblit) calls are done in __d_arrayliteralTX, so they are > hidden. Now they are directly called on the stack memory z[0..3]. Sorry I have missed it was the postblit, thank you.
Comment #18 by github-bugzilla — 2013-04-11T01:36:48Z
Commits pushed to master at https://github.com/D-Programming-Language/dmd https://github.com/D-Programming-Language/dmd/commit/8cd5f790a78e7514e46618d0325e92cbd6e00e48 fix Issue 2356 - array literal as non static initializer generates horribly inefficient code. https://github.com/D-Programming-Language/dmd/commit/d4b20baee7a1c9ee8a9271724feb5d1031e773d4 Merge pull request #1883 from 9rnsr/fix2356 Issue 2356 - array literal as non static initializer generates horribly inefficient code.
Comment #19 by bearophile_hugs — 2013-04-11T05:26:29Z
The patch seems to work. With it I have removed five optimizations from my code. Very good.
Comment #20 by andrei — 2013-04-11T06:12:47Z
Thanks, Kenji!
Comment #21 by code — 2013-06-06T00:04:57Z
Awesome, thanks for fixing this. This was my number 1 most annoying bug in D. Because when not using a GC the old behavior always leaked memory.
Comment #22 by verylonglogin.reg — 2013-10-24T13:21:28Z
The fix is partially reverted in dmd pull 2682 [1] as it introduced Issue 11238. Only construction is optimized now: --- void main() { int[2] m = [4, 6]; // still optimized m = [m[1], m[0]]; // swap, currently calls `_d_arrayliteralTX` assert(m == [6, 4]); // was [6, 6] with original fix (Issue 11238) } --- Opened Issue 11345 for assignment case. Also some duplicates of current issue and duplicates of duplicates of current issue are now really duplicates of Issue 11345. [1] https://github.com/D-Programming-Language/dmd/pull/2682
Comment #23 by verylonglogin.reg — 2013-10-24T13:45:30Z
(In reply to comment #22) > Also some duplicates of current issue and duplicates of duplicates of current > issue are now really duplicates of Issue 11345. Looks like Issue 4881 was the only one. Changed its "DUPLICATE of".