Bug 1149 – Optimizer: obsolete array length loads, common subexpr. elimin. not working

Status
RESOLVED
Resolution
FIXED
Severity
normal
Priority
P2
Component
dmd
Product
D
Version
D2
Platform
x86
OS
All
Creation time
2007-04-15T12:05:36Z
Last change time
2019-08-26T09:21:08Z
Keywords
performance, pull
Assigned to
yebblies
Creator
Jascha Wetzel

Comments

Comment #0 by jascha — 2007-04-15T12:05:36Z
consider the following code: void main() { uint[] arr; arr.length = 4; arr[0] = 1; arr[1] = 1; arr[2] = 1; arr[3] = 1; } for the 4 assignments, "dmd -O -inline" generates mov ECX,1 mov EDX,010h[ESP] mov EAX,0Ch[ESP] mov [EDX],ECX mov EDX,010h[ESP] mov EAX,0Ch[ESP] mov 4[EDX],ECX mov EDX,010h[ESP] mov EAX,0Ch[ESP] mov 8[EDX],ECX mov EDX,010h[ESP] mov EAX,0Ch[ESP] mov 0Ch[EDX],ECX instead of mov ECX,1 mov EDX,010h[ESP] mov [EDX],ECX mov 4[EDX],ECX mov 8[EDX],ECX mov 0Ch[EDX],ECX - the length of the array is always loaded, even if it's not used - the array pointer is loaded multiple times into the same register the common subexpression elimination appears to be working for the value in ECX, but not for the pointer in EDX.
Comment #1 by yebblies — 2012-02-20T07:25:01Z
https://github.com/D-Programming-Language/dmd/pull/749 This makes the code generated for arrays identical to the code for pointers. The pointer is still reloaded before every assignment.
Comment #2 by github-bugzilla — 2012-02-22T14:22:53Z
Commit pushed to master at https://github.com/D-Programming-Language/dmd https://github.com/D-Programming-Language/dmd/commit/a999486f9bb0b6ca2eeff1964b8f9c90b8841867 Merge pull request #749 from yebblies/issue1149 Improve codegen for OPmsw
Comment #3 by bearophile_hugs — 2012-02-22T15:38:35Z
With the test code, this is the X86 asm I was seeing before this patch (-O -release -inline): __Dmain comdat L0: sub ESP,0Ch lea EAX,[ESP] mov ECX,offset FLAT:_D11TypeInfo_Ak6__initZ mov [ESP],0 mov dword ptr 4[ESP],0 push EAX push 4 push ECX call near ptr __d_arraysetlengthT mov ECX,010h[ESP] mov EDX,1 mov EAX,0Ch[ESP] mov [ECX],EDX mov ECX,010h[ESP] mov EAX,0Ch[ESP] mov 4[ECX],EDX mov ECX,010h[ESP] mov EAX,0Ch[ESP] mov 8[ECX],EDX mov ECX,010h[ESP] mov EAX,0Ch[ESP] mov 0Ch[ECX],EDX add ESP,0Ch add ESP,0Ch xor EAX,EAX ret ----------------------------- After the patch: __Dmain comdat L0: sub ESP,0Ch lea EAX,[ESP] mov ECX,offset FLAT:_D11TypeInfo_Ak6__initZ mov [ESP],0 mov dword ptr 4[ESP],0 push EAX push 4 push ECX call near ptr __d_arraysetlengthT mov ECX,010h[ESP] mov EDX,1 mov EAX,0Ch[ESP] mov [ECX],EDX mov ECX,010h[ESP] mov EAX,0Ch[ESP] mov 4[ECX],EDX mov ECX,010h[ESP] mov EAX,0Ch[ESP] mov 8[ECX],EDX mov ECX,010h[ESP] mov EAX,0Ch[ESP] mov 0Ch[ECX],EDX add ESP,0Ch add ESP,0Ch xor EAX,EAX ret -------------------------------- objectdump after the patch, on a 64 bit systems (courtesy of q66 on IRC #D), compiled with -release: 0000000000000000 <_Dmain>: 0: 55 push %rbp 1: 48 8b ec mov %rsp,%rbp 4: 48 83 ec 18 sub $0x18,%rsp 8: 53 push %rbx 9: 48 c7 45 f0 00 00 00 movq $0x0,-0x10(%rbp) 10: 00 11: 48 c7 45 f8 00 00 00 movq $0x0,-0x8(%rbp) 18: 00 19: 48 8d 55 f0 lea -0x10(%rbp),%rdx 1d: 48 be 04 00 00 00 00 mov $0x4,%rsi 24: 00 00 00 27: 48 bf 00 00 00 00 00 mov $0x0,%rdi 2e: 00 00 00 31: e8 00 00 00 00 callq 36 <_Dmain+0x36> 36: b8 01 00 00 00 mov $0x1,%eax 3b: 48 8b 55 f8 mov -0x8(%rbp),%rdx 3f: 48 8b 5d f0 mov -0x10(%rbp),%rbx 43: 89 02 mov %eax,(%rdx) 45: 48 8b 55 f8 mov -0x8(%rbp),%rdx 49: 48 8b 5d f0 mov -0x10(%rbp),%rbx 4d: 89 42 04 mov %eax,0x4(%rdx) 50: 48 8b 55 f8 mov -0x8(%rbp),%rdx 54: 48 8b 5d f0 mov -0x10(%rbp),%rbx 58: 89 42 08 mov %eax,0x8(%rdx) 5b: 48 8b 55 f8 mov -0x8(%rbp),%rdx 5f: 48 8b 5d f0 mov -0x10(%rbp),%rbx 63: 89 42 0c mov %eax,0xc(%rdx) 66: 31 c0 xor %eax,%eax 68: 5b pop %rbx 69: c9 leaveq 6a: c3 retq -------------------------------- So is this patch working?
Comment #4 by yebblies — 2012-02-22T19:20:18Z
(In reply to comment #3) > With the test code, this is the X86 asm I was seeing before this patch (-O > -release -inline): > [snip] > > So is this patch working? Oops. I tried to disable it for floating point and ended up disabling it for all non-integers.
Comment #5 by bearophile_hugs — 2012-02-22T20:00:29Z
(In reply to comment #4) > Oops. I tried to disable it for floating point and ended up disabling it for > all non-integers. I have tried it with this code: void main() { int[] arr; arr.length = 4; arr[0] = 1; arr[1] = 1; arr[2] = 1; arr[3] = 1; } And I am seeing something similar to the uint[] case.
Comment #6 by yebblies — 2012-02-22T20:20:57Z
(In reply to comment #5) > (In reply to comment #4) > > > Oops. I tried to disable it for floating point and ended up disabling it for > > all non-integers. > > I have tried it with this code: > > void main() { > int[] arr; > arr.length = 4; > arr[0] = 1; > arr[1] = 1; > arr[2] = 1; > arr[3] = 1; > } > > > And I am seeing something similar to the uint[] case. Yeah, the optimization in my patch is effectively disabled. I changed it to integer-only because it was crashing in some floating point code - but the most significant word of an array isn't an integer, it's a pointer. I've tracked to codegen down, it seems like 'fnstsw ax' destroys eax and it never gets restored. Without the optimization loading the pointer into eax never gets picked up as a cse and removed in the first place.
Comment #7 by github-bugzilla — 2012-02-23T00:05:21Z
Commit pushed to master at https://github.com/D-Programming-Language/dmd https://github.com/D-Programming-Language/dmd/commit/1d1bc367c131144e764d9fb81c36651407e3337d Merge pull request #759 from yebblies/issue1149 Fix OPmsw codegen
Comment #8 by bearophile_hugs — 2012-02-23T04:31:27Z
Better: __Dmain comdat L0: sub ESP,0Ch mov ECX,offset FLAT:_D11TypeInfo_Ak6__initZ push EBX push ESI lea EAX,8[ESP] mov dword ptr 8[ESP],0 mov dword ptr 0Ch[ESP],0 push EAX push 4 push ECX call near ptr __d_arraysetlengthT mov EBX,018h[ESP] mov EDX,1 mov [EBX],EDX mov ESI,018h[ESP] mov 4[ESI],EDX mov EAX,018h[ESP] mov 8[EAX],EDX mov ECX,018h[ESP] xor EAX,EAX mov 0Ch[ECX],EDX add ESP,0Ch pop ESI pop EBX add ESP,0Ch ret
Comment #9 by yebblies — 2012-02-23T05:20:28Z
(In reply to comment #8) > Better: > Yeah. I have no idea why it's using so many registers though.
Comment #10 by razvan.nitu1305 — 2019-08-26T09:21:08Z
This seems to have been fixed by https://github.com/dlang/dmd/pull/749 .