← Back to index | Original Bugzilla link

Bug 1149 – Optimizer: obsolete array length loads, common subexpr. elimin. not working

Status: RESOLVED
Resolution: FIXED
Severity: normal
Priority: P2
Component: dmd
Product: D
Version: D2
Platform: x86
OS: All
Creation time: 2007-04-15T12:05:36Z
Last change time: 2019-08-26T09:21:08Z
Keywords: performance, pull
Assigned to: yebblies
Creator: Jascha Wetzel

Comments

Comment #0 by jascha — 2007-04-15T12:05:36Z

consider the following code: void main() { uint[] arr; arr.length = 4; arr[0] = 1; arr[1] = 1; arr[2] = 1; arr[3] = 1; } for the 4 assignments, "dmd -O -inline" generates mov ECX,1 mov EDX,010h[ESP] mov EAX,0Ch[ESP] mov [EDX],ECX mov EDX,010h[ESP] mov EAX,0Ch[ESP] mov 4[EDX],ECX mov EDX,010h[ESP] mov EAX,0Ch[ESP] mov 8[EDX],ECX mov EDX,010h[ESP] mov EAX,0Ch[ESP] mov 0Ch[EDX],ECX instead of mov ECX,1 mov EDX,010h[ESP] mov [EDX],ECX mov 4[EDX],ECX mov 8[EDX],ECX mov 0Ch[EDX],ECX - the length of the array is always loaded, even if it's not used - the array pointer is loaded multiple times into the same register the common subexpression elimination appears to be working for the value in ECX, but not for the pointer in EDX.

Comment #1 by yebblies — 2012-02-20T07:25:01Z

https://github.com/D-Programming-Language/dmd/pull/749 This makes the code generated for arrays identical to the code for pointers. The pointer is still reloaded before every assignment.

Comment #2 by github-bugzilla — 2012-02-22T14:22:53Z

Commit pushed to master at https://github.com/D-Programming-Language/dmd https://github.com/D-Programming-Language/dmd/commit/a999486f9bb0b6ca2eeff1964b8f9c90b8841867 Merge pull request #749 from yebblies/issue1149 Improve codegen for OPmsw

Comment #3 by bearophile_hugs — 2012-02-22T15:38:35Z

With the test code, this is the X86 asm I was seeing before this patch (-O -release -inline): __Dmain comdat L0: sub ESP,0Ch lea EAX,[ESP] mov ECX,offset FLAT:_D11TypeInfo_Ak6__initZ mov [ESP],0 mov dword ptr 4[ESP],0 push EAX push 4 push ECX call near ptr __d_arraysetlengthT mov ECX,010h[ESP] mov EDX,1 mov EAX,0Ch[ESP] mov [ECX],EDX mov ECX,010h[ESP] mov EAX,0Ch[ESP] mov 4[ECX],EDX mov ECX,010h[ESP] mov EAX,0Ch[ESP] mov 8[ECX],EDX mov ECX,010h[ESP] mov EAX,0Ch[ESP] mov 0Ch[ECX],EDX add ESP,0Ch add ESP,0Ch xor EAX,EAX ret ----------------------------- After the patch: __Dmain comdat L0: sub ESP,0Ch lea EAX,[ESP] mov ECX,offset FLAT:_D11TypeInfo_Ak6__initZ mov [ESP],0 mov dword ptr 4[ESP],0 push EAX push 4 push ECX call near ptr __d_arraysetlengthT mov ECX,010h[ESP] mov EDX,1 mov EAX,0Ch[ESP] mov [ECX],EDX mov ECX,010h[ESP] mov EAX,0Ch[ESP] mov 4[ECX],EDX mov ECX,010h[ESP] mov EAX,0Ch[ESP] mov 8[ECX],EDX mov ECX,010h[ESP] mov EAX,0Ch[ESP] mov 0Ch[ECX],EDX add ESP,0Ch add ESP,0Ch xor EAX,EAX ret -------------------------------- objectdump after the patch, on a 64 bit systems (courtesy of q66 on IRC #D), compiled with -release: 0000000000000000 <_Dmain>: 0: 55 push %rbp 1: 48 8b ec mov %rsp,%rbp 4: 48 83 ec 18 sub $0x18,%rsp 8: 53 push %rbx 9: 48 c7 45 f0 00 00 00 movq $0x0,-0x10(%rbp) 10: 00 11: 48 c7 45 f8 00 00 00 movq $0x0,-0x8(%rbp) 18: 00 19: 48 8d 55 f0 lea -0x10(%rbp),%rdx 1d: 48 be 04 00 00 00 00 mov $0x4,%rsi 24: 00 00 00 27: 48 bf 00 00 00 00 00 mov $0x0,%rdi 2e: 00 00 00 31: e8 00 00 00 00 callq 36 <_Dmain+0x36> 36: b8 01 00 00 00 mov $0x1,%eax 3b: 48 8b 55 f8 mov -0x8(%rbp),%rdx 3f: 48 8b 5d f0 mov -0x10(%rbp),%rbx 43: 89 02 mov %eax,(%rdx) 45: 48 8b 55 f8 mov -0x8(%rbp),%rdx 49: 48 8b 5d f0 mov -0x10(%rbp),%rbx 4d: 89 42 04 mov %eax,0x4(%rdx) 50: 48 8b 55 f8 mov -0x8(%rbp),%rdx 54: 48 8b 5d f0 mov -0x10(%rbp),%rbx 58: 89 42 08 mov %eax,0x8(%rdx) 5b: 48 8b 55 f8 mov -0x8(%rbp),%rdx 5f: 48 8b 5d f0 mov -0x10(%rbp),%rbx 63: 89 42 0c mov %eax,0xc(%rdx) 66: 31 c0 xor %eax,%eax 68: 5b pop %rbx 69: c9 leaveq 6a: c3 retq -------------------------------- So is this patch working?

Comment #4 by yebblies — 2012-02-22T19:20:18Z

(In reply to comment #3) > With the test code, this is the X86 asm I was seeing before this patch (-O > -release -inline): > [snip] > > So is this patch working? Oops. I tried to disable it for floating point and ended up disabling it for all non-integers.

Comment #5 by bearophile_hugs — 2012-02-22T20:00:29Z

(In reply to comment #4) > Oops. I tried to disable it for floating point and ended up disabling it for > all non-integers. I have tried it with this code: void main() { int[] arr; arr.length = 4; arr[0] = 1; arr[1] = 1; arr[2] = 1; arr[3] = 1; } And I am seeing something similar to the uint[] case.

Comment #6 by yebblies — 2012-02-22T20:20:57Z

(In reply to comment #5) > (In reply to comment #4) > > > Oops. I tried to disable it for floating point and ended up disabling it for > > all non-integers. > > I have tried it with this code: > > void main() { > int[] arr; > arr.length = 4; > arr[0] = 1; > arr[1] = 1; > arr[2] = 1; > arr[3] = 1; > } > > > And I am seeing something similar to the uint[] case. Yeah, the optimization in my patch is effectively disabled. I changed it to integer-only because it was crashing in some floating point code - but the most significant word of an array isn't an integer, it's a pointer. I've tracked to codegen down, it seems like 'fnstsw ax' destroys eax and it never gets restored. Without the optimization loading the pointer into eax never gets picked up as a cse and removed in the first place.

Comment #7 by github-bugzilla — 2012-02-23T00:05:21Z

Commit pushed to master at https://github.com/D-Programming-Language/dmd https://github.com/D-Programming-Language/dmd/commit/1d1bc367c131144e764d9fb81c36651407e3337d Merge pull request #759 from yebblies/issue1149 Fix OPmsw codegen

Comment #8 by bearophile_hugs — 2012-02-23T04:31:27Z

Better: __Dmain comdat L0: sub ESP,0Ch mov ECX,offset FLAT:_D11TypeInfo_Ak6__initZ push EBX push ESI lea EAX,8[ESP] mov dword ptr 8[ESP],0 mov dword ptr 0Ch[ESP],0 push EAX push 4 push ECX call near ptr __d_arraysetlengthT mov EBX,018h[ESP] mov EDX,1 mov [EBX],EDX mov ESI,018h[ESP] mov 4[ESI],EDX mov EAX,018h[ESP] mov 8[EAX],EDX mov ECX,018h[ESP] xor EAX,EAX mov 0Ch[ECX],EDX add ESP,0Ch pop ESI pop EBX add ESP,0Ch ret

Comment #9 by yebblies — 2012-02-23T05:20:28Z

(In reply to comment #8) > Better: > Yeah. I have no idea why it's using so many registers though.

Comment #10 by razvan.nitu1305 — 2019-08-26T09:21:08Z

This seems to have been fixed by https://github.com/dlang/dmd/pull/749 .