Bug 9963 – Absurdly Inefficient Codegen For Adding Boolean Predicates

Status
RESOLVED
Resolution
FIXED
Severity
enhancement
Priority
P2
Component
dmd
Product
D
Version
D2
Platform
All
OS
All
Creation time
2013-04-19T12:25:00Z
Last change time
2015-06-09T05:11:47Z
Keywords
performance
Assigned to
nobody
Creator
dsimcha

Comments

Comment #0 by dsimcha — 2013-04-19T12:25:32Z
D source Code: __gshared ulong n_less = 0, n_greater = 0; void doConditional(ubyte thresh, ubyte[] arr) { ulong l, g; foreach(val; arr) { l += (thresh < val); g += !(thresh < val); } n_less += l; n_greater += g; } DMD-generated ASM code (foreach loop only, from obj2asm, when compiled with -O -inline -release): L33: mov RDX,-018h[RBP] mov CL,[RDX][R8] cmp CL,R9B mov EAX,1 ja L47 xor EAX,EAX L47: cdqe add R11,RAX cmp R9B,CL sbb EAX,EAX inc EAX cdqe add RBX,RAX inc R8 cmp R8,-010h[RBP] jb L33 Why use sbb + neg + two cmp instructions instead of just using setb and setae? This executes in about 0.495 seconds for an array of 100 million elements. GCC's codegen for the same function: L20: movzx ECX,[RAX][RDX] xor R10D,R10D cmp ECX,EDI setnle R10B add R9,R10 cmp ECX,EDI setle CL add RAX,1 movzx ECX,CL add R8,RCX cmp RAX,RSI jne L20 This executes in about 0.095 seconds for an array of 100 million elements. My hand-compilation for this loop: LStart: cmp DL, byte ptr [RAX]; setae R9B; adc R10, 0; inc RAX; add R11, R9; cmp RAX, RBX; jb LStart; This executes in about 0.071 seconds for an array of 100 million elements.
Comment #1 by bugzilla — 2013-04-21T02:35:53Z
Comment #2 by bugzilla — 2013-04-22T16:05:32Z
Comment #3 by bugzilla — 2013-04-23T02:58:24Z
Comment #4 by bugzilla — 2013-04-23T03:03:07Z
Which brings the inner loop to: L2C: mov RBX,-8[RBP] mov CL,[RBX][R8] cmp CL,R9B setnbe DIL movzx EDI,DIL add R11,RDI cmp R9B,CL sbb RSI,0FFFFFFFFh inc R8 cmp R8,-010h[RBP] jb L2C
Comment #5 by bugzilla — 2013-04-24T00:08:26Z
https://github.com/D-Programming-Language/dmd/pull/1928 brings the code generated to: L2B: mov RBX,-8[RBP] mov CL,[RBX][R8] cmp R9B,CL adc R11,0 cmp R9B,CL sbb RSI,0FFFFFFFFh inc R8 cmp R8,-010h[RBP] jb L2B