← Back to index | Original Bugzilla link

Bug 9963 – Absurdly Inefficient Codegen For Adding Boolean Predicates

Status: RESOLVED
Resolution: FIXED
Severity: enhancement
Priority: P2
Component: dmd
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2013-04-19T12:25:00Z
Last change time: 2015-06-09T05:11:47Z
Keywords: performance
Assigned to: nobody
Creator: dsimcha

Comments

Comment #0 by dsimcha — 2013-04-19T12:25:32Z

D source Code: __gshared ulong n_less = 0, n_greater = 0; void doConditional(ubyte thresh, ubyte[] arr) { ulong l, g; foreach(val; arr) { l += (thresh < val); g += !(thresh < val); } n_less += l; n_greater += g; } DMD-generated ASM code (foreach loop only, from obj2asm, when compiled with -O -inline -release): L33: mov RDX,-018h[RBP] mov CL,[RDX][R8] cmp CL,R9B mov EAX,1 ja L47 xor EAX,EAX L47: cdqe add R11,RAX cmp R9B,CL sbb EAX,EAX inc EAX cdqe add RBX,RAX inc R8 cmp R8,-010h[RBP] jb L33 Why use sbb + neg + two cmp instructions instead of just using setb and setae? This executes in about 0.495 seconds for an array of 100 million elements. GCC's codegen for the same function: L20: movzx ECX,[RAX][RDX] xor R10D,R10D cmp ECX,EDI setnle R10B add R9,R10 cmp ECX,EDI setle CL add RAX,1 movzx ECX,CL add R8,RCX cmp RAX,RSI jne L20 This executes in about 0.095 seconds for an array of 100 million elements. My hand-compilation for this loop: LStart: cmp DL, byte ptr [RAX]; setae R9B; adc R10, 0; inc RAX; add R11, R9; cmp RAX, RBX; jb LStart; This executes in about 0.071 seconds for an array of 100 million elements.

Comment #1 by bugzilla — 2013-04-21T02:35:53Z

A partial fix: https://github.com/D-Programming-Language/dmd/pull/1917

Comment #2 by bugzilla — 2013-04-22T16:05:32Z

https://github.com/D-Programming-Language/dmd/pull/1921

Comment #3 by bugzilla — 2013-04-23T02:58:24Z

And this should finish it: https://github.com/D-Programming-Language/dmd/pull/1924

Comment #4 by bugzilla — 2013-04-23T03:03:07Z

Which brings the inner loop to: L2C: mov RBX,-8[RBP] mov CL,[RBX][R8] cmp CL,R9B setnbe DIL movzx EDI,DIL add R11,RDI cmp R9B,CL sbb RSI,0FFFFFFFFh inc R8 cmp R8,-010h[RBP] jb L2C

Comment #5 by bugzilla — 2013-04-24T00:08:26Z

https://github.com/D-Programming-Language/dmd/pull/1928 brings the code generated to: L2B: mov RBX,-8[RBP] mov CL,[RBX][R8] cmp R9B,CL adc R11,0 cmp R9B,CL sbb RSI,0FFFFFFFFh inc R8 cmp R8,-010h[RBP] jb L2B