The optimiser does a very poor job in a case like this:
bool foo(long v)
{
    return v & 1;
}
It generates this:
mov EAX,4[ESP]
mov EDX,8[ESP]
and EAX,1
xor EDX,EDX
or EDX,EAX
jne L17
xor EAX,EAX
jmp short L1C
L17: mov EAX,1
L1C: ret 8
That's terrible code! It should just do:
mov EAX, 4[ESP]
and EAX, 1
ret 8
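The three-instruction version is valid because conversion to bool only tests whether the value is nonzero, and v & 1 can only ever set bit 0, which lives entirely in the low dword. A minimal D sketch of the equivalence (fooLow is an illustrative name, not compiler code):
// Only bit 0 of v can affect the result, so the low 32 bits suffice;
// the high-dword load and the branch are both unnecessary.
bool fooLow(long v)
{
    return (cast(uint)v & 1) != 0;  // same truth value as cast(bool)(v & 1)
}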
Comment #1 by bugzilla — 2011-03-03T11:49:26Z
Interestingly, if the code is written as:
bool foo(long v)
{
    return (v & 1) == 1;
}
the code generated is:
mov EAX,4[ESP]
mov EDX,8[ESP]
and EAX,1
xor EDX,EDX
ret 8
Comment #2 by clugdbug — 2011-03-03T17:52:46Z
(In reply to comment #1)
> Interestingly, if the code is written as:
>
> bool foo(long v)
> {
> return (v & 1) == 1;
> }
>
> the code generated is:
>
> mov EAX,4[ESP]
> mov EDX,8[ESP]
> and EAX,1
> xor EDX,EDX
> ret 8
I noticed that. And even though that's better, both uses of EDX are completely unnecessary.
Changing cgelem.c, elcmp(), around line 3350 to this:
case 8:
- e = el_una(OP64_32,TYlong,e);
+ e->E1 = el_una(OP64_32,TYint,e->E1);
+ e->E2 = el_una(OP64_32,TYint,e->E2);
break;
makes it create optimal code, although that's probably incorrect for 64 bits.
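For context, narrowing a 64-bit equality compare like this is only safe when both operands' high words are known to be equal; in the (v & 1) == 1 case both high words are provably zero. A minimal D sketch of that rule (narrowedCompare is an illustrative name, not part of the compiler):
// Narrowing a 64-bit equality compare to 32 bits is only safe when the
// operands' high words are known equal. Here (v & 1) always has a zero
// high word, and so does the constant 1, so a 32-bit compare suffices.
bool narrowedCompare(long v)
{
    uint lo = cast(uint)(v & 1); // high word of (v & 1) is provably 0
    return lo == 1;              // 32-bit compare, same result as (v & 1) == 1
}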
The way elcmp() works looks pretty bizarre to me.
But it's the return (v & 1); case that is the primary problem.
Comment #3 by dmitry.olsh — 2018-05-22T15:02:02Z
Now, on 2.080, the 32-bit code is much better:
0: 8b 44 24 04 mov 0x4(%esp),%eax
4: 8b 54 24 08 mov 0x8(%esp),%edx
8: 25 01 00 00 00 and $0x1,%eax
d: 31 d2 xor %edx,%edx
f: c2 08 00 ret $0x8
And 64-bit (barring the rbp/rsp frame setup, which could be elided, but that's a different matter):
0: 55 push %rbp
1: 48 8b ec mov %rsp,%rbp
4: 48 81 e7 01 00 00 00 and $0x1,%rdi
b: 48 89 f8 mov %rdi,%rax
e: 5d pop %rbp
f: c3 retq