Bug 5688 – Poor optimization of (long & 1)

Status: RESOLVED
Resolution: WORKSFORME
Severity: enhancement
Priority: P2
Component: dmd
Product: D
Version: D2
Platform: Other
OS: Windows
Creation time: 2011-03-03T00:47:06Z
Last change time: 2018-05-22T15:02:02Z
Keywords: performance
Assigned to: No Owner
Creator: Don

Comments

Comment #0 by clugdbug — 2011-03-03T00:47:06Z
The optimiser does a very poor job in a case like this:

    bool foo(long v)
    {
        return v&1;
    }

It generates this:

        mov EAX,4[ESP]
        mov EDX,8[ESP]
        and EAX,1
        xor EDX,EDX
        or  EDX,EAX
        jne L17
        xor EAX,EAX
        jmp short L1C
    L17:
        mov EAX,1
    L1C:
        ret 8

That's terrible code! It should just do:

        mov EAX,4[ESP]
        and EAX,1
        ret 8
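Why the three-instruction version is enough can be checked with a small C sketch. As the generated code shows, on 32-bit x86 the low word of the `long` argument sits at 4[ESP] and the high word at 8[ESP]. Bit 0 lives entirely in the low word, so masking the full 64-bit value and masking only its low 32 bits give the same truth value. The function names below are hypothetical, and `int64_t` stands in for D's `long`. This illustrates the equivalence only; it does not describe how dmd's optimiser works internally.

```c
#include <stdbool.h>
#include <stdint.h>

/* Naive form: mask the whole 64-bit value, then test it against zero.
   The code dmd generated also pulled the high word (EDX) into the test. */
bool low_bit_wide(int64_t v)
{
    return (v & 1) != 0;
}

/* Narrowed form: keep only the low 32-bit word (conversion to an unsigned
   type is well-defined, modulo 2^32) and mask that. This mirrors
   "mov EAX,4[ESP]; and EAX,1". */
bool low_bit_narrow(int64_t v)
{
    return ((uint32_t)v & 1u) != 0;
}
```

For any input, including negative values and values whose only set bits are in the high word, both functions agree, which is why the high word never needs to be loaded.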
Comment #1 by bugzilla — 2011-03-03T11:49:26Z
Interestingly, if the code is written as:

    bool foo(long v)
    {
        return (v & 1) == 1;
    }

the code generated is:

        mov EAX,4[ESP]
        mov EDX,8[ESP]
        and EAX,1
        xor EDX,EDX
        ret 8
Comment #2 by clugdbug — 2011-03-03T17:52:46Z
(In reply to comment #1)
> Interestingly, if the code is written as:
>
> bool foo(long v)
> {
>     return (v & 1) == 1;
> }
>
> the code generated is:
>
>     mov EAX,4[ESP]
>     mov EDX,8[ESP]
>     and EAX,1
>     xor EDX,EDX
>     ret 8

I noticed that. And even though that's better, both uses of EDX are completely unnecessary. Changing cgelem.c, elcmp(), around line 3350 to this:

            case 8:
    -           e = el_una(OP64_32,TYlong,e);
    +           e->E1 = el_una(OP64_32,TYint,e->E1);
    +           e->E2 = el_una(OP64_32,TYint,e->E2);
                break;

makes it create optimal code, although that's probably incorrect for 64 bits. The way elcmp() works looks pretty bizarre to me. But it's the return (v & 1); case that is the primary problem.
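The patch narrows both operands of the comparison to 32 bits. The C sketch below illustrates when that kind of narrowing is safe. It uses hypothetical function names, with `int64_t` standing in for `long`, and the casts are only an analogy for wrapping E1 and E2 in OP64_32. For `(v & 1) == 1`, both sides are known to have an all-zero high word because the mask clears it, so comparing the low words alone gives the same answer. For arbitrary 64-bit operands it does not, since values that differ only in their high word would compare equal. This is one general hazard of such a rewrite; the comment does not say exactly why the patch might be wrong for 64 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Full 64-bit equality. */
bool eq_wide(int64_t a, int64_t b)
{
    return a == b;
}

/* Equality after narrowing both operands to their low 32 bits
   (an analogy for converting E1 and E2 from 64 to 32 bits). */
bool eq_narrow(int64_t a, int64_t b)
{
    return (uint32_t)a == (uint32_t)b;
}

/* The expression from comment #1. Narrowing is safe here because
   both (v & 1) and the constant 1 have an all-zero high word. */
bool masked_is_one(int64_t v)
{
    return eq_narrow(v & 1, 1);
}
```

For example, eq_narrow(0x100000001, 1) is true while eq_wide(0x100000001, 1) is false, yet masked_is_one always agrees with the full 64-bit comparison of (v & 1) against 1.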
Comment #3 by dmitry.olsh — 2018-05-22T15:02:02Z
Now, with 2.080, the 32-bit code is much better:

       0:   8b 44 24 04             mov    0x4(%esp),%eax
       4:   8b 54 24 08             mov    0x8(%esp),%edx
       8:   25 01 00 00 00          and    $0x1,%eax
       d:   31 d2                   xor    %edx,%edx
       f:   c2 08 00                ret    $0x8

And the 64-bit code (apart from the rbp/rsp frame setup, which could be elided but is a separate matter):

       0:   55                      push   %rbp
       1:   48 8b ec                mov    %rsp,%rbp
       4:   48 81 e7 01 00 00 00    and    $0x1,%rdi
       b:   48 89 f8                mov    %rdi,%rax
       e:   5d                      pop    %rbp
       f:   c3                      retq