The optimiser does a very poor job in a case like this:
bool foo(long v)
{
    return v & 1;
}
It generates this:
mov EAX,4[ESP]
mov EDX,8[ESP]
and EAX,1
xor EDX,EDX
or EDX,EAX
jne L17
xor EAX,EAX
jmp short L1C
L17: mov EAX,1
L1C: ret 8
That's terrible code! It should just do:
mov EAX, 4[ESP]
and EAX, 1
ret 8
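The three-instruction version is valid because conversion to bool only tests whether the value is nonzero, and v & 1 can only ever set bit 0, which lives entirely in the low dword. A minimal D sketch of the equivalence (fooLow is an illustrative name, not compiler code):
// Only bit 0 of v can affect the result, so the low 32 bits suffice;
// the high-dword load and the branch are both unnecessary.
bool fooLow(long v)
{
    return (cast(uint)v & 1) != 0;  // same truth value as cast(bool)(v & 1)
}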
Comment #1 by bugzilla — 2011-03-03T11:49:26Z
Interestingly, if the code is written as:
bool foo(long v)
{
    return (v & 1) == 1;
}
the code generated is:
mov EAX,4[ESP]
mov EDX,8[ESP]
and EAX,1
xor EDX,EDX
ret 8
Comment #2 by clugdbug — 2011-03-03T17:52:46Z
(In reply to comment #1)
> Interestingly, if the code is written as:
>
> bool foo(long v)
> {
> return (v & 1) == 1;
> }
>
> the code generated is:
>
> mov EAX,4[ESP]
> mov EDX,8[ESP]
> and EAX,1
> xor EDX,EDX
> ret 8
I noticed that. And even though that's better, both uses of EDX are completely unnecessary.
Changing cgelem.c, elcmp(), around line 3350 to this:
case 8:
- e = el_una(OP64_32,TYlong,e);
+ e->E1 = el_una(OP64_32,TYint,e->E1);
+ e->E2 = el_una(OP64_32,TYint,e->E2);
break;
makes it create optimal code, although that's probably incorrect for 64 bits.
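For context, narrowing a 64-bit equality compare like this is only safe when both operands' high words are known to be equal; in the (v & 1) == 1 case both high words are provably zero. A minimal D sketch of that rule (narrowedCompare is an illustrative name, not part of the compiler):
// Narrowing a 64-bit equality compare to 32 bits is only safe when the
// operands' high words are known equal. Here (v & 1) always has a zero
// high word, and so does the constant 1, so a 32-bit compare suffices.
bool narrowedCompare(long v)
{
    uint lo = cast(uint)(v & 1); // high word of (v & 1) is provably 0
    return lo == 1;              // 32-bit compare, same result as (v & 1) == 1
}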
The way elcmp() works looks pretty bizarre to me.
But it's the return (v & 1); case that is the primary problem.
Comment #3 by dmitry.olsh — 2018-05-22T15:02:02Z
Now, on 2.080, the 32-bit code is much better:
0: 8b 44 24 04 mov 0x4(%esp),%eax
4: 8b 54 24 08 mov 0x8(%esp),%edx
8: 25 01 00 00 00 and $0x1,%eax
d: 31 d2 xor %edx,%edx
f: c2 08 00 ret $0x8
And 64-bit (barring the rbp/rsp frame setup, which could be elided, but that's a different matter):
0: 55 push %rbp
1: 48 8b ec mov %rsp,%rbp
4: 48 81 e7 01 00 00 00 and $0x1,%rdi
b: 48 89 f8 mov %rdi,%rax
e: 5d pop %rbp
f: c3 retq