Comment #0 by bearophile_hugs — 2011-03-10T14:10:33Z
I'd like the "rol" X86 instrinsic added to core.bitop module.
--------------------
Rationale:
I'd like the intrinsic because rotating bits in a 32 bit number is a common enough operation (there was recently a thread in D.learn of a person in need for the fastest rotation of the bytes of a uint), and in C-derived languages there isn't a bitwise operator for it, while most CPUs have four or more instructions to rotate a register.
In C-like languages the problem is sometimes solved making the compiler a bit smarter, but this little test shows DMD is not able to optimize two shifts plus or into a single rot instruction:
dmd -O -release -inline test.d
int main(string[] args) {
uint x = args.length;
return (x << 24) | (x >> 8);
}
__Dmain comdat
mov EAX,4[ESP]
mov ECX,4[ESP]
shl EAX,018h
shr ECX,8
or EAX,ECX
ret
--------------------
Using inline asm is an option:
int main(string[] args) {
uint x = args.length;
asm {
rol x, 8;
}
return x;
}
__Dmain comdat
push EBP
mov EBP,ESP
push EAX
mov EAX,8[EBP]
mov -4[EBP],EAX
rol -4[EBP],8
mov EAX,-4[EBP]
mov ESP,EBP
pop EBP
ret
-----------------
But in some cases the inline gives problems, so I think an inlined intrinsic is more handy, safer, shorter, simpler to use:
union Four {
uint u;
ubyte[4] a;
}
void main() {
Four f;
asm {
rol f.u, 8;
}
}
test.d(8): bad type/size of operands 'f.u'
-----------------
Note: an alternative to the intrinsic is to introduce in D/DMD a standard syntax for inlined asm expressions using __asm(), as present in LDC, but they are quite less easy to use than a rot() function-like intrinsic (but they are much more useful than a single intrinsic):
http://www.dsource.org/projects/ldc/wiki/InlineAsmExpressions
Comment #1 by bearophile_hugs — 2011-07-28T05:35:18Z
Comment #2 by destructionator — 2011-08-03T19:18:02Z
I just want to say that dmd commit is fantastic. We got a little embarrassed by that in a discussion a couple weeks ago!
Re bearophiles request: it probably doesn't need an intrinsic since the optimzation should let a plain library function do it just as well.
Comment #3 by bugzilla — 2011-08-03T22:45:51Z
Adam's right.
Comment #4 by bearophile_hugs — 2011-08-07T11:23:10Z
As reference, DMD 2.055head compiles this D2 function:
T rot(T)(T x, int shift) {
return (x >> shift) | (x << (T.sizeof * 8 - shift));
}
void main() {
int a = 0b_1111_1111;
int b = 0b_0000_0010;
rot(a, b);
}
To (-O -release):
_D4test10__T3rotTiZ3rotFiiZi comdat
push EAX
mov EAX,8[ESP]
mov ECX,[ESP]
sar EAX,CL
mov ECX,020h
mov EDX,8[ESP]
sub ECX,[ESP]
shl EDX,CL
or EAX,EDX
pop ECX
ret 4
Comment #5 by bugzilla — 2011-08-09T17:34:51Z
(In reply to comment #4)
> As reference, DMD 2.055head compiles this D2 function:
> T rot(T)(T x, int shift) {
> return (x >> shift) | (x << (T.sizeof * 8 - shift));
> }
The mistake is this is not a correct rotate function. (x >> shift) is a signed right shift. Rewrite as (x >>> shift).
Comment #6 by bearophile_hugs — 2011-08-09T18:08:29Z
(In reply to comment #5)
> The mistake is this is not a correct rotate function. (x >> shift) is a signed
> right shift. Rewrite as (x >>> shift).
Thank you, I'm wrong all the time. Now it shows the ror:
_D4test10__T3rotTiZ3rotFNaNbxixiZi
push EAX
mov EAX,8[ESP]
mov ECX,[ESP]
ror EAX,CL
pop ECX
ret 4