Bug 5728 – "rol" in core.bitop

Status
RESOLVED
Resolution
FIXED
Severity
enhancement
Priority
P2
Component
dmd
Product
D
Version
D2
Platform
All
OS
All
Creation time
2011-03-10T14:10:00Z
Last change time
2011-08-09T18:08:29Z
Assigned to
nobody
Creator
bearophile_hugs

Comments

Comment #0 by bearophile_hugs — 2011-03-10T14:10:33Z
I'd like the "rol" X86 instrinsic added to core.bitop module. -------------------- Rationale: I'd like the intrinsic because rotating bits in a 32 bit number is a common enough operation (there was recently a thread in D.learn of a person in need for the fastest rotation of the bytes of a uint), and in C-derived languages there isn't a bitwise operator for it, while most CPUs have four or more instructions to rotate a register. In C-like languages the problem is sometimes solved making the compiler a bit smarter, but this little test shows DMD is not able to optimize two shifts plus or into a single rot instruction: dmd -O -release -inline test.d int main(string[] args) { uint x = args.length; return (x << 24) | (x >> 8); } __Dmain comdat mov EAX,4[ESP] mov ECX,4[ESP] shl EAX,018h shr ECX,8 or EAX,ECX ret -------------------- Using inline asm is an option: int main(string[] args) { uint x = args.length; asm { rol x, 8; } return x; } __Dmain comdat push EBP mov EBP,ESP push EAX mov EAX,8[EBP] mov -4[EBP],EAX rol -4[EBP],8 mov EAX,-4[EBP] mov ESP,EBP pop EBP ret ----------------- But in some cases the inline gives problems, so I think an inlined intrinsic is more handy, safer, shorter, simpler to use: union Four { uint u; ubyte[4] a; } void main() { Four f; asm { rol f.u, 8; } } test.d(8): bad type/size of operands 'f.u' ----------------- Note: an alternative to the intrinsic is to introduce in D/DMD a standard syntax for inlined asm expressions using __asm(), as present in LDC, but they are quite less easy to use than a rot() function-like intrinsic (but they are much more useful than a single intrinsic): http://www.dsource.org/projects/ldc/wiki/InlineAsmExpressions
Comment #1 by bearophile_hugs — 2011-07-28T05:35:18Z
Comment #2 by destructionator — 2011-08-03T19:18:02Z
I just want to say that dmd commit is fantastic. We got a little embarrassed by that in a discussion a couple weeks ago! Re bearophiles request: it probably doesn't need an intrinsic since the optimzation should let a plain library function do it just as well.
Comment #3 by bugzilla — 2011-08-03T22:45:51Z
Adam's right.
Comment #4 by bearophile_hugs — 2011-08-07T11:23:10Z
As reference, DMD 2.055head compiles this D2 function: T rot(T)(T x, int shift) { return (x >> shift) | (x << (T.sizeof * 8 - shift)); } void main() { int a = 0b_1111_1111; int b = 0b_0000_0010; rot(a, b); } To (-O -release): _D4test10__T3rotTiZ3rotFiiZi comdat push EAX mov EAX,8[ESP] mov ECX,[ESP] sar EAX,CL mov ECX,020h mov EDX,8[ESP] sub ECX,[ESP] shl EDX,CL or EAX,EDX pop ECX ret 4
Comment #5 by bugzilla — 2011-08-09T17:34:51Z
(In reply to comment #4) > As reference, DMD 2.055head compiles this D2 function: > T rot(T)(T x, int shift) { > return (x >> shift) | (x << (T.sizeof * 8 - shift)); > } The mistake is this is not a correct rotate function. (x >> shift) is a signed right shift. Rewrite as (x >>> shift).
Comment #6 by bearophile_hugs — 2011-08-09T18:08:29Z
(In reply to comment #5) > The mistake is this is not a correct rotate function. (x >> shift) is a signed > right shift. Rewrite as (x >>> shift). Thank you, I'm wrong all the time. Now it shows the ror: _D4test10__T3rotTiZ3rotFNaNbxixiZi push EAX mov EAX,8[ESP] mov ECX,[ESP] ror EAX,CL pop ECX ret 4