Bug 16550 – Generic SIMD shuffle for Mir

Status
NEW
Severity
enhancement
Priority
P4
Component
dmd
Product
D
Version
D2
Platform
All
OS
All
Creation time
2016-09-27T09:17:38Z
Last change time
2024-12-13T18:50:19Z
Keywords
performance, SIMD
Assigned to
No Owner
Creator
Илья Ярошенко
Moved to GitHub: dmd#19197

Comments

Comment #0 by ilyayaroshenko — 2016-09-27T09:17:38Z
Vec1    re0 im0 re1 im1 re2 im2 re3 im3 // __vector(float[8])
Vec2    re4 im4 re5 im5 re6 im6 re7 im7 // __vector(float[8])

< unpack ------ pack >

VecReal re0 re1 re2 re3 re4 re5 re6 re7 // __vector(float[8])
VecIm   im0 im1 im2 im3 im4 im5 im6 im7 // __vector(float[8])
Comment #1 by bugzilla — 2016-09-27T10:34:41Z
Could you please add some example code that should compile and what it should do?
Comment #2 by ilyayaroshenko — 2016-09-27T10:45:50Z
__vector(float[8]) a; // re0 im0 re1 im1 re2 im2 re3 im3
__vector(float[8]) b; // re4 im4 re5 im5 re6 im6 re7 im7

// Packing
__vector(float[8]) re = extractRe(a, b); // re0 re1 re2 re3 re4 re5 re6 re7
__vector(float[8]) im = extractIm(a, b); // im0 im1 im2 im3 im4 im5 im6 im7

// Unpacking
__vector(float[8]) c = mix0(re, im); // re0 im0 re1 im1 re2 im2 re3 im3
__vector(float[8]) d = mix1(re, im); // re4 im4 re5 im5 re6 im6 re7 im7

assert(c == a);
assert(d == b);
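To pin down what the four functions in the example above are expected to compute, here is a scalar reference model on plain static arrays. It is only a sketch of the intended semantics: extractRe, extractIm, mix0 and mix1 are the hypothetical names from the example, not an existing API, and float[8] stands in for __vector(float[8]) so that the code builds with any D compiler.

```d
// Scalar reference model. The four function names are hypothetical,
// copied from the example in this report; they are not an existing API.
alias V = float[8];

// Even-indexed (real) lanes of a followed by those of b.
V extractRe(V a, V b)
{
    V r;
    foreach (i; 0 .. 4)
    {
        r[i]     = a[2 * i];
        r[i + 4] = b[2 * i];
    }
    return r;
}

// Odd-indexed (imaginary) lanes of a followed by those of b.
V extractIm(V a, V b)
{
    V r;
    foreach (i; 0 .. 4)
    {
        r[i]     = a[2 * i + 1];
        r[i + 4] = b[2 * i + 1];
    }
    return r;
}

// Interleave the low halves of re and im.
V mix0(V re, V im)
{
    V r;
    foreach (i; 0 .. 4)
    {
        r[2 * i]     = re[i];
        r[2 * i + 1] = im[i];
    }
    return r;
}

// Interleave the high halves of re and im.
V mix1(V re, V im)
{
    V r;
    foreach (i; 0 .. 4)
    {
        r[2 * i]     = re[i + 4];
        r[2 * i + 1] = im[i + 4];
    }
    return r;
}

void main()
{
    // Real parts are 0..7, imaginary parts are 100..107.
    V a = [0, 100, 1, 101, 2, 102, 3, 103];
    V b = [4, 104, 5, 105, 6, 106, 7, 107];

    V re = extractRe(a, b);
    V im = extractIm(a, b);
    V expectedRe = [0, 1, 2, 3, 4, 5, 6, 7];
    V expectedIm = [100, 101, 102, 103, 104, 105, 106, 107];
    assert(re == expectedRe);
    assert(im == expectedIm);

    // Unpacking must restore the original interleaved vectors.
    assert(mix0(re, im) == a);
    assert(mix1(re, im) == b);
}
```

A real implementation would replace each loop with a single shuffle of the two input vectors; the loops here only define the lane mapping.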
Comment #3 by bugzilla — 2016-11-16T07:20:38Z
The example does not compile with gdc.
Comment #4 by bugzilla — 2016-11-16T07:22:37Z
I also cannot find any documentation on extractRe() and extractIm(). I googled for "ldc extractre extractim" and there were no results.
Comment #5 by ibuclaw — 2016-11-16T09:38:26Z
I can only think of movshdup/movsldup for extract(), but that doesn't quite do what is being asked.

---
import core.simd;
import gcc.builtins;

float4 a = [1,2,3,4];
auto b = __builtin_ia32_movsldup(a); // [1,1,3,3]
auto c = __builtin_ia32_movshdup(a); // [2,2,4,4]
---

Maybe you should provide assembly of what you'd like to be done. But on the surface, it doesn't look like it falls into the category of compiler intrinsics, more like library code.
Comment #6 by ilyayaroshenko — 2016-11-16T10:24:23Z
(In reply to Walter Bright from comment #4)
> I also cannot find any documentation on extractRe() and extractIm(). I
> googled for "ldc extractre extractim" and there were no results.

LLVM has a very generic instruction, shufflevector (http://llvm.org/docs/LangRef.html#shufflevector-instruction), which LDC exposes at https://github.com/ldc-developers/druntime/blob/1fa60c4f5516e63a5050255c5757f48c31273ec3/src/ldc/simd.di#L121 and which is used in GLAS to perform the required permutations.
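For illustration, the four permutations from comment #2 could be written on top of that LDC template roughly as follows. This is a sketch under assumptions: it is LDC-only, it assumes the shufflevector!(V, mask...)(a, b) form from the linked ldc/simd.di, and the function names remain the hypothetical ones from the example. Whether 256-bit vectors are accepted without AVX enabled depends on the compiler and target.

```d
// LDC only: ldc.simd is not available in dmd or gdc.
import ldc.simd : shufflevector;

alias float8 = __vector(float[8]);

// Mask indices 0..7 select lanes of the first operand, 8..15 of the second.
// Function names are hypothetical, taken from the example in comment #2.
float8 extractRe(float8 a, float8 b)
{
    return shufflevector!(float8, 0, 2, 4, 6, 8, 10, 12, 14)(a, b);
}

float8 extractIm(float8 a, float8 b)
{
    return shufflevector!(float8, 1, 3, 5, 7, 9, 11, 13, 15)(a, b);
}

float8 mix0(float8 re, float8 im)
{
    return shufflevector!(float8, 0, 8, 1, 9, 2, 10, 3, 11)(re, im);
}

float8 mix1(float8 re, float8 im)
{
    return shufflevector!(float8, 4, 12, 5, 13, 6, 14, 7, 15)(re, im);
}

void main()
{
    float8 a = [0, 100, 1, 101, 2, 102, 3, 103];
    float8 b = [4, 104, 5, 105, 6, 106, 7, 107];

    float8 re = extractRe(a, b);
    float8 im = extractIm(a, b);
    float8 c = mix0(re, im);
    float8 d = mix1(re, im);

    // Compare lane by lane through the .array view of each vector.
    float[8] expectedRe = [0, 1, 2, 3, 4, 5, 6, 7];
    float[8] expectedIm = [100, 101, 102, 103, 104, 105, 106, 107];
    assert(re.array == expectedRe);
    assert(im.array == expectedIm);
    assert(c.array == a.array);
    assert(d.array == b.array);
}
```

The masks are compile-time constants, so the backend is free to pick whatever instruction sequence the target offers for each permutation.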
Comment #7 by bugzilla — 2016-11-16T11:35:25Z
Looks like a simple wrapper could be put around the PSHUFD instruction.
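As a point of reference for that suggestion, here is a scalar model of how PSHUFD decodes its 8-bit immediate, next to the same model for SHUFPS. It is a sketch of the instruction semantics only, not a proposed wrapper, and the function names are illustrative. PSHUFD permutes the lanes of a single source register, while the request in this report combines lanes from two vectors, so the two-source SHUFPS is shown for the 4-lane float case.

```d
// Scalar models of two x86 shuffles; function names are illustrative only.

// PSHUFD: each 2-bit field of imm8 selects one lane of the single source.
uint[4] pshufd(uint[4] src, ubyte imm8)
{
    uint[4] dst;
    foreach (i; 0 .. 4)
        dst[i] = src[(imm8 >> (2 * i)) & 3];
    return dst;
}

// SHUFPS: the two low result lanes come from a, the two high lanes from b.
float[4] shufps(float[4] a, float[4] b, ubyte imm8)
{
    float[4] dst;
    dst[0] = a[imm8 & 3];
    dst[1] = a[(imm8 >> 2) & 3];
    dst[2] = b[(imm8 >> 4) & 3];
    dst[3] = b[(imm8 >> 6) & 3];
    return dst;
}

void main()
{
    // imm8 = 0b00_01_10_11 reverses the four lanes.
    uint[4] v = [10, 11, 12, 13];
    uint[4] reversed = [13, 12, 11, 10];
    assert(pshufd(v, 0x1B) == reversed);

    // Two interleaved complex pairs per vector: re0 im0 re1 im1 / re2 im2 re3 im3.
    float[4] a = [0, 100, 1, 101];
    float[4] b = [2, 102, 3, 103];
    float[4] re = [0, 1, 2, 3];
    float[4] im = [100, 101, 102, 103];
    assert(shufps(a, b, 0x88) == re); // lanes 0 and 2 of each source
    assert(shufps(a, b, 0xDD) == im); // lanes 1 and 3 of each source
}
```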
Comment #8 by turkeyman — 2016-11-20T02:16:59Z
I've wanted to put a SIMD helper library in std for ages. I.e., not intended to present raw arch-specific intrinsics like core.simd, but useful functions typically implemented as small compound operations. LLVM kinda does this already; it presents SIMD in a fairly abstract, high-level way, and codegens aggressively. We could get a lot of that value from a Phobos lib, I think.
Comment #9 by ibuclaw — 2016-11-26T22:02:03Z
(In reply to Manu from comment #8)
> I've wanted to put a SIMD helper library in std for ages. Ie, not intended
> to present raw arch-specific intrinsics like core.simd, but useful functions
> typically implemented as small compound operations.
> LLVM kinda does this already; it presents SIMD in a fairly abstract
> high-level way, and codegen's aggressively. We could get a lot of that value
> from a phobos lib I think.

Yeah, two high level abstractions you could expose that should be well understood by compilers are vector permutation/shuffle, and vector conditions.
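A rough idea of what those two abstractions could look like at library level is sketched below on plain static arrays. Both shuffle and select are hypothetical names that exist in neither Phobos nor druntime, and the scalar bodies are placeholders for whatever each compiler would lower them to.

```d
// Hypothetical library-level API; neither name exists in Phobos or druntime.

// Compile-time permutation: indices 0 .. N-1 pick from a, N .. 2N-1 from b.
template shuffle(mask...)
{
    T[mask.length] shuffle(T, size_t N)(T[N] a, T[N] b)
    {
        T[mask.length] r;
        static foreach (i, m; mask)
        {
            static assert(m >= 0 && m < 2 * N, "shuffle index out of range");
            // static if keeps the unused branch from being index-checked.
            static if (m < N)
                r[i] = a[m];
            else
                r[i] = b[m - N];
        }
        return r;
    }
}

// Lane-wise conditional: r[i] = cond[i] ? a[i] : b[i].
T[N] select(T, size_t N)(bool[N] cond, T[N] a, T[N] b)
{
    T[N] r;
    foreach (i; 0 .. N)
        r[i] = cond[i] ? a[i] : b[i];
    return r;
}

void main()
{
    float[4] a = [0, 1, 2, 3];
    float[4] b = [4, 5, 6, 7];

    // Even lanes of the concatenation of a and b.
    float[4] evens = shuffle!(0, 2, 4, 6)(a, b);
    float[4] expectedEvens = [0, 2, 4, 6];
    assert(evens == expectedEvens);

    // Take a where the condition holds, b elsewhere.
    bool[4] cond = [true, false, true, false];
    float[4] picked = select(cond, a, b);
    float[4] expectedPicked = [0, 5, 2, 7];
    assert(picked == expectedPicked);
}
```

Keeping the mask a compile-time argument mirrors LLVM's shufflevector, where the permutation is a constant that the backend can map to the best available instruction.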
Comment #10 by robert.schadek — 2024-12-13T18:50:19Z
THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/dmd/issues/19197 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB