Comment #0 by WorksOnMyMachine — 2012-05-05T00:31:58Z
A number of opcodes are missing, and some are far more critical than others. They are listed here roughly in order of importance, most important first:
missing store instructions (and some loads)
STOSS
STOSD
STOAPS
STOAPD
STOD
STOQ
(there are a few others scattered in the enum table)
movemask (critical for doing branching tests against simd registers):
MOVMSKPD
MOVMSKPS
missing comparisons
CMPPS
CMPPD
CMPSD
CMPSS
missing conversions
CVTPS2PI
CVTSD2SI
CVTSI2SD
CVTSI2SS
CVTSS2SI
CVTTPD2PI
CVTTPS2PI
CVTTSD2SI
CVTTSS2SI
Comment #1 by Marco.Leise — 2013-11-22T16:43:26Z
Some mnemonics like PMOVMSKB cannot even be expressed with the interface that is offered. It returns a 32-bit word consisting of only the high bit of every byte in the MMX or SSE register.
Since I've tried other workarounds, up to inline asm and hard-coding hex values, and nothing worked, I've set this bug to 'major'.
The inline asm workaround usually ends in this:
Internal error: backend/cgcod.c 1561
But that's not what this bug report is about. I'm just stating that there are more SIMD bugs lurking under the surface.
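For reference, the value PMOVMSKB produces can be modelled in plain D. This is only a scalar sketch of the semantics described above, not a core.simd API: the name pmovmskbScalar is hypothetical, and the register is represented as a ubyte[16] so the snippet does not depend on vector types.

```d
/// Scalar model of PMOVMSKB on a 16-byte SSE register (hypothetical helper).
/// Bit i of the result is the most significant bit of byte i.
int pmovmskbScalar(const ubyte[16] bytes) pure nothrow @nogc @safe
{
    int mask = 0;
    foreach (i, b; bytes)
        mask |= (b >> 7) << i; // take the high bit of byte i, place it at bit i
    return mask;
}

unittest
{
    ubyte[16] v;      // all bytes start at zero
    v[0]  = 0x80;     // high bit set   -> result bit 0
    v[3]  = 0x7F;     // high bit clear -> contributes nothing
    v[15] = 0xFF;     // high bit set   -> result bit 15
    assert(pmovmskbScalar(v) == 0x8001);
}
```

The point is that the result is an ordinary integer in a general-purpose register, which is what the current interface has no way to return.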
Comment #2 by john.loughran.colvin — 2014-12-10T16:58:20Z
Also missing is PCMPGT[SDQ]
Can they just be added to the druntime file or are compiler modifications necessary?
//PMOVMSKB = 0x660FD7,
has been commented out in core.simd. We may as well comment out all instructions returning non-XMM values until this is resolved. The ones I could find so far are:
COMISD
COMISS
CVTSD2SI
CVTSS2SI
CVTTPD2PI
CVTTPS2PI
CVTTSD2SI
CVTTSS2SI
MASKMOVDQU
MASKMOVQ
MOVMSKPD
MOVMSKPS
PCMPESTRI
PCMPISTRI
PMOVMSKB
PTEST
UCOMISS
UCOMISD
CRC32, POPCNT and LZCNT don't belong in the XMM enum. They were introduced side-by-side with SSE4.2, but don't work on XMM registers and the latter two have their separate CPUID flags.
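As a rough illustration of the scalar route, druntime already exposes a population count through core.bitop and the separate feature flags through core.cpuid. The sketch below assumes a reasonably recent druntime; leadingZeros is a hypothetical helper built on bsr, not an existing library function.

```d
import core.bitop : bsr, popcnt;
import core.cpuid : hasLzcnt, hasPopcnt;

/// Leading-zero count of a 32-bit value (hypothetical helper).
/// bsr is undefined for 0, so that case is handled separately.
int leadingZeros(uint x) pure nothrow @nogc @safe
{
    return x == 0 ? 32 : 31 - bsr(x);
}

unittest
{
    // Plain integer operations, no XMM register involved.
    assert(popcnt(0xF0u) == 4);
    assert(leadingZeros(1) == 31);
    assert(leadingZeros(0x8000_0000) == 0);

    // POPCNT and LZCNT are reported by their own CPUID flags.
    bool havePopcnt = hasPopcnt;
    bool haveLzcnt = hasLzcnt;
}
```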
Comment #5 by bugzilla — 2016-11-22T01:04:30Z
These have been in core.simd for a while.
Comment #6 by Marco.Leise — 2016-11-22T08:36:34Z
(In reply to Walter Bright from comment #5)
> These have been in core.simd for a while.
While that is true for the original bug description, the hard issue is not the missing enum values themselves, but a lack of support for them, namely returning something other than SIMD vectors, as I outlined in comment #1 and #4 above. The XMM enum is still rather messy if you look at it from some distance:
There are some non-SSE opcodes in it as noted in their comment (i.e. POPCNT and LZCNT have nothing to do with SSE). They should be handled in core.bitop instead, IMHO.
Some non-working opcodes are rightfully commented out until this bug is resolved (e.g. PMOVMSKB).
Other non-working opcodes are NOT commented out (e.g. MOVMSKPD from the original description, see comment #4 for a list).
AMD's SSE4a seems to have an undecided fate, with its opcodes commented out in their entirety. This may be considered a separate bug, but then again, whoever works on this bug will probably look at them as well.
The ddoc for XMM still says: "XMM opcodes that conform to the following: opcode xmm1,xmm2/mem and do not have side effects (i.e. do not write to memory)." This description doesn't apply to e.g. CRC32 or PREFETCH.
DMD + core.simd still need some work to move SIMD support out of the proof-of-concept phase. Admittedly I haven't run any tests since 2015, so if any of the above is in good shape now, shame on me. :)
Comment #7 by aliloko — 2021-01-07T13:55:33Z
Hello,
Can't implement the following intrinsics for DMD:
_mm_movemask_ps needs MOVMSKPS support. As Marco Leise said 7 years ago, it is an instruction that returns its result in a general-purpose register instead of an XMM register.
----------------------------------------------------
int _mm_movemask_ps (__m128 a) pure @trusted
{
    static if (DMD_with_DSIMD)
    {
        // suggested API ? This API returning an int doesn't exist in core.simd
        int res = __simd_int(XMM.MOVMSKPS, a);
        return res;
    }
    else static if (GDC_with_SSE)
    {
        return __builtin_ia32_movmskps(a);
    }
    else static if (LDC_with_SSE1)
    {
        return __builtin_ia32_movmskps(a);
    }
    else
    {
        int4 ai = cast(int4)a;
        int r = 0;
        if (ai.array[0] < 0) r += 1;
        if (ai.array[1] < 0) r += 2;
        if (ai.array[2] < 0) r += 4;
        if (ai.array[3] < 0) r += 8;
        return r;
    }
}
----------------------------------------------------
Same remark for:
- _mm_movemask_epi8 (pmovmskb),
- _mm_movemask_pd (movmskpd),
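For _mm_movemask_pd, the scalar fallback would presumably look much like the one above. The following is only a sketch under assumptions: movemaskPdScalar is a hypothetical name, and it takes a plain double[2] instead of __m128d so that it compiles without any vector support.

```d
import std.math : signbit;

/// Scalar model of MOVMSKPD (hypothetical helper):
/// bit i of the result is the sign bit of lane i.
int movemaskPdScalar(const double[2] a) pure nothrow @nogc @safe
{
    int r = 0;
    if (signbit(a[0])) r |= 1; // lane 0 -> bit 0
    if (signbit(a[1])) r |= 2; // lane 1 -> bit 1
    return r;
}

unittest
{
    assert(movemaskPdScalar([1.0, 2.0]) == 0);
    assert(movemaskPdScalar([-1.0, 2.0]) == 1);
    // The sign bit is tested, so negative zero counts as set.
    assert(movemaskPdScalar([-0.0, -3.5]) == 3);
}
```

Using the sign bit rather than a comparison against zero matches the instruction for negative zero as well.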
Comment #8 by robert.schadek — 2024-12-07T13:31:59Z