Bug 20112 – __vector casts don't do type conversions

Status
RESOLVED
Resolution
INVALID
Severity
major
Priority
P1
Component
dmd
Product
D
Version
D2
Platform
x86_64
OS
Linux
Creation time
2019-08-07T01:46:16Z
Last change time
2020-12-23T12:09:38Z
Keywords
SIMD
Assigned to
No Owner
Creator
thomas.bockman

Comments

Comment #0 by thomas.bockman — 2019-08-07T01:46:16Z
This program should print 3, but instead it prints 1077936128, as if I had asked for a reinterpret cast instead of conversion.

void main()
{
    import std.stdio;
    __vector(float[4]) f = [3, 2, 1, 0];
    __vector(int[4]) i = cast(__vector(int[4])) f;
    writeln(i[0]);
}

This problem affects nearly all combinations of component types. The compiler needs to emit actual conversion instructions, not just typed moves. From my research, the fastest combination on X86_64 with AVX seems to be:

0) VPMOVSX or VPMOVZX to extend byte, ubyte, short, or ushort
1) VCVT to convert between any combination of int, uint, long, ulong, float, or double
2) VPSHUFB to truncate int/uint or long/ulong to short/ushort or byte/ubyte
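The value 1077936128 reported above is the IEEE 754 single-precision bit pattern of 3.0 (0x40400000), which is what a reinterpretation of the first lane would yield. A minimal scalar sketch of the difference, where `bitsOf` and `valueOf` are hypothetical helper names introduced only for illustration:

```d
// Returns the raw 32-bit pattern of a float (a reinterpretation, no conversion).
int bitsOf(float f)
{
    return *cast(int*) &f;
}

// Returns the float converted to int by value (truncation toward zero).
int valueOf(float f)
{
    return cast(int) f;
}

void main()
{
    import std.stdio : writeln;

    writeln(bitsOf(3.0f));  // 1077936128, i.e. 0x4040_0000
    writeln(valueOf(3.0f)); // 3
}
```

The vector cast in the report behaves like `bitsOf` applied to each lane, whereas the reporter expected the behavior of `valueOf`.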
Comment #1 by ibuclaw — 2019-08-09T04:44:33Z
That's because `__vector(int[4]) i = cast(__vector(int[4])) f;` is a reinterpret cast. Semantically, this can only be done by unrolling the assignment, but probably easier to do this in phobos std.conv instead.

private T to(T, S)(S value)
{
    alias E = typeof(T.init[0]);
    T res = void;
    static foreach (i; 0 .. S.length)
        res[i] = cast(E)value[i];
    return res;
}

void main()
{
    import std.stdio;
    __vector(float[4]) f = [3, 2, 1, 0];
    __vector(int[4]) i = to!(__vector(int[4]))(f);
    writeln(i[0]);
}
Comment #2 by thomas.bockman — 2019-08-09T04:57:29Z
That is very surprising. There is already a way to express reinterpretation casts: `*cast(T*) &variable`. Why is it necessary to overload the conversion syntax in such a confusing fashion? Is this documented anywhere in the language standard?
Comment #3 by thomas.bockman — 2019-08-09T05:02:53Z
> Semantically, this can only be done by unrolling the assignment

I've found that this is very unreliable. Sometimes the optimizer correctly replaces the individual component casts with the SIMD conversion instructions, and sometimes it doesn't. On LLVM, at least, inlining sometimes undoes the optimization.

I haven't been able to get this working reliably without resorting to inline assembly language.
Comment #4 by aliloko — 2020-12-11T07:40:25Z
For intel-intrinsics it is very handy that this cast is a reinterpret cast (like it is in C and C++...)
Comment #5 by bugzilla — 2020-12-23T09:57:01Z
It is indeed a reinterpret cast, although https://issues.dlang.org/show_bug.cgi?id=21469 would cause that not to work sometimes. It is this way because of consistency with how casting of static arrays works:

import core.stdc.stdio;

void main()
{
    byte[16] b = 3;
    int[4] ia = cast(int[4]) b;
    foreach (i; ia)
        printf("%x\n", i);
}

which prints:

3030303
3030303
3030303
3030303

It is working as designed. At this point, I don't think this can be changed even if we wanted to.
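To put the static array behavior described above next to a by-value conversion, here is a small sketch. `reinterpretBytes` and `convertBytes` are hypothetical helper names, and the element-wise loop is only one assumed way to get value conversion, not something the language does for you:

```d
// Reinterprets 16 bytes as four ints, as the static array cast does.
int[4] reinterpretBytes(byte[16] b)
{
    return cast(int[4]) b;
}

// Converts each byte to an int by value, one element at a time.
int[16] convertBytes(byte[16] b)
{
    int[16] result;
    foreach (i, x; b)
        result[i] = x; // implicit byte -> int value conversion
    return result;
}

void main()
{
    import std.stdio : writefln;

    byte[16] b = 3;
    writefln("%x", reinterpretBytes(b)[0]); // 3030303
    writefln("%d", convertBytes(b)[0]);     // 3
}
```

Note that the reinterpreting cast changes the element count (16 bytes become 4 ints), while the element-wise conversion keeps 16 elements.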
Comment #6 by bugzilla — 2020-12-23T10:25:34Z
Comment #7 by ibuclaw — 2020-12-23T12:09:38Z
(In reply to thomas.bockman from comment #3)
> > Semantically, this can only be done by unrolling the assignment
>
> I've found that this is very unreliable. Sometimes the optimizer correctly
> replaces the individual component casts with the SIMD conversion
> instructions, and sometimes it doesn't. On LLVM, at least, inlining
> sometimes undoes the optimization.
>
> I haven't been able to get this working reliably without resorting to inline
> assembly language.

Just having a quick look, it requires -O3 in order to coerce out a 'cvttps2dq' instruction. To make it consistent, you can set @optimize and @target attributes on the function (I think it works identically for both gdc and ldc).