Bug 20112 – __vector casts don't do type conversions

Status: RESOLVED
Resolution: INVALID
Severity: major
Priority: P1
Component: dmd
Product: D
Version: D2
Platform: x86_64
OS: Linux
Creation time: 2019-08-07T01:46:16Z
Last change time: 2020-12-23T12:09:38Z
Keywords: SIMD
Assigned to: No Owner
Creator: thomas.bockman

Comments

Comment #0 by thomas.bockman — 2019-08-07T01:46:16Z

This program should print 3, but instead it prints 1077936128, as if I had asked for a reinterpret cast instead of conversion.

void main()
{
    import std.stdio;

    __vector(float[4]) f = [3, 2, 1, 0];
    __vector(int[4]) i = cast(__vector(int[4])) f;
    writeln(i[0]);
}

This problem affects nearly all combinations of component types. The compiler needs to emit actual conversion instructions, not just typed moves.

From my research, the fastest combination on X86_64 with AVX seems to be:

0) VPMOVSX or VPMOVZX to extend byte, ubyte, short, or ushort
1) VCVT to convert between any combination of int, uint, long, ulong, float, or double
2) VPSHUFB to truncate int/uint or long/ulong to short/ushort or byte/ubyte

Comment #1 by ibuclaw — 2019-08-09T04:44:33Z

That's because `__vector(int[4]) i = cast(__vector(int[4])) f;` is a reinterpret cast.

Semantically, this can only be done by unrolling the assignment, but it is probably easier to do this in phobos std.conv instead.

private T to(T, S)(S value)
{
    alias E = typeof(T.init[0]);
    T res = void;
    static foreach (i; 0 .. S.length)
        res[i] = cast(E)value[i];
    return res;
}

void main()
{
    import std.stdio;

    __vector(float[4]) f = [3, 2, 1, 0];
    __vector(int[4]) i = to!(__vector(int[4]))(f);
    writeln(i[0]);
}

Comment #2 by thomas.bockman — 2019-08-09T04:57:29Z

That is very surprising. There is already a way to express reinterpretation casts: `*cast(T*) &variable`. Why is it necessary to overload the conversion syntax in such a confusing fashion? Is this documented anywhere in the language standard?
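As a minimal sketch of the distinction under discussion, the scalar program below contrasts the pointer-based reinterpretation idiom with an ordinary value cast. The helper names `reinterpretBits` and `convertValue` are hypothetical and exist only for illustration. The bit pattern of 3.0f as a 32-bit IEEE 754 float is 0x40400000, which is the 1077936128 reported in comment #0.

```d
import std.stdio;

// Hypothetical helper: reads the float's storage as an int (no conversion).
int reinterpretBits(float f)
{
    return *cast(int*) &f;
}

// Hypothetical helper: converts the float's value to an int (truncates toward zero).
int convertValue(float f)
{
    return cast(int) f;
}

void main()
{
    writeln(reinterpretBits(3.0f)); // 1077936128 (0x40400000)
    writeln(convertValue(3.0f));    // 3
}
```

For scalars, `cast(int)` converts the value, so the pointer form is the only way to get the raw bits; the surprise in this report is that the same `cast` spelling paints the bits once the operands are vectors.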

Comment #3 by thomas.bockman — 2019-08-09T05:02:53Z

> Semantically, this can only be done by unrolling the assignment

I've found that this is very unreliable. Sometimes the optimizer correctly replaces the individual component casts with the SIMD conversion instructions, and sometimes it doesn't. On LLVM, at least, inlining sometimes undoes the optimization.

I haven't been able to get this working reliably without resorting to inline assembly language.

Comment #4 by aliloko — 2020-12-11T07:40:25Z

For intel-intrinsics it is very handy that this cast is a reinterpret cast (like it is in C and C++...)

Comment #5 by bugzilla — 2020-12-23T09:57:01Z

It is indeed a reinterpret cast, although https://issues.dlang.org/show_bug.cgi?id=21469 would cause that not to work sometimes.

It is this way because of consistency with how casting of static arrays works:

import core.stdc.stdio;

void main()
{
    byte[16] b = 3;
    int[4] ia = cast(int[4]) b;
    foreach (i; ia)
        printf("%x\n", i);
}

which prints:

3030303
3030303
3030303
3030303

It is working as designed. At this point, I don't think this can be changed even if we wanted to.
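The static-array rule can also be seen with the element types from the original report, without any SIMD types. The sketch below uses two hypothetical helpers, `paint` and `convert`, and assumes only that a cast between same-sized static arrays is a type paint, as described above. An explicit per-element loop is what performs the value conversion.

```d
import std.stdio;

// Hypothetical helper: same-size static array cast, which reinterprets the bytes.
int[4] paint(float[4] f)
{
    return cast(int[4]) f;
}

// Hypothetical helper: converts each element's value individually.
int[4] convert(float[4] f)
{
    int[4] r;
    foreach (i, x; f)
        r[i] = cast(int) x;
    return r;
}

void main()
{
    float[4] f = [3, 2, 1, 0];
    writeln(paint(f)[0]); // 1077936128, the bit pattern of 3.0f
    writeln(convert(f));  // [3, 2, 1, 0]
}
```

Whether the loop in `convert` compiles down to a single SIMD conversion instruction depends on the compiler and optimization level, which is the reliability concern raised in comment #3.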

Comment #6 by bugzilla — 2020-12-23T10:25:34Z

Added a couple spec pulls to clarify:

https://github.com/dlang/dlang.org/pull/2924
https://github.com/dlang/dlang.org/pull/2925

Comment #7 by ibuclaw — 2020-12-23T12:09:38Z

(In reply to thomas.bockman from comment #3)
> > Semantically, this can only be done by unrolling the assignment
>
> I've found that this is very unreliable. Sometimes the optimizer correctly
> replaces the individual component casts with the SIMD conversion
> instructions, and sometimes it doesn't. On LLVM, at least, inlining
> sometimes undoes the optimization.
>
> I haven't been able to get this working reliably without resorting to inline
> assembly language.

Just having a quick look, it requires -O3 in order to coerce out a 'cvttps2dq' instruction.

To make it consistent, you can set @optimize and @target attributes on the function (I think it works identically for both gdc and ldc).