Bug 17484 – high penalty for vbroadcastsd with -mcpu=avx

Status: RESOLVED
Resolution: FIXED
Severity: normal
Priority: P3
Component: dmd
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2017-06-09T03:58:02Z
Last change time: 2017-08-16T13:23:43Z
Assigned to: No Owner
Creator: Martin Nowak

Comments

Comment #0 by code — 2017-06-09T03:58:02Z

With -mcpu=avx, the compiler emits vbroadcastsd ymm2, qword ptr [rsp] even when initializing only 128-bit wide double2 variables. This causes a high penalty of 50-80 cycles when a legacy SSE instruction is later used with such a register value (or a value derived from it). vbroadcastsd writes all 256 bits, so the upper 128 bits of the YMM register are no longer zero, and the CPU apparently preserves them in an internal register buffer.

https://software.intel.com/en-us/articles/intel-avx-state-transitions-migrating-sse-code-to-avx

We should:
(A) not write to 256-bit wide YMM registers when only 128-bit wide XMM registers are used, and
(B) avoid mixing legacy-encoded SSE instructions (movsd) with VEX-encoded AVX-128 instructions, i.e. use vmovsd instead of movsd.
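The penalty mechanism can be illustrated with a simplified Python model of the AVX/SSE upper-state machine from the linked Intel article. This is a sketch under stated assumptions: it models the Sandy Bridge-era clean/dirty/preserved transitions, counts penalty events rather than cycles, and relies on a hypothetical text-based `classify` helper (a `v` prefix means VEX-encoded, a `ymm` operand means 256-bit). The replacement sequence using `vmovddup xmm2` is an illustrative choice; this report does not say which instruction the compiler should emit instead.

```python
# Simplified model of the AVX/SSE upper-state machine (Sandy Bridge-era
# behaviour, per Intel's "AVX State Transitions" article). Assumptions:
# penalties are counted as events, not cycles, and instructions are
# classified by a crude, hypothetical text heuristic.

CLEAN, DIRTY, PRESERVED = "clean", "dirty", "preserved"


def classify(insn):
    """Hypothetical classifier: 'v' prefix = VEX-encoded, 'ymm' = 256-bit."""
    mnemonic = insn.split()[0]
    if mnemonic in ("vzeroupper", "vzeroall"):
        return "zero"
    if mnemonic.startswith("v"):
        return "vex256" if "ymm" in insn else "vex128"
    return "sse"  # legacy-encoded SSE, e.g. movsd


def count_penalties(program):
    """Count save/restore penalty events for a sequence of instructions."""
    state, penalties = CLEAN, 0
    for insn in program:
        kind = classify(insn)
        if kind == "zero":
            state = CLEAN                 # upper halves known to be zero
        elif kind == "sse":
            if state == DIRTY:            # CPU saves the dirty upper halves
                penalties += 1
                state = PRESERVED
        elif state == PRESERVED:          # any VEX insn restores them
            penalties += 1
            state = DIRTY
        elif kind == "vex256":
            state = DIRTY                 # upper halves now non-zero
    return penalties


before = ["vbroadcastsd ymm2, qword ptr [rsp]", "movsd xmm0, xmm2"]
after = ["vmovddup xmm2, qword ptr [rsp]", "vmovsd xmm0, xmm0, xmm2"]
print(count_penalties(before))  # 1
print(count_penalties(after))   # 0
```

Under this model only the 256-bit write leaves the upper state dirty, which is why point (A) on its own removes the penalty for the double2 case.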

Comment #1 by github-bugzilla — 2017-07-17T19:52:40Z

Commit pushed to master at https://github.com/dlang/dmd

https://github.com/dlang/dmd/commit/1f11aa0eb8f6087b7dbadeb770e4526ec9808ccc
fix Issue 17484 - high penalty for AVX-256 instructions with AVX-128 regs

- as the upper 128 bits are no longer zero, the CPU will save/restore them when that register is used with legacy SSE instructions
- avoid using vbroadcastsd, which is an AVX-256-only instruction, to initialize 128-bit XMM vectors
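The first point of the commit message can be made concrete by modelling a YMM register as four 64-bit lanes. This minimal Python sketch assumes only lane contents matter; the function names are hypothetical stand-ins for the instructions they are named after, and `vmovddup_xmm` is an illustrative VEX.128 alternative, not necessarily what the commit emits. Both produce the same double2 value in the low 128 bits, but only the 256-bit broadcast leaves non-zero data in the upper half.

```python
# Toy model of a YMM register as four 64-bit double lanes (lane 0 lowest).
# Assumption: only lane contents are modelled, not encoding or timing.


def vbroadcastsd_ymm(x):
    """Like `vbroadcastsd ymm, m64`: x is copied into all four lanes."""
    return [x, x, x, x]


def vmovddup_xmm(x):
    """Like VEX.128 `vmovddup xmm, m64`: x in lanes 0-1, lanes 2-3 zeroed."""
    return [x, x, 0.0, 0.0]


def upper_is_zero(ymm):
    """True if bits 255:128 (lanes 2 and 3) hold only zeros."""
    return all(lane == 0.0 for lane in ymm[2:])


x = 1.5
print(vbroadcastsd_ymm(x)[:2] == vmovddup_xmm(x)[:2])  # True: same double2
print(upper_is_zero(vbroadcastsd_ymm(x)))              # False
print(upper_is_zero(vmovddup_xmm(x)))                  # True
```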

Comment #2 by github-bugzilla — 2017-08-07T13:17:30Z

Commit pushed to newCTFE at https://github.com/dlang/dmd

https://github.com/dlang/dmd/commit/1f11aa0eb8f6087b7dbadeb770e4526ec9808ccc
fix Issue 17484 - high penalty for AVX-256 instructions with AVX-128 regs

Comment #3 by github-bugzilla — 2017-08-16T13:23:43Z

Commit pushed to stable at https://github.com/dlang/dmd

https://github.com/dlang/dmd/commit/1f11aa0eb8f6087b7dbadeb770e4526ec9808ccc
fix Issue 17484 - high penalty for AVX-256 instructions with AVX-128 regs