
Bug 20838 – on modern (x86_64) CPUs, dmd emit cmpxchg8b instead of CMPXCHG16B

Status: RESOLVED
Resolution: INVALID
Severity: blocker
Priority: P1
Component: dmd
Product: D
Version: D2
Platform: x86_64
OS: Linux
Creation time: 2020-05-17T19:35:01Z
Last change time: 2022-05-18T00:03:35Z
Assigned to: No Owner
Creator: mw

Attachments

ID	Filename	Summary	Content-Type	Size
1788	c.d	on modern (x86_64) CPUs, dmd emit cmpxchg8b instead of CMPXCHG16B	text/plain	159

Comments

Comment #0 by mingwu — 2020-05-17T19:35:01Z

Created attachment 1788
on modern (x86_64) CPUs, dmd emit cmpxchg8b instead of CMPXCHG16B

$ cat c.d
--------------------------------------------------------------------------------
import std.stdio;
import core.atomic;

struct N {
  N* prev;
  N* next;
}

shared(N) n;

void main() {
  cas(&n, n, n);
  writeln(size_t.sizeof*2, N.sizeof);  // output 16 16
}
--------------------------------------------------------------------------------

$ dmd -m64 c.d
$ /usr/bin/obj2asm c.o > c.o.asm
$ grep -i xchg c.o.asm
                cmpxchg8b       [R8]

However

$ grep flags /proc/cpuinfo | head -1 | grep 16
--------------------------------------------------------------------------------
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
--------------------------------------------------------------------------------

in particular, the flag cx16 is there:

CX16 * Supports CMPXCHG16B instruction
https://docs.microsoft.com/en-us/sysinternals/downloads/coreinfo

$ uname -a
Linux titan 4.15.0-99-generic #100-Ubuntu SMP Wed Apr 22 20:32:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

$ dmd --version
DMD64 D Compiler v2.092.0
Copyright (C) 1999-2020 by The D Language Foundation, All Rights Reserved written by Walter Bright
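As a side note to the /proc/cpuinfo check above, the same CPU feature bits can probably be queried from D itself through druntime's `core.cpuid` module. The following is only a minimal sketch, assuming an x86/x86_64 host; it reports what the CPU advertises and says nothing about which instruction a given compiler actually emits.

```d
import core.cpuid : hasCmpxchg8b, hasCmpxchg16b;
import std.stdio : writeln;

void main()
{
    // cx8 in /proc/cpuinfo corresponds to CMPXCHG8B support.
    writeln("cmpxchg8b supported:  ", hasCmpxchg8b);

    // cx16 in /proc/cpuinfo corresponds to CMPXCHG16B support.
    writeln("cmpxchg16b supported: ", hasCmpxchg16b);
}
```

On the machine described in this report, both lines would be expected to print `true`, since both `cx8` and `cx16` appear in the flags list.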

Comment #1 by mingwu — 2020-05-17T19:37:48Z

BTW, I only tested on x86_64 Linux, but I think this bug affects other platforms too (MacOS, Windows) on modern x86_64 CPUs with CX16 support.

Comment #2 by pro.mathias.lang — 2020-05-17T20:26:18Z

Doesn't affect OSX (used objdump, grepped for cmpxchg).

Comment #3 by mingwu — 2020-05-17T22:40:38Z

Yes, verified, not on MacOS. Thank you (at least there is a system I can use now).

$ objdump -disassemble-all c.o > c.o.asm
$ grep -i cmpxchg c.o.asm
     80a: 49 0f c7 08   cmpxchg16b (%r8)

$ uname -a
Darwin 19.4.0 Darwin Kernel Version 19.4.0: Wed Mar 4 22:28:40 PST 2020; root:xnu-6153.101.6~15/RELEASE_X86_64 x86_64

Comment #4 by greeenify — 2020-05-17T23:38:37Z

> at least there is a system I can use now).

Well, you could just use LDC like almost everyone else ;-)

Comment #5 by mingwu — 2020-05-18T02:15:27Z

LDC? Did I miss something?

--------------------------------------------------------------------------------
$ ldc2 -m64 -c c.d
$ obj2asm c.o > c.o.asm
$ grep -i xchg c.o.asm
                cmpxchg8b       [RSI]
                cmpxchg8b       [RSI]
                cmpxchg8b       [RSI]
                cmpxchg8b       [RSI]
                cmpxchg8b       [RSI]
.data._D100TypeInfo_S3ldc10intrinsics__T13CmpXchgResultTS4core8internal6atomic__T11_AtomicTypeTS1c1NZ5UCentZQCq6__initZ segment
_D100TypeInfo_S3ldc10intrinsics__T13CmpXchgResultTS4core8internal6atomic__T11_AtomicTypeTS1c1NZ5UCentZQCq6__initZ:
.data._D100TypeInfo_S3ldc10intrinsics__T13CmpXchgResultTS4core8internal6atomic__T11_AtomicTypeTS1c1NZ5UCentZQCq6__initZ ends

$ ldc2 --version
LDC - the LLVM D compiler (1.21.0):
  based on DMD v2.091.1 and LLVM 10.0.0
  built with LDC - the LLVM D compiler (1.21.0)
  Default target: x86_64-unknown-linux-gnu
  Host CPU: skylake
  http://dlang.org - http://wiki.dlang.org/LDC
--------------------------------------------------------------------------------

Want to file a bug against LDC?

Comment #6 by kinke — 2020-05-18T15:27:00Z

(In reply to mw from comment #5)
> Want to file a bug against LDC ?

No need, cmpxchg16 is used for all x86_64 CPUs: https://d.godbolt.org/z/HesA24

For the few old CPUs not supporting it, it can be disabled via `-mattr=-cx16` (but then it doesn't fall back to cmpxchg8 anyway, so no idea how your results came about).

Comment #7 by mingwu — 2020-05-18T16:45:26Z

Hi kinke,

> so no idea how your results came about

I downloaded directly from:
https://github.com/ldc-developers/ldc/releases/download/v1.21.0/ldc2-1.21.0-linux-x86_64.tar.xz

And I just downloaded 1.20, which is on the d.godbolt.org page you mentioned, but the result is the same:

--------------------------------------------------------------------------------
$ wget https://github.com/ldc-developers/ldc/releases/download/v1.20.0/ldc2-1.20.0-linux-x86_64.tar.xz

$ ldc2 -m64 -c c.d
$ obj2asm c.o > c.o.asm
$ grep -i xchg c.o.asm
                cmpxchg8b       [RSI]
                cmpxchg8b       [RSI]
                cmpxchg8b       [RSI]
                cmpxchg8b       [RSI]
                cmpxchg8b       [RSI]
.data._D100TypeInfo_S3ldc10intrinsics__T13CmpXchgResultTS4core8internal6atomic__T11_AtomicTypeTS1c1NZ5UCentZQCq6__initZ segment
_D100TypeInfo_S3ldc10intrinsics__T13CmpXchgResultTS4core8internal6atomic__T11_AtomicTypeTS1c1NZ5UCentZQCq6__initZ:
.data._D100TypeInfo_S3ldc10intrinsics__T13CmpXchgResultTS4core8internal6atomic__T11_AtomicTypeTS1c1NZ5UCentZQCq6__initZ ends

$ ldc2 --version
LDC - the LLVM D compiler (1.20.0):
  based on DMD v2.090.1 and LLVM 9.0.1
  built with LDC - the LLVM D compiler (1.20.0)
  Default target: x86_64-unknown-linux-gnu
  Host CPU: skylake
  http://dlang.org - http://wiki.dlang.org/LDC

$ uname -a
Linux titan 4.15.0-99-generic #100-Ubuntu SMP Wed Apr 22 20:32:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
--------------------------------------------------------------------------------

What else should I check? Or can you try my steps on a Linux box?

Comment #8 by mingwu — 2020-05-18T16:56:00Z

And for GDC:

--------------------------------------------------------------------------------
$ gdc-10 -m64 -c c.d
$ obj2asm c.o > c.o.asm
$ grep -i xch c.o.asm
        extrn   __atomic_compare_exchange
                call      __atomic_compare_exchange@PLT32
--------------------------------------------------------------------------------

On this page: https://d.godbolt.org/z/HesA24

I changed the compiler to "gdc 9.2.0" and searched the output window for 'xch':

        mov     rsi, rax
        mov     edi, 16
        call    __atomic_compare_exchange
        mov     BYTE PTR [rbp-1], al
        .loc 3 1413 9
        movzx   eax, BYTE PTR [rbp-1]

I'm not an asm guy. Can someone help find where this __atomic_compare_exchange is defined, and confirm that it ends up using the correct CMPXCHG16B? Thanks.

Comment #9 by mingwu — 2020-05-18T16:56:37Z

Oh,

$ gdc-10 --version
gdc-10 (Debian 10.1.0-1) 10.1.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Comment #10 by kinke — 2020-05-18T17:27:44Z

(In reply to mw from comment #7)
> What else should I check?

Wrt. LDC, I'm almost certain this is solely an issue with your 'workflow' involving obj2asm.

- Godbolt runs on Linux (but you can inspect the produced assembly for any target using LDC's -mtriple option).

Comment #11 by mingwu — 2020-05-18T21:37:46Z

> this is solely an issue with your 'workflow' involving obj2asm.

Thank you. It is indeed a problem with obj2asm. With GNU's objdump tool:

--------------------------------------------------------------------------------
$ ldc2 -m64 c.d
$ objdump -S --disassemble c > c.asm
$ grep -i cmpxch c.asm
    f5b3:  f0 48 0f c7 0e         lock cmpxchg16b (%rsi)
   1ecc3:  f0 48 0f b1 3a         lock cmpxchg %rdi,(%rdx)
   1ed32:  f0 40 0f b0 3a         lock cmpxchg %dil,(%rdx)
   1ed42:  66 f0 0f b1 3a         lock cmpxchg %di,(%rdx)
   30563:  48 8d 15 ba ea 01 00   lea 0x1eaba(%rip),%rdx   # 4f024 <_D4core5cpuid13_hasCmpxchg8byb>
   30573:  48 8d 15 ab ea 01 00   lea 0x1eaab(%rip),%rdx   # 4f025 <_D4core5cpuid14_hasCmpxchg16byb>
   30da6:  f0 48 0f b1 3c ce      lock cmpxchg %rdi,(%rsi,%rcx,8)
--------------------------------------------------------------------------------

I found the cmpxchg16b instruction. But I'm not sure what the other 'cmpxchg' instructions are. Can some asm expert help explain?

BTW: I found another issue with LDC, using the code on https://d.godbolt.org/z/HesA24, i.e. with the import std.stdio and the writeln removed:

--------------------------------------------------------------------------------
$ cat c.d
//import std.stdio;
import core.atomic;

struct N {
  N* prev;
  N* next;
}

shared(N) n;

void main() {
  cas(&n, n, n);
  //writeln(size_t.sizeof*2, N.sizeof);
}

$ ldc2 -m64 c.d
$ ./c
Segmentation fault (core dumped)

$ ldc2 --version
LDC - the LLVM D compiler (1.20.0):
  based on DMD v2.090.1 and LLVM 9.0.1
  built with LDC - the LLVM D compiler (1.20.0)
  Default target: x86_64-unknown-linux-gnu
  Host CPU: skylake
  http://dlang.org - http://wiki.dlang.org/LDC
--------------------------------------------------------------------------------

On https://d.godbolt.org/z/HesA24 the "Output" dropdown has an option "Run the compiled binary"; I selected that, but didn't see the result.

With import std.stdio and writeln, the LDC output behaves normally (no segfault):

--------------------------------------------------------------------------------
$ ldc2 -m64 c.d
$ ./c
1616
--------------------------------------------------------------------------------

Can you check whether you can reproduce this segfault on a local Linux box?

Comment #12 by kinke — 2020-05-18T22:09:40Z

(In reply to mw from comment #11)
> Can you try if you can reproduce this segfault on a local Linux box?

We're abusing DMD's bug tracker, but anyway: you need to manually take care of the required 16-byte alignment:

align(16) shared(N) n; // or `align(2 * size_t.sizeof)`
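Applied to the test case from this thread, the suggested fix might look like the sketch below. The names `N` and `n` come from the original attachment; the runtime assertion on the address is an illustrative addition, not something the thread says druntime does, and the example assumes a 64-bit x86 target where `N` is 16 bytes.

```d
import core.atomic : cas;

struct N
{
    N* prev;
    N* next;
}

// A 16-byte compare-and-swap (cmpxchg16b) needs a 16-byte aligned
// memory operand, so the alignment is requested explicitly here.
align(16) shared(N) n;

void main()
{
    // Illustrative sanity check: the address should be a multiple of 16.
    assert((cast(size_t) &n) % 16 == 0);

    // Same call as in the original test case.
    cas(&n, n, n);
}
```

Writing `align(2 * size_t.sizeof)` instead of `align(16)`, as mentioned in the comment, ties the alignment to the pointer size of the target rather than hard-coding it.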

Comment #13 by mingwu — 2020-05-18T22:26:23Z

> you need to manually take care of required 16-bytes alignment:
> align(16) shared(N) n; // or `align(2 * size_t.sizeof)`

Thank you again! (I'm a newbie to D and not sure where the best place is to continue this discussion; please let me know.)

BUT: can the DMD compiler (after seeing the 'cas' call) take care of this alignment, either silently or by issuing a warning message to the programmer?

Can I log another bug for this suggestion of a DMD compiler improvement?

The current behavior that I just discovered is definitely a puzzle for a D newbie like me. A smarter compiler would help new users.

Comment #14 by kinke — 2020-05-19T15:44:42Z

(In reply to mw from comment #13)
> Can I log another bug for this suggestion of DMD compiler improvement?

Sure. The druntime library is supposed to take care of this already, at least with enabled contracts, see https://github.com/dlang/druntime/blob/48082ac4e4aa1a3c9f1a1ef87659c941dae0f7f6/src/core/atomic.d#L624-L655. It only checks for insufficient size_t alignment though, so that needs to be fixed.

Wrt. original DMD issue here, DMD is supposed to use cmpxchg16b already, see https://github.com/dlang/druntime/blob/48082ac4e4aa1a3c9f1a1ef87659c941dae0f7f6/src/core/internal/atomic.d#L582. As it apparently doesn't, I guess the bug is in DMD's codegen.
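To illustrate the difference between a size_t alignment check and one that accounts for the full operand size, here is a rough sketch. The helper `isAlignedToSize` is hypothetical and is not druntime's actual implementation; it only shows the kind of check that would catch the misaligned 16-byte case discussed above.

```d
struct N
{
    N* prev;
    N* next;
}

// Hypothetical helper (not a druntime function): requires alignment to
// the full size of T, e.g. 16 bytes for a two-pointer struct on x86_64,
// instead of only size_t.sizeof.
bool isAlignedToSize(T)(shared(T)* ptr) pure nothrow @nogc @trusted
{
    return (cast(size_t) ptr) % T.sizeof == 0;
}

align(16) shared(N) node;

void main()
{
    // The explicitly aligned global passes the stricter check.
    assert(isAlignedToSize(&node));

    // An address one size_t past that boundary is still size_t-aligned,
    // but not aligned to N.sizeof. The pointer is never dereferenced.
    auto shifted = cast(shared(N)*)(cast(size_t) &node + size_t.sizeof);
    assert((cast(size_t) shifted) % size_t.sizeof == 0);
    assert(!isAlignedToSize(shifted));
}
```

A check along these lines in a contract would presumably turn the segfault from comment #11 into an assertion failure with a readable message, at least in builds with contracts enabled.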

Comment #15 by maxhaton — 2022-05-17T20:30:28Z

I think this is a bug in the dmd inline assembler implementation............................................ Fun.

Comment #16 by maxhaton — 2022-05-17T20:47:38Z

Actually I should've read the thread. Turns out it is indeed a problem with Walter's disassembler. Even more fun.

Comment #17 by maxhaton — 2022-05-18T00:03:35Z

Closing as invalid