Bug 17454 – ABI non-conformity: produces unaligned placement of struct on stack
Status
RESOLVED
Resolution
INVALID
Severity
blocker
Priority
P1
Component
dmd
Product
D
Version
D2
Platform
x86_64
OS
Windows
Creation time
2017-05-30T04:47:00Z
Last change time
2017-06-02T06:11:03Z
Assigned to
nobody
Creator
turkeyman
Comments
Comment #0 by turkeyman — 2017-05-30T04:47:13Z
I have this problem that I don't understand, and I can't reproduce in a reduced test-case.
When I simplify the program structure (or change it at all really), the addresses of variables all change and the problem may disappear.
I can only produce a high-level outline of what my code does. My code has a lot more 'stuff'.
Compiling for Win64, given something like this:
D:
--------
class C {}
struct S { C c; } // struct with a single pointer member
extern (C++) S d_fun()
{
return S(new C); // <- extern(C++) function returns the struct S by value
}
C++:
--------
class C {};
struct S
{
C *c;
S(S &&rval)
: c(rval.c)
{
rval.c = nullptr;
}
};
void test()
{
S result(d_fun()); // move-construct result from d-function
}
--------
This is the context, but there is lots of additional noise in d_fun().
So, what goes wrong is, d_fun returns a struct by value which appears by looking at the disassembly that d_fun allocates room for the result value on it's stack, and then returns a pointer to the result in RAX. When the function returns, the C++ calling code expects a pointer to the result value in RAX, and it goes from there.
This appears to be happening correctly; when d_fun returns, RAX is a valid pointer to the result struct (looking at the debug info, it appears to be a value called '__HID1'), but I see a crash in C++ shortly after inside the destructor for the result rvalue.
I eventually realised the reason for the crash; even though the rvalue pointer is a pointer to valid data, I noticed that the rvalue pointer (ie, __HID1) is always out by 1 byte, in this case: 0x000000884398e917.
Debug info confirms that __HID1's address is in that odd location.
The struct S has a single pointer member, which is 8-byte aligned, which means the whole S struct must be 8-byte aligned, I can confirm that is(typeof(__HID1) == S), but __HID1 is not aligned to 8 bytes.
When C++ receives this pointer from d_fun in RAX, it accesses the pointer directly a couple of times and gets proper data, but then MSVC seems to make some assumption a little later that the pointer is 8-byte-aligned and allows code generation that truncates the bottom 3 bits, and acts on data out of alignment causing a crash.
In my case, it is during the move constructor I illustrated; &&rval is the unaligned pointer to __HID1. First it initialises this->c from rval.c, which follows the unaligned pointer and correctly initialises this->c. It then writes nullptr to rval.c, but it writes `0` to *aligned* rval instead of the unaligned it received as function argument. The destructor later operates on a bad pointer because `c` is not nullptr as expected.
So TL;DR: allocation of the result value, when returning a struct by-value, known by the debug info as __HID1, in an extern(C++) function, appears to be capable on not aligning the result struct correctly. If I change my code around randomly, the value often becomes properly aligned, and then the crash disappears as expected.
I can't find any pattern that reliably affects the placement of __HID1, I even tried adding `align(8) struct S { C c; }`, but that had no affect and the rvalue was still unaligned.
Can anybody imagine any reason that the placement of the rvalue struct may become unaligned in this case where returning struct from extern(C++) by-value?
Like I say, it doesn't happen in simple cases; when I try and fabricate a repro, I can't reproduce the problem. Something about the complexity of my d_fun affects placement of __HID1.
Perhaps put an assert in the compiler that the rvalue struct is aligned correctly in these conditions, and ICE if it's not? We'll have a better chance of catching it then, as it might show up unnoticed in more common cases, but rarely find itself in conjunction with MSVC++ code that (reasonably) assumes proper alignment.
Comment #1 by bugzilla — 2017-05-30T06:45:46Z
Let's start by compiling the D code, and disassembling d_fun() with obj2asm to get:
push RBP
mov RBP,RSP
sub RSP,010h
lea RCX,[00h][RIP]
sub RSP,020h
call _d_newclass
add RSP,020h
mov -8[RBP],RAX
mov RAX,-8[RBP]
mov RSP,RBP
pop RBP
ret
So RAX becomes the instance of C, and RAX is returned. Nothing wrong with the D code here. Now looking at the C++ code, it doesn't compile. Adding in the declaration of d_fun() and disassembling:
sub RSP,038h
lea RCX,020h[RSP]
call d_fun
add RSP,038h
ret
which is what one expects. I don't know where to go from here. Since you say there is a misalignment, I suggest disassembling the code with the misalignment in it and seeing if it is on the D side or the C++ side.
Comment #2 by bugzilla — 2017-05-30T06:48:27Z
> d_fun returns a struct by value which appears by looking at the disassembly that d_fun allocates room for the result value on it's stack, and then returns a pointer to the result in RAX.
D never does that, because then d_fun() would be returning a pointer into its own stack, which has expired and hence is a big no-no.
What does happen is d_fun()'s caller allocates S, and then passes a pointer to S to d_fun(), and d_fun() fills it in and returns that pointer to S.
But if S fits in a register, this does not happen, as the disassembly shows.
Comment #3 by doob — 2017-05-30T06:58:26Z
(In reply to Manu from comment #0)
> I have this problem that I don't understand, and I can't reproduce in a
> reduced test-case.
Have you tried running Digger [1] to try to reduce the test case?
[1] https://github.com/CyberShadow/Digger
Comment #4 by turkeyman — 2017-05-30T08:28:57Z
> What does happen is d_fun()'s caller allocates S, and then passes a pointer to S to d_fun(), and d_fun() fills it in and returns that pointer to S.
Interesting... so, what is __HID1 as expressed in the debuginfo? It appears to be local to d_fun. It is(typeof(__HID1) == S), not S*, and it's the address of __HID1 that is returned...
Comment #5 by turkeyman — 2017-05-30T08:37:46Z
So, you're saying that MSVC is passing an output pointer as first arg? I'll try and confirm that. I didn't see that happening.
Comment #6 by kinke — 2017-05-30T17:51:46Z
What might be the issue here is that you need to take care of PODness when declaring structs in D and C++. The C++ move ctor may make it a non-POD (which are returned via hidden sret pointer, even if they fit in a register), while the D definition is a POD.
I had such an issue a while back when I added a C++ ctor (for convenience).
Comment #7 by turkeyman — 2017-06-02T06:11:03Z
Okay, I've spent days debugging, and this is all wrong. I've discovered more, and have a repro, I'll post a new bug.