← Back to index | Original Bugzilla link

Bug 17448 – Move semantics cause memory corruption and cryptic bugs

Status: RESOLVED
Resolution: FIXED
Severity: critical
Priority: P1
Component: dmd
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2017-05-28T11:28:46Z
Last change time: 2022-05-30T17:14:37Z
Keywords: industry, safe
Assigned to: No Owner
Creator: Tomer Filiba (weka)
See also: https://issues.dlang.org/show_bug.cgi?id=17450, https://issues.dlang.org/show_bug.cgi?id=17457, https://issues.dlang.org/show_bug.cgi?id=18575, https://issues.dlang.org/show_bug.cgi?id=18576

Comments

Comment #0 by tomer — 2017-05-28T11:28:46Z

Using DMD64 D Compiler v2.074.0 on Linux (ubuntu 14.04) D's move-semantics are really bug-prone and have caused a bug we were chasing for roughly a week. `@disable this(this)` won't help, and it can be reproduced in totally safe code. Basically the ctor of a struct registers a timer to be called within XX seconds, and this timer is a delegate to a member of this struct. This simple example demonstrates it: void delegate() timeoutCallback; struct CallContext { ulong[10] data; @disable this(this); @disable void opAssign(ref CallContext); this(int seconds) { // this would be reactor.callIn(seconds, &handleTimeout); timeoutCallback = &handleTimeout; } void handleTimeout() {} } auto f() { return CallContext(18); } void main() { auto x = f(); writeln(&x, " vs ", timeoutCallback.ptr); // game over: 7FFC3C072640 vs 7FFC3C0725B0 } With this, it's very easy to produce memory corruption in @safe code: @safe: void delegate() timeoutCallback; struct CallContext { ulong[10] data; @disable this(this); @disable void opAssign(ref CallContext); this(int seconds) { timeoutCallback = &handleTimeout; } void handleTimeout() { data[0] = 12345; } } auto f() {auto x = g();} auto g() {return CallContext(17);} void h() { ulong[20] tmp; timeoutCallback(); foreach(i, n; tmp) { writeln(i, "=", n); } } void main() { f(); h(); } Which produces 0=0 1=0 2=0 3=0 4=12345 5=0 6=0 7=0 8=0 9=0 10=0 11=0 12=0 13=0 14=0 15=0 16=0 17=0 18=0 19=0 Note that I'm not holding any self-reference, only registering a timer. Obviously, since the object may be moved several times, it means I can never be sure the object "won't move anymore" and register this timer...

Comment #1 by tomer — 2017-05-28T11:32:56Z

I understand changing this might break backwards compatibility, but something like a `pragma(dontmove);` in the struct is a must... that way it won't have compiled, and we would pass it by ref instead of returning it from (several) functions

Comment #2 by tomer — 2017-05-28T11:37:51Z

alternatively a move-operator is required

Comment #3 by uplink.coder — 2017-05-28T11:50:18Z

easy workaround. stop abusing delegates :)

Comment #4 by eyal — 2017-05-28T11:56:47Z

@uplink: Why are struct delegates even supported if pretty much all uses of them are always abuse?

Comment #5 by shachar — 2017-05-28T12:01:16Z

Also, this can happen without delegates at all. Think about an intrusive linked list to which we register during construction and unregister during destruction.

Comment #6 by b2.temp — 2017-05-28T16:34:48Z

It looks like a bad codegen and not a @safe issue. `ulong[20] tmp;` is not corrupted if i compile and run your example with LDC2.

Comment #7 by tomer — 2017-05-28T17:18:11Z

it is obviously not a codegen issue, since i have a dangling pointer at hand. it's just a matter of how the much stack the compiler requires. in LDC i reproduce this by enlarging the tmp to 30 elements: dub run --compiler=/home/tomer/bin/ldc2-1.3.0-beta1-linux-x86_64/bin/ldc2 Performing "debug" build using /home/tomer/bin/ldc2-1.3.0-beta1-linux-x86_64/bin/ldc2 for x86_64. dtest ~master: building configuration "application"... Running ./dtest 0=0 1=0 2=0 3=0 4=0 5=0 6=0 7=0 8=0 9=12345 <<<<<< 10=0 11=0 12=0 13=0 14=0 15=0 16=0 17=0 18=0 19=0 20=0 21=0 22=0 23=0 24=0 25=0 26=0 27=0 28=0 29=0

Comment #8 by andrei — 2017-05-30T16:27:06Z

The problem here is the delegate escapes a pointer to a struct. The compiler is free to move around structs that are manipulated by value, therefore escaping &this from any struct method is not safe. (Technically the type of `this` inside a method of struct T is `scope ref T`, not `ref T`.) Sadly the way to go from here is disallow the code entirely in @safe mode. The notion that the compiler can freely move around value types is deeply embedded in the language, and has many benefits. There are no plans to change that. For data that doesn't change location in memory, dynamic or user-defined allocation is the go-to solution.

Comment #9 by shachar — 2017-05-30T18:56:16Z

There are two issues here. The first is that @safe does not warn about unsafe behavior. Interesting, but unrelated to the Weka use case. The more interesting problem here is that certain types of programming are impossible in D. I was unable to find the statement in the most recent language specs, but the phrasing I remember from "The D Programming Language" was along the lines of "Structs in D may not store references to themselves, and therefor D may move them around memory at will". Except that is not true. First, the order is reverse. D moves them around, and that's why they may not store references to themselves. More to the point, however, the requirement should be made much more stringent: D structs may not manage, directly or indirectly, any reference to themselves, either internally as a member, or externally. Like I stated above, that's a fairly big restriction. At the very least, the training material should be updated to reflect it correctly, possibly also the specs. As things currently stand, there is no way to write a struct the manages its own registration in an outside container without using dynamic memory (which is never mentioned in the specs as an exception to that rule).

Comment #10 by andrei — 2017-05-30T19:33:23Z

(In reply to Shachar Shemesh from comment #9) > There are two issues here. The first is that @safe does not warn about > unsafe behavior. Affirmative. > Interesting, but unrelated to the Weka use case. > > The more interesting problem here is that certain types of programming are > impossible in D. I was unable to find the statement in the most recent > language specs, but the phrasing I remember from "The D Programming > Language" was along the lines of "Structs in D may not store references to > themselves, and therefor D may move them around memory at will". > > Except that is not true. First, the order is reverse. D moves them around, > and that's why they may not store references to themselves. More to the > point, however, the requirement should be made much more stringent: > > D structs may not manage, directly or indirectly, any reference to > themselves, either internally as a member, or externally. > > Like I stated above, that's a fairly big restriction. At the very least, the > training material should be updated to reflect it correctly, possibly also > the specs. > > As things currently stand, there is no way to write a struct the manages its > own registration in an outside container without using dynamic memory (which > is never mentioned in the specs as an exception to that rule). Indeed, the spec should clarify under what circumstances objects can be moved. Functions manipulating objects by value can be expected to take the liberty to move their parameters around. Functions that manipulate objects by means of pointers and reference will not do that, even if their arguments are allocated on the stack or statically. So certain uses of internals and mutual pointers are possible. Currently there is no rigorous wording for such in the spec. Contributions are welcome.

Comment #11 by bugzilla — 2017-05-30T22:27:16Z

One possibility is to not allow escaping references to 'this' in @safe constructors.

Comment #12 by bugzilla — 2017-05-31T03:35:49Z

Interestingly, if you change: auto f() { return CallContext(18); } to: auto f() { CallContext c = CallContext(18); return c; } it will work, i.e. no moving is done. This is because the compiler does NRVO (Named Return Value Optimization) on it, where the object is constructed directly into its final destination. It's obviously not doing this for the original return statement, likely because it does not have a name :-) I'll look into correcting this, as NRVO is an important optimization. But at least you have a workaround for the moment.

Comment #13 by eyal — 2017-05-31T07:06:21Z

(In reply to Andrei Alexandrescu from comment #10) > Indeed, the spec should clarify under what circumstances objects can be > moved. Could we get compile-time errors when we accidentally do something that moves a struct that should be unmovable? Currently we do: @disable this(this); @disable void opAssign(typeof(this)); Could we also do: @disable move; // or any other syntax So that we can remove a constant worry about accidental code that moves structs that shouldn't? Getting compile-time detection of these issues is far more important than which exact code triggers these issues in the first place. We can always work around detected issues. But we can't get our debugging time back...

Comment #14 by andrei — 2017-05-31T14:10:15Z

(In reply to Eyal from comment #13) > (In reply to Andrei Alexandrescu from comment #10) > > > Indeed, the spec should clarify under what circumstances objects can be > > moved. > > Could we get compile-time errors when we accidentally do something that > moves a struct that should be unmovable? > > Currently we do: > > @disable this(this); > @disable void opAssign(typeof(this)); > > Could we also do: > > @disable move; // or any other syntax > > So that we can remove a constant worry about accidental code that moves > structs that shouldn't? > > Getting compile-time detection of these issues is far more important than > which exact code triggers these issues in the first place. > > We can always work around detected issues. But we can't get our debugging > time back... That would be nice to have, but seemingly a major project both on the definition and implementation side. * On the language definition side: the proposal would need to differentiate among (a) constructs that are guaranteed to not move; (b) operations that may move if possible (for optimization purposes) or not depending on "@disable move", and (c) operations that must always move and therefore are disallowed for "@disable move" objects. * On the implementation side: currently compilers embed the notion that they can move structs around at all levels, from the front-end down to the optimizer. Each compiler decides to move data on its own, i.e. takes full advatage of the freedom to move. LDC for example is very adept at "destructuring", i.e. exploding a struct into registers and "assembling" it back if necessary. Carrying the notion of "immovable structs" through all layers would be a major effort for all implementations. We'd want to focus effort on things that currently make it impossible for folks to do things (e.g. use shared portably and effectively) instead of matters that have workarounds. In this case, we think of having the compiler disallow escaping of &this in safe code, i.e. constructors are scoped. For the few cases that require escaping of the pointer, that code can be @trusted and validated by hand. On the language definition side, we should clarify that the address of an lvalue stays unchanged through the lifetime of the lvalue, but rvalues don't provide such guarantees. We're also considering a means to disable scope (e.g. by scope(false)) so certain struct constructors and methods _are_ allowed to escape &this in a way that is verifiable by the compiler. On the library side, an obvious category of immovable objects are classes, so a class in conjunction with "scope" may be a good starting point for an efficient immovable object.

Comment #15 by shachar — 2017-05-31T18:17:16Z

(In reply to Andrei Alexandrescu from comment #14) > * On the language definition side: the proposal would need to differentiate > among (a) constructs that are guaranteed to not move; (b) operations that > may move if possible (for optimization purposes) or not depending on > "@disable move", and (c) operations that must always move and therefore are > disallowed for "@disable move" objects. > You do not need to disallow anything if you require that "@disable move" on struct "Struct" always be accompanied by "this(Struct rhs)" constructor (i.e. - a move constructor). This way, you can move when applicable, and move through the provided callback where applicable.

Comment #16 by stanislav.blinov — 2017-05-31T21:53:26Z

To all participants: if you don't mind, please add your ideas to the pool: http://forum.dlang.org/thread/[email protected] These things tend to go mostly unnoticed, until they do come up in such issue reports.

Comment #17 by bugzilla — 2017-06-03T06:50:00Z

Another case: https://github.com/dlang/dmd/pull/6852

Comment #18 by bugzilla — 2017-06-03T06:53:08Z

(In reply to Walter Bright from comment #17) > Another case: > > https://github.com/dlang/dmd/pull/6852 Also: https://github.com/dlang/dmd/pull/6849 https://github.com/dlang/dmd/pull/6847 The three of them eliminates the moving in this test case. I know it is not what you're asking for, but it is a worthwhile optimization.

Comment #19 by dfj1esp02 — 2017-06-06T14:15:58Z

(In reply to Andrei Alexandrescu from comment #14) > LDC for example is very adept at > "destructuring", i.e. exploding a struct into registers and "assembling" it > back if necessary. Carrying the notion of "immovable structs" through all > layers would be a major effort for all implementations. I think it's transparent to LDC: optimizers are conservative, and such explosion is possible if the optimizer understands the addressing pattern of the struct content.

Comment #20 by bugzilla — 2018-03-04T07:14:59Z

With the current compiler, and the addition of @safe: test.d(13): Error: address of variable `this` assigned to `timeoutCallback` with longer lifetime Line 13 is: timeoutCallback = &handleTimeout; I believe this should resolve the issue.

Comment #21 by tomer — 2018-03-04T12:29:45Z

Walter, the @safe-ty aspects of the issue are one thing. In real code, @safe is hardly workable, i.e. void main() { int x; writeln(&x); // Error: cannot take address of local `x` in `@safe` function `main` } It's either you go whole nine yards and implement a full-blown borrow-checker like rust, or impose very strict (and sometimes arbitrary) limitations that practically make it unusable. But @safe-aside, the *more important* aspect here that the compiler must provide a guarantee of *never moving* structs that are marked `@disable this(this)` or `pragma(immovable)` or with any other syntax. It's a semantic contract with the compiler, not an optimization. So for example, a desired outcome might be for this not to compile: pragma(immovable) struct S { int x; } S func() { return S(100); } void main() { S s = func(); } Should the compile be unable to rewrite this as pass-by-reference. I hope it makes the problem clear, again, @safe is really not the issue here. It's the guarantees provided by move semantics.

Comment #22 by tomer — 2018-03-04T14:10:07Z

void delegate() callback; struct S { int x; @disable this(this); this(int x) { this.x = x; callback = &inc; } void inc() { x++; } } auto f() { return g(); } auto g() { return h(); } auto h() { return S(100); } void main() { auto s = f(); writeln(&s, " ", callback.ptr); // 7FFF0C838400 7FFF0C8383A0 callback(); writeln(s.x); // 100 instead of 101 }

Comment #23 by andrei — 2018-03-06T13:52:05Z

(In reply to Tomer Filiba (weka) from comment #21) > Walter, the @safe-ty aspects of the issue are one thing. In real code, @safe > is hardly workable, i.e. > > void main() { > int x; > writeln(&x); // Error: cannot take address of local `x` in `@safe` > function `main` > } We should address this situation by having writeln take scope inputs. It does not escape any pointers. > It's either you go whole nine yards and implement a full-blown > borrow-checker like rust, or impose very strict (and sometimes arbitrary) > limitations that practically make it unusable. We believe there's a third way, but the burden of proof is indeed on us. > But @safe-aside, the *more important* aspect here that the compiler must > provide a guarantee of *never moving* structs that are marked `@disable > this(this)` or `pragma(immovable)` or with any other syntax. It's a semantic > contract with the compiler, not an optimization. > > So for example, a desired outcome might be for this not to compile: > > pragma(immovable) struct S { > int x; > } > S func() { > return S(100); > } > void main() { > S s = func(); > } > > Should the compile be unable to rewrite this as pass-by-reference. > > I hope it makes the problem clear, again, @safe is really not the issue > here. It's the guarantees provided by move semantics. I think immovability is a red herring. The problem is a pointer is escaped and it shouldn't be. Even if the struct were immovable, you could construct other cases in which &this gets messed up. (Thinking of supporting the notion of immovable objects - it would have huge ripples. Already supporting structs with @disable this(); and/or @disable this(this); in the standard library is a recurring (and still ongoing) nightmare. Dealing with immovability on top of that would make a lot of code a lot heavier.)

Comment #24 by tomer — 2018-03-06T14:47:00Z

> We should address this situation by having writeln take scope inputs. It does > not escape any pointers. So... more special cases? > I think immovability is a red herring. The problem is a pointer is escaped > and it shouldn't be. Even if the struct were immovable, you could construct > other cases in which &this gets messed up. Of course I could *make* it fail, but that would require me doing memcpy or what not. It escapes the type system. I'm trying to make it work or fail inside the type system. The whole point is that `struct S` cannot tell how people will use it. The author thought people would just do void func() { auto s = S(100); // ... // when s goes out of scope it will cancel the callback } The author tried to prevent people from shooting themselves in the foot by making the struct immovable. But then someone added a wrapper function that returns a `struct S`. This second author knew the struct is immovable and trusted the type system so his changes would either not compile, or compile but never move the object. The code works in the happy flow, but when things go south, all of the sudden you get cryptic bugs that take many days -- and a lot of luck -- to track down. In this specific example, the struct registers a timer to be called in XX seconds and kill the operation. Other examples could be a struct that adds itself to a link-list and removes itself when it's destroyed. It doesn't make sense that C++ gives me this guarantee, but D fools me into thinking my code is okay when in fact it isn't. It's not an implementation detail or an optimization -- it's a semantic guarantee.

Comment #25 by andrei — 2018-03-06T15:29:21Z

(In reply to Tomer Filiba (weka) from comment #24) > > We should address this situation by having writeln take scope inputs. It does > > not escape any pointers. > > So... more special cases? That's a straight use of scope per DIP 1000, in fact the very design intent: scope in a function signature specifies the function won't escape the parameter. > > I think immovability is a red herring. The problem is a pointer is escaped > > and it shouldn't be. Even if the struct were immovable, you could construct > > other cases in which &this gets messed up. > > Of course I could *make* it fail, but that would require me doing memcpy or > what not. It escapes the type system. I'm trying to make it work or fail > inside the type system. > > The whole point is that `struct S` cannot tell how people will use it. The > author thought people would just do > > void func() { > auto s = S(100); > // ... > // when s goes out of scope it will cancel the callback > } > > The author tried to prevent people from shooting themselves in the foot by > making the struct immovable. But then someone added a wrapper function that > returns a `struct S`. This second author knew the struct is immovable and > trusted the type system so his changes would either not compile, or compile > but never move the object. The code works in the happy flow, but when > things go south, all of the sudden you get cryptic bugs that take many days > -- and a lot of luck -- to track down. > > In this specific example, the struct registers a timer to be called in XX > seconds and kill the operation. Other examples could be a struct that adds > itself to a link-list and removes itself when it's destroyed. > > It doesn't make sense that C++ gives me this guarantee, but D fools me into > thinking my code is okay when in fact it isn't. It's not an implementation > detail or an optimization -- it's a semantic guarantee. What I'm saying is the problem is different: the address of this should not escape a non-scope method (including the constructor), we're in a world of trouble regardless whether we introduce a new feature regarding movability.

Comment #26 by eyal — 2018-03-06T18:33:54Z

What solution is there for this use-case then? We need objects to register themselves (i.e: set out-of-scope pointers to point at them) and RAII-wise have them de-register themselves. While they are registered, we need a guarantee that they won't be moved. Doesn't sound like D has any solution for us here? 1) We can't use classes, because: 1.A) they are hard to use without GC 1.B) cannot reasonably nest classes so they re-use the same allocation as the container class 1.C) cannot easily allocate them on the stack 2) We can't use structs, because there's no way to make sure structs aren't moved when they're registered. In my experience, this use-case of registered objects is extremely common. Immovable structs, even if those require some effort, sound like they should be well worth virtually any effort they'd require. In practice, we use structs for this, and we don't really have a choice so we'll keep using structs. But D is fighting us here, instead of helping us.

Comment #27 by tomer — 2018-03-06T18:51:33Z

(In reply to Andrei Alexandrescu from comment #25) > > So... more special cases? > > That's a straight use of scope per DIP 1000, in fact the very design intent: > scope in a function signature specifies the function won't escape the > parameter. I don't suppose this would help. It seems the & operator is just not allowed in safe code: void main() @safe { int x; auto tmp = &x; // Error: cannot take address of local x in @safe function }

Comment #28 by tomer — 2018-03-06T18:52:57Z

My point is, @safe is so constrained that it's practically unusable, so I don't consider it a viable solution for this problem.

Comment #29 by issues.dlang — 2018-03-06T19:04:52Z

(In reply to Tomer Filiba (weka) from comment #27) > I don't suppose this would help. It seems the & operator is just not allowed > in safe code: > > void main() @safe { > int x; > auto tmp = &x; // Error: cannot take address of local x in @safe > function > } That code becomes @safe with -dip1000, because then it's inferred as scope, and the compiler verifies that it doesn't escape, whereas without DIP 1000 and its improvements to scope, the compiler doesn't have any way to ensure that the resulting pointer is used in an @safe manner. So, DIP 1000 should have a fairly large impact on how @safe certain operations are. (In reply to Tomer Filiba (weka) from comment #28) > My point is, @safe is so constrained that it's practically unusable, so I > don't consider it a viable solution for this problem. That would be highly dependent on what your code is doing and how willing you are to vet code and mark functions @trusted where appropriate or use @trusted lambdas to mark sections of code as @trusted (which isn't the most ideal solution for marking a section of code as @trusted, but it's the best we have right now). If you're constantly doing stuff like taking the address of a local variable, then yes, @safe is going to be a miserable mess (though DIP 1000 may fix that). But a lot of code can be @safe with no problem. It really depends on the type of stuff your code is doing.

Comment #30 by andrei — 2018-03-06T20:58:14Z

Indeed it seems we are not supporting registration by address with ease for D value types. There are good reasons for that; by allowing value types to be moved around we avoid a swath of complications and difficulties that C++ encounters with their definition of rvalue references and move constructors. That C++ feature complicates all code considerably and still has a variety of safety, correctness, and efficiency issues (I am not exaggerating; all three problems are present) that make the bread and butter of advanced C++ instructors worldwide, including myself. I think we got the better deal there. The disadvantage is that indeed an object can be born and die at different addresses. So self-registering objects by address, or objects holding internal pointers, do not work with automatic allocation. Such is the nature of automatic allocation in D. (It should be noted that C++ has its own, distinct issues with self-registering objects, to which I dedicate several slides in http://erdani.com/index.php/training/mc1xd.) Allow me to make a few suggestions for workarounds: * Avoid automatic/stack allocation and also return by value. A struct may hold internal pointers and a constant address as long as the memory it's in is not automatic/stack. If you use raw memory in conjunction with functions such as emplace and destroy, you have good control. As a perk, you avoid the creation, copying, and destruction of spurious objects - which comes along with calls to register/unregister, which I assume has a significant cost. * Use a "cookie", not an address, in the registration. A classic registration pattern returns a cookie/handle, usually an integer, which the object stores. Then, the object passes that cookie back to the unregister function. In other words, if the address of an object isn't invariant, let's define something that is. * Use dynamic allocation in conjunction with reference counting. I know this has been mentioned and dismissed as too expensive, but I'm mentioning it for completeness. Also, it's one of those cases in which engineering can go a long way: use a high-performance allocator (for which we have a solid framework in the standard library), duplicate objects lazily, etc. If after exploring these and other solutions you come to the conclusion they are not satisfactory, I encourage you to create a DIP. Two possible lines of attack are: (1) Allow specifying that an object can't be moved (2) Allow a type to intercept its move by means of a nothrow hook Drawing from extensive experience with Phobos and its very generic code, I can tell you I foresee difficulties with (1). Anything that makes types "more special than others" is bound to cause a smorgasbord of special handling in the library. I think (2) would be a better angle because it encapsulates the handling with the type. Thanks!

Comment #31 by shachar — 2018-03-08T01:57:39Z

Here goes: (In reply to Andrei Alexandrescu from comment #30) > Indeed it seems we are not supporting registration by address with ease for > D value types. Make that: at all. I understand that there are cases where it is safe to do, but I don't think we have a good guarantee about those cases. Without them, it is simply impossible. > That C++ ... still has a variety of safety, > correctness, and efficiency issues (I am not exaggerating; all three > problems are present) In the interest of better understanding the trade-offs, can you please elaborate? I'm asking because D's insistence on moving things around is one of the language features I understand less. I don't see what the huge gains are, and some of the actual compiler behaviors I've seen are downright counter-productive. I've seen cases where the same object gets moved around so much, and for no apparent good reason, that I seriously worry for the performance bleed from this feature. > (It should be noted that C++ has its own, distinct issues with > self-registering objects, to which I dedicate several slides in > http://erdani.com/index.php/training/mc1xd.) Unless I've missed where in that link the actual slides are, not helping. > > Allow me to make a few suggestions for workarounds: > > * Avoid automatic/stack allocation and also return by value This is a HUGE increase in the cost of allocating the struct. There are efficient mechanisms for dynamic allocations (e.g. - maintaining a pool), but they cost in other overheads (e.g. - predetermining the number of instances, adding an initialization stage). > As a perk, you avoid the > creation, copying, and destruction of spurious objects - which comes along > with calls to register/unregister, which I assume has a significant cost. I don't think registering an object with a linked list is as big a cost as dynamic allocation. > * Use a "cookie", not an address, in the registration. A classic > registration pattern returns a cookie/handle, usually an integer, which the > object stores. No, that's not correct. There is indeed a classic pattern where the registrar returns a cookie, but it *stores* the pointer internally (or, in the Windows way, obfuscates it into the handle). I don't think you can come up with a scheme where the Linux kernel cannot translate an integer FD into a file pointer without user-space's help. Someone outside of the struct has to know the pointer, or the struct is unlocatable without already knowing where it is. Needless to say, this would defeat the purpose of registering it. > If after exploring these and other solutions you come to the conclusion they > are not satisfactory, I encourage you to create a DIP. Two possible lines of > attack are: > > (1) Allow specifying that an object can't be moved > (2) Allow a type to intercept its move by means of a nothrow hook (2) is what we've been asking for all along (e.g. - comment #15). Give us a way to update the external reference when the address changes. I'll try to phrase a DIP. It would be my first one, so help would be appreciated.

Comment #32 by bugzilla — 2018-03-11T08:45:47Z

I filed two bug reports based on Tomer's work: https://issues.dlang.org/show_bug.cgi?id=18575 https://issues.dlang.org/show_bug.cgi?id=18576 which have PRs: https://github.com/dlang/dmd/pull/7989 https://github.com/dlang/dmd/pull/7990 The second example in comment 1 is a bug in the example, as: auto f() {auto x = g();} should be: auto f() { return g();} This addresses all the issues in examples provided. I.e. the escapes are properly detected with -dip1000, and the structs don't move due to RVO (Return Value Optimization) where the results are written into the caller's stack frame. If you discover more problems with escape detection and moved stack objects, please let me know with examples. Thanks!

Comment #33 by shachar — 2018-03-11T08:51:17Z

Like we said earlier, the safe implications are important, but are not the core of this bug report. The problem is that, as things stand with D, certain types of programming become impossible. For example, consider a RAII object protecting a resource (say, a file descriptor). For debugging purposes, we want to have a version(debug) option for tracking all such objects within the system. Under C++, what we'd do is to have a global linked list head, and have each RAII object register itself with the linked list in the constructor and deregister itself in the destructor. If necessary, we would have a move constructor that shifts the registration when the RAII is moved. This last step is impossible in D. You cannot be notified when the object moves, and once it does, the global linked list will get completely corrupted.

Comment #34 by eyal — 2018-03-11T11:41:53Z

Walter: Thanks, the RVO fixes are very useful and important - and will be part of any future solution we have!

Comment #35 by bugzilla — 2018-03-11T21:25:09Z

(In reply to Shachar Shemesh from comment #33) > You cannot be notified when the object > moves, and once it does, the global linked list will get completely > corrupted. That is correct, and you are right to be concerned about it. However, from a pragmatic point of view, the compiler cannot move objects that have external pointers to them (unless it can unambiguously find them all and update them). This is why, for example, there is an API in the GC to register pointers to objects that are unknown to the GC. Why was it moving the stack allocated objects in these examples? Because when it went out of scope, it is safe to assume that there are no extant pointers to it. This is why the compiler (now) complains when the code registers a pointer to the stack object. Of course, making the code @trusted enables code like this to be written, leaving it up to the programmer to ensure the safety. But there's another wrinkle. With RVO (Return Value Optimization), if the stack object is returned, and it is not returned in registers, it is actually not returned. It is allocated in the caller's stack frame! and is not moved. This is now the case with all the examples you've provided. So, to guarantee RVO: 1. the object must be returned on the stack, not in registers. This can be tested as shown in the documentation: https://dlang.org/spec/traits.html#isReturnOnStack Generally speaking, a struct that is larger than two registers, or is not POD (Plain Old Data), will return on the stack and not in registers. 2. use only one return statement in the function. Having multiple returns can defeat RVO. 3. do not pass the object as a parameter to other functions, as the compiler may 'move' it onto the parameter stack if there are no more references to the object. If you must pass it as a parameter, pass it by `ref`. Objects can also be constructed directly into their final destination by passing them as `out` parameters. Obviously, I am not familiar with your code and usage patterns. But the above ought to offer some hope (!) for getting your code working properly. If there are more pernicious problems with moving that aren't covered by this, please let me know. Perhaps I can think of a solution.

Comment #36 by shachar — 2018-03-12T07:35:21Z

Hello Walter, I see two problems with your comment. The first is that I think you are missing the real point of this bug report. It is not about controlling movability. It is about being able to hook the object's lifetime. The second problem is that I think you are trying to use optimization as a tool for solving semantic problems. I take issue with that, as there are always cases where optimizations are not possible. (In reply to Walter Bright from comment #35) > Why was it moving the stack allocated objects in these examples? Because > when it went out of scope, it is safe to assume that there are no extant > pointers to it. Before an object goes out of scope, its destructor needs to be called. Else, you give me no tool to make sure the above assertion is actually true. In this case, the object "went out of scope" but its destructor was not called. This is the real trigger for the dangling external pointer. > This is why the compiler (now) complains when the code > registers a pointer to the stack object. Of course, making the code @trusted > enables code like this to be written, leaving it up to the programmer to > ensure the safety. I have no problem with the saying that some code that is safe is not @safe. My issue is with saying that I am facing a lifetime management problem that has no solution within the language, no matter how aware I am of it. > But there's another wrinkle. With RVO (Return Value Optimization) ... There are cases where it is impossible to RVO: SomeStruct func(bool which) { SomeStruct struct1, struct2; // Do something on both struct1 and struct2 to make this meaningful. if( which ) return struct1; else return struct2; } There is simply no way to allocate the correct object on the caller's memory. > So, to guarantee RVO: ... That's, to me, missing the point. The point, for me, is to be able to design a struct that will be safe without asking the users of the struct to live up to some conditions. I don't think anything else is a viable answer. > Obviously, I am not familiar with your code and usage patterns. But the > above ought to offer some hope (!) for getting your code working properly. RVO is important, and definitely a welcome enhancement. The issue here, however, is not about Weka's code (the bug that triggered this bug report is long solved now). The point here is that there is an exceedingly useful tool designed to create code that is easier to maintain that D makes impossible. I am going to write a DIP for introducing a mechanism for hooking the move, as discussed in comment #31. There reason I'm keeping this discussion alive is because I will need your buy-in to get it integrated, and for that I need you to see the need. > If there are more pernicious problems with moving that aren't covered by > this, please let me know. Perhaps I can think of a solution. See above.

Comment #37 by bugzilla — 2018-10-02T09:28:36Z

Looking at your example, I rewrote it into a nunnable form: import core.stdc.stdio; struct S { int i; this(int i) { printf("this() %d\n", i); this.i = i; } this(this) { printf("this(this) %d\n", i); } ~this() { printf("~this() %d\n", i); } } int main() { S s = test(true); printf("s %d\n", s.i); return 0; } S test(bool b) { S s1 = S(1), s2 = S(2); if (b) return s1; else return s2; } You are correct that copy elision does not take place. However, it does not leave the code in a bad state. Running it yields: this() 1 this() 2 this(this) 1 ~this() 2 ~this() 1 s 1 ~this() 1 I.e. there are 3 constructions, and 3 destructions. It is correct code. It is not as efficient as a move constructor, but it is correct and you are not left with a dangling pointer.

Comment #38 by bugzilla — 2018-10-02T09:31:52Z

DIP 1014: https://github.com/dlang/DIPs/blob/38cec74a7471735559e3b8a7553f55102d289d28/DIPs/DIP1014.md

Comment #39 by thomas.bockman — 2021-02-17T20:39:02Z

Is this resolved by DIP 1018's copy constructors, https://github.com/dlang/DIPs/blob/master/DIPs/accepted/DIP1018.md https://dlang.org/spec/struct.html#struct-copy-constructor And the pseudo-deprecation of postblits? https://dlang.org/spec/struct.html#struct-postblit I'm asking because someone on the forums told me not to trust copy constructors to work correctly, and linked me here. Are copy constructors broken too, or just legacy postblits? Are postblits even still broken? If so, shouldn't they be officially deprecated?

Comment #40 by thomas.bockman — 2021-02-17T20:51:06Z

(In reply to thomas.bockman from comment #39) > Are postblits even still broken? Also, I tested all of the examples posted to this issue. On recent compilers they all seem to either work correctly, or generate a compile-time error with a good message.

Comment #41 by alphaglosined — 2021-02-17T20:52:38Z

As of 2.077.1 the memory corrupting example in the original post no longer compiles due to improvements with @safe.

Comment #42 by razvan.nitu1305 — 2021-08-17T15:27:10Z

Closing this since neither of the code samples produce memory corruption (some run as intended, while others are rejected as unsafe).

Comment #43 by schveiguy — 2022-05-29T18:08:17Z

I'm interested to hear back from the weka folks. As the original no longer is moved by the compiler, and also, DIP1014 is implemented in library code already, is there still a need to have any support for DIP1014 in the compiler? I tried to find a case where the compiler does a move, and after conversing with both Walter and Iain, I think there aren't any remaining cases.

Comment #44 by eyal — 2022-05-30T05:58:57Z

Hi Steven, thanks for reaching out. This example still fails on newest dmd: struct S { S* addr; this(int dummy) { this.addr = &this; } static S create() { return S(1); } } unittest { auto s1 = S(1); assert(s1.addr == &s1); auto s2 = S.create(); assert(s2.addr == &s2); // <-- fails } While "@safe" disallows it, it also prevents all use of '&' here, so it is not really a practical option for such code.

Comment #45 by schveiguy — 2022-05-30T14:47:29Z

I'm assuming you are missing some extra machinery here. You aren't expecting D to automatically update your references for you, right? You at least need a postblit/copy ctor.

Comment #46 by schveiguy — 2022-05-30T14:50:42Z

e.g.: ```d struct S { S* addr; this(int dummy) { this.addr = &this; } this(ref const S other) { this.addr = &this; } static S create() { return S(1); } } ```

Comment #47 by schveiguy — 2022-05-30T15:12:03Z

I should rephrase my query. currently, if you create an appropriately written `opPostMove`, then the D library calls that when it's doing a semantic move of your type (e.g. if you use `std.algorithm.move`) But I can't find a case where the *compiler* is doing a semantic move of the type. If there are no cases, then DIP1014 is already implemented, right? My query is really does anyone have any cases where the compiler performs the move? Re-parsing this bug report, I think the original request (provide a way to have an actual registration system that updates pointers when a move is done) is possible now. I feel this bug was closed for the wrong reason (i.e. toy examples aren't compiling any more), and I want to be sure it's actually possible now. The language *should* be more forthcoming on what constitutes a move, and what the compiler must do if it decides to move a struct.

Comment #48 by tomer — 2022-05-30T16:57:35Z

Steven, can we agree that the following snippet is NOT okay? Regardless of @safe, let's leave it out of question. As a bonus, it this does not involve any library code/special moves/self references. And I used the latest dmd from run.dlang.io. ```d __gshared S* lastS = null; struct S { int x; this(int x) { this.x = x; lastS = &this; } } S f(int x) { return S(x); } void main() { S s1 = S(6); writeln("&s1=", &s1, " lastS=", lastS, " diff=", cast(void*)&s1 - cast(void*)lastS); // &s1=7FFD2262F468 lastS=7FFD2262F468 diff=0 S s2 = f(5); writeln("&s2=", &s2, " lastS=", lastS, " diff=", cast(void*)&s2 - cast(void*)lastS); // &s2=7FFD2262F46C lastS=7FFD2262F3F8 diff=116 } ``` * The first example, s1, is created using the ctor directly, which means there's no move and `&s1 == lastS`. * The second example, s2, return a struct via a function. Here we get an implicit move and lastS points to garbage, 116 bytes away from s2. * This means that if I use lastS, I'm either reading garbage from it, or corrupting my stack (i.e., assume I'm calling another function afterwards, which does `lastS.x = 17`)

Comment #49 by schveiguy — 2022-05-30T17:02:31Z

On discord, we worked out a case where the compiler is moving a struct and not calling any opPostMove or other features (copy constructor, postblit, etc): ```d import std.stdio; struct S{ S* ptr; this(int dummy) {ptr = &this;} this(this) {ptr = &this;} void opPostMove(const ref S oldLocation) nothrow { ptr = &this; } } void foo(S s){ assert(&s == s.ptr); // fails } S bar(){ auto s=S(1); auto t=S(1); if(true){ return t; } return s; } void main(){ foo(bar()); } ``` So there is still a need for the compiler to be aware of opPostMove. In this example, the return from bar cannot be NRVO optimized, and so a copy is made and the postblit called. Then in the call to foo, the return value of bar is moved into the parameter for foo.

Comment #50 by tomer — 2022-05-30T17:08:43Z

Btw, I tried adding to S ``` @disable void opPostMove(const ref S oldLocation) nothrow {} ``` But it doesn't have any effect (problem compiles and produces the same issue). In C++, you can make types un-movable, which means that if there's need to move them, you'll get a compile time error. I would love to be able to replicate this behavior.

Comment #51 by schveiguy — 2022-05-30T17:09:33Z

(In reply to Tomer Filiba (weka) from comment #48) > Steven, can we agree that the following snippet is NOT okay? > Regardless of @safe, let's leave it out of question. You are still not hooking copies or destructors. In this case, if I add a postblit to your struct like: ```d this(this) { lastS = &this; } ``` then it works as expected. In this case, the compiler is copying the S(x) instance to the return value, and then destroying the original, though I'm somewhat surprised NRVO didn't kick in here. In any case, I have found an answer to my question that the compiler *does* perform moves in certain situations, and it does not call opPostMove.

Comment #52 by schveiguy — 2022-05-30T17:14:37Z

(In reply to Steven Schveighoffer from comment #51) > then it works as expected. In this case, the compiler is copying the S(x) > instance to the return value, and then destroying the original, though I'm > somewhat surprised NRVO didn't kick in here. Indeed, it's enough to disable postblit, and then NRVO kicks in. So the compiler prefers to move vs. using NRVO when nobody cares about the movement/copy. And then it becomes apparent that the compiler must obey `@disable opPostMove` if it is going to make decisions like this!