See sample code:
https://issues.dlang.org/show_bug.cgi?id=2043
int main() {
void delegate()[] dgs;
for (int i = 0; i < 10; i++) {
dgs ~= () {
import std.stdio;
writeln(i);
};
}
dgs ~= () {
import std.stdio;
writeln("With cached variables");
};
for (int i = 0; i < 10; i++) {
int index = i;
dgs ~= () {
import std.stdio;
writeln(index);
};
}
foreach (ref dg; dgs) {
dg();
}
return 0;
}
It outputs:
10
10
10
10
10
10
10
10
10
10
With cached variables
9
9
9
9
9
9
9
9
9
9
While the series of 10 is expected, as the same variable is captured every time, the series of 9 is not. A new variable is declared with each iteration of the loop, and therefore a new closure ought to be created.
Failure to do allows for very strange things to happen, such as observing mutation in immutable variables.
The expected output would be:
10
10
10
10
10
10
10
10
10
10
With cached variables
0
1
2
3
4
5
6
7
8
9
That semantic is consistent with what is done in other languages, such as C#: https://unicorn-dev.medium.com/how-to-capture-a-variable-in-c-and-not-to-shoot-yourself-in-the-foot-d169aa161aa6 (link provided courtesy of Andrei Alexandrescu).
Possibly related issue: https://issues.dlang.org/show_bug.cgi?id=2043
I decided to open a new one because the discussion in the old one refers to a ton of outdated language constructs and is therefore not super helpful.
Comment #1 by dlang-bugzilla — 2021-05-18T17:35:43Z
In JavaScript:
---------------------------------------------------------------
var dgs = [];
for (var i = 0; i < 10; i++) {
dgs.push(function() {
console.log(i);
});
}
dgs.push(function() {
console.log("With cached variables");
});
for (var i = 0; i < 10; i++) {
var index = i;
dgs.push(function() {
console.log(index);
});
}
for (var dg of dgs) {
dg();
}
---------------------------------------------------------------
Prints:
---------------------------------------------------------------
10
10
10
10
10
10
10
10
10
10
With cached variables
9
9
9
9
9
9
9
9
9
9
---------------------------------------------------------------
So, I don't think D's behavior is unexpected compared to other languages.
Note that if you change "var" to "let" in JS then it behaves closer as to what you may expect.
At this point this is part of the language and is not a bug, so I don't think the bugtracker is the proper place to post this.
The workaround in D is the same as in JavaScript (before "let"), pass the mutable values to the lambda so that their current values become part of the lambda's state (or create an outer lambda which does this).
Comment #2 by dlang-bugzilla — 2021-05-18T17:40:06Z
(In reply to deadalnix from comment #0)
> Failure to do allows for very strange things to happen, such as observing
> mutation in immutable variables.
This however is a good reason to possibly deprecate/disallow doing so at least in @safe code.
Comment #3 by deadalnix — 2021-05-18T19:47:17Z
(In reply to Vladimir Panteleev from comment #1)
> In JavaScript:
>
> [...]
>
> So, I don't think D's behavior is unexpected compared to other languages.
>
> Note that if you change "var" to "let" in JS then it behaves closer as to
> what you may expect.
>
> At this point this is part of the language and is not a bug, so I don't
> think the bugtracker is the proper place to post this.
>
> The workaround in D is the same as in JavaScript (before "let"), pass the
> mutable values to the lambda so that their current values become part of the
> lambda's state (or create an outer lambda which does this).
This code is erroneous. In JS, var declare a variable at the function scope level.
Try again declaring the variable using `let`, which creates a properly scoped variable in JS and see how it goes.
Comment #4 by bugzilla — 2021-05-19T03:08:33Z
Let's rewrite it to something that does not use closures:
int test() @safe {
int j;
int*[20] ps;
for (int i = 0; i < 10; i++) {
ps[j++] = &i;
}
for (int i = 0; i < 10; i++) {
int index = i;
ps[j++] = &index;
}
int x;
foreach (p; ps) {
x += *p;
}
return x;
}
This code is equivalent in terms of what is happening with references and scopes.
Compiling it with -dip1000 yields:
Error: address of variable i assigned to ps with longer lifetime
Error: address of variable index assigned to ps with longer lifetime
Which is pragmatically what the behavior of the delegate example would be, because the delegate is also storing a pointer to the variable.
Comment #5 by bugzilla — 2021-05-19T03:10:01Z
The general rule for determining "what should happen here" when there are abstractions around pointers (such as arrays, delegates, refs, outs, class references, etc.), is to rewrite it in explicit terms of those pointers. The (sometimes baffling) behavior is then exposed for what it actually is, and the behavior should match.
Granted, there is a special kludge in the compiler to sometimes put the variables referenced by the delegate into a closure allocated by the gc, but that doesn't account for variables that go out of scope before the function scope ends. There is no process to make a closure for them, and adding such a capability is likely much more complication than added value, and so should just be an error.
Comment #6 by deadalnix — 2021-05-19T07:23:51Z
(In reply to Walter Bright from comment #4)
> Let's rewrite it to something that does not use closures:
>
> int test() @safe {
> int j;
> int*[20] ps;
>
> for (int i = 0; i < 10; i++) {
> ps[j++] = &i;
> }
>
> for (int i = 0; i < 10; i++) {
> int index = i;
> ps[j++] = &index;
> }
>
> int x;
> foreach (p; ps) {
> x += *p;
> }
>
> return x;
> }
>
> This code is equivalent in terms of what is happening with references and
> scopes.
>
> Compiling it with -dip1000 yields:
>
> Error: address of variable i assigned to ps with longer lifetime
> Error: address of variable index assigned to ps with longer lifetime
>
> Which is pragmatically what the behavior of the delegate example would be,
> because the delegate is also storing a pointer to the variable.
Except it is not. In D, closures do allocate on heap.
Comment #7 by black80 — 2021-05-20T07:53:22Z
agree with bug.
man creating new delegate/lambda with new internal state.
what is reason to store just one shared state for created multiple delegetes in loop? what is using case for it?
if D-team don't want to break current behavior let add new keyword "lambda" that is 100% compatible with "delegate" but it grabs totally new state when created - just changed compile time generated code.
C++ byRef & byValue capturing is more viable.
Comment #8 by deadalnix — 2021-05-20T10:58:26Z
(In reply to a11e99z from comment #7)
> C++ byRef & byValue capturing is more viable.
You'll not that C++'s std::function will allocate on heap if you capture. The equivalent code in C++ WILL allocate in a loop too.
Comment #9 by stanislav.blinov — 2021-12-05T22:28:57Z
(In reply to deadalnix from comment #8)
> You'll not that C++'s std::function will allocate on heap if you capture.
??? std::function MAY allocate on the heap, and whether it will would depend on its implementation and size of lambda's state. A decent implementation of std::function surely would not allocate if all you capture is a single reference.
> The equivalent code in C++ WILL allocate in a loop too.
Whether std::function would allocate is irrelevant. Equivalent C++ code would, generally speaking, print unspecified values for all but the string:
#include <functional>
#include <vector>
#include <iostream>
int main(int argc, char** argv) {
std::vector<std::function<void()>> dgs;
for (int i = 0; i < 10; i++) {
// Capture by reference, as that's D's semantics
dgs.emplace_back([&i] () {
std::cout << i << "\n";
});
}
dgs.emplace_back([] () {
std::cout << "With cached variables" << "\n";
});
for (int i = 0; i < 10; i++) {
int index = i;
// Capture by reference, as that's D's semantics
dgs.emplace_back([&index] () {
std::cout << index << "\n";
});
}
for (auto& dg: dgs) {
dg();
}
return 0;
}
Debug build with clang prints 10s followed by 9s. Debug build with gcc prints garbage values. Optimized builds with either print garbage values. Walter's rewrite clearly demonstrates why. D's output is at least predictable non-garbage, if not expected.
Note the comments, as that's the crux of the problem. D's captures are always by reference, so equivalent C++ code should do the same. If you capture by copy, then sure, you'll see a printout of [0, 10) after "With cached variables", but then the code would not be equivalent.
Back to D land, as mentioned both here and in https://issues.dlang.org/show_bug.cgi?id=2043, a, uh... "workaround" would be to force a new frame by turning loop body into e.g. an anonymous lambda call:
for (int i = 0; i < 10; i++) (int index) {
dgs ~= () {
import std.stdio;
writeln(index);
};
} (i);
...but then bye-bye go break and continue :\
Ideally, it'd be great to see by-copy and moving captures in D.
I do support the concern that the behavior is not at all obvious at a glance, however, mimicking the behavior of C# would be too restrictive, and would make a different problem not obvious at a glance, as it would hide allocations. Not that you can't detect that, just that detection won't be "at a glance".
...C++ does this right: forbids implicit captures and makes the programmer specify exactly how a capture should be performed. IMO, D should follow suit.
But at the very least, as Vladimir suggested, forming a closure that captures loop variables by reference should be @system. Code from first message should fail to compile in @safe, reporting "Closure over variable `<insert_variable_name_here>` defined in loop scope is not allowed in @safe code". Further language enhancements should follow, allowing for solutions that don't require silly wrappers. There shouldn't be ad-hoc fixes though. D has enough special cases as is.
Could someone make an executive decision as to which of the two reports should stay open? That this issue is nearing second half of its second decade should also warrant this to become critical.
Comment #10 by bugzilla — 2022-02-26T05:57:20Z
Replying to comment #1:
The observed behavior is a direct result of lambdas in D not capturing by value, they capture by reference.
It is also a direct result of loop variables, `i` and `index` although disappearing at the end of the scope, are recreated in the exact same place. Hence all the lambda references refer to the same location where each of the references points to.
The bug here is that the lambda's lifetime, in both loops, exceeds the lifetime of the variables `i` and `index`.
The error should be emitted when the first lambda, which refers to `i`, is assigned to dgs[], which has a lifetime longer than `i`. The same goes for the second lambda and `index`.
Hence, the following simplified code should fail to compile:
@safe
void test() {
int delegate() dg;
foreach (i; 0 .. 10) {
dg = () { return i; };
}
}
Comment #11 by bugzilla — 2022-02-26T06:05:24Z
Even allocating a closure on the gc heap is not a solution here, because only one allocation will be made which will be shared by each lambda, exhibiting the same behavior.
Comment #12 by deadalnix — 2022-05-24T16:21:23Z
(In reply to Stanislav Blinov from comment #9)
> (In reply to deadalnix from comment #8)
>
> > You'll not that C++'s std::function will allocate on heap if you capture.
>
> ??? std::function MAY allocate on the heap, and whether it will would depend
> on its implementation and size of lambda's state. A decent implementation of
> std::function surely would not allocate if all you capture is a single
> reference.
>
> > The equivalent code in C++ WILL allocate in a loop too.
>
> Whether std::function would allocate is irrelevant. Equivalent C++ code
> would, generally speaking, print unspecified values for all but the string:
>
> #include <functional>
> #include <vector>
> #include <iostream>
>
> int main(int argc, char** argv) {
> std::vector<std::function<void()>> dgs;
>
> for (int i = 0; i < 10; i++) {
> // Capture by reference, as that's D's semantics
> dgs.emplace_back([&i] () {
> std::cout << i << "\n";
> });
> }
>
> dgs.emplace_back([] () {
> std::cout << "With cached variables" << "\n";
> });
>
> for (int i = 0; i < 10; i++) {
> int index = i;
> // Capture by reference, as that's D's semantics
> dgs.emplace_back([&index] () {
> std::cout << index << "\n";
> });
> }
>
> for (auto& dg: dgs) {
> dg();
> }
>
> return 0;
> }
>
This code is invalid C++, whatever you deduce from it engages only yourself and does not involve C++ in any way. In fact, this code could format your hard drive and and would still be within the C++ spec.
Comment #13 by deadalnix — 2022-05-24T16:32:54Z
(In reply to Walter Bright from comment #10)
> Replying to comment #1:
>
> The observed behavior is a direct result of lambdas in D not capturing by
> value, they capture by reference.
>
This is true of the first exemple, for which things behave apropriately.
This is not true of the second one, as, even though it capture by reference, it capture a reference to a new variable every time.
> Even allocating a closure on the gc heap is not a solution here, because only one allocation will be made which will be shared by each lambda, exhibiting the same behavior.
What's nice when most options are incorrect, is that it's fairly obvious what the correct ones are. in this cases, there are 3:
- All variables have function scope, like in python.
- A new closure is allocated for every loop iteration, like in JavaScript or C#.
- Don't pretend you are a safe language in any way like C++ and put the burden of getting things right on the user.
Considering D as it stands, it is obvious that the second choice is the correct one. But can also chose to do away with RAII and go for the first option. Or we can get rid of @safe .
Or we can ignore the issue for one more decade and hope it goes away as everybody moves to rust.
Comment #14 by bugzilla — 2022-07-28T06:09:03Z
I don't much care for the "allocate every iteration of the loop" because it may not be obvious to the user that this is happening, and may cause an awful lot of memory to be allocated.