Bug 18016 – using uninitialized value is considered @safe but has undefined behavior

Status
RESOLVED
Resolution
FIXED
Severity
normal
Priority
P3
Component
dmd
Product
D
Version
D2
Platform
All
OS
All
Creation time
2017-11-27T12:18:02Z
Last change time
2023-01-05T18:53:41Z
Keywords
safe, spec
Assigned to
No Owner
Creator
ag0aep6g

Comments

Comment #0 by ag0aep6g — 2017-11-27T12:18:02Z
Spin-off from issue 15542. Difference is that this issue is about `@safe` where 15542 was about `pure`. Code: ---- int f() @safe { int x = void; return x; } ---- This code is currently accepted by dmd, but per the spec it would have to be rejected. On `@safe` the spec says: "Safe functions are functions that are statically checked to exhibit no possibility of undefined behavior." [1] On void initialization it says: "If the Initializer is void, however, the variable is not initialized. If its value is used before it is set, undefined program behavior will result." [2] So `return x;` has undefined behavior, but `@safe` is supposed to prevent undefined behavior. `@safe` should make the code not compile. I see a couple different ways to approach to this: 1) Disallow void initialization in `@safe` code. This would also fix issue 17561 and issue 17566. It would also break existing code that uses void initialization correctly. 2) Give some defined behavior to void initialized values (without indirections). For example, say that the value of `x` is unpredictable, but that the program at large is unaffected (no undefined behavior). With this approach, issue 15542 would become actual. That is, `pure` would need special consideration. 3) Water down `@safe` so that it doesn't protect against all undefined behavior, but only against those instances that break memory safety in practice. This would seem dangerously unprincipled to me. [1] https://dlang.org/spec/function.html#function-safety [2] https://dlang.org/spec/declaration.html#void_init
Comment #1 by razvan.nitu1305 — 2017-11-27T13:52:13Z
How about letting void initialization be acceptable in @safe code only if the value is initialized before being used? In the example from the bug report, the code would error since x is returned before being initialized. But this should be acceptable @safe code: int f() @safe { int x = void; do_some_work(); x = 7; return x; } That would imply an AST walker for the current scope to see if x is initialized anywhere.
Comment #2 by schveiguy — 2017-11-27T14:41:09Z
I personally thought this was not required for memory safety -- since @safe is not allowed to break the type system, having data that is garbage isn't going to corrupt memory, as long as it's not a reference. Note that =void is not allowed in @safe already for reference types (even though the spec doesn't outline that rule). I'd want Walter's opinion on this. I thought @safe was specifically for memory safety, and not preventing all undefined behavior. But the way the spec is currently written, void initialization should be disallowed. My vote would be to relax the undefined behavior of =void for value types (for reference types or types that contain references, keep it UB). (In reply to RazvanN from comment #1) > How about letting void initialization be acceptable in @safe code only if > the value is initialized before being used? I'm not sure any of the rules for @safe functions require such checking. Not only that, but I'm not sure it's completely solvable. That could result in cases where it's clear from reading the code that the function initializes, but it's something the compiler can't tell. It's much more straightforward to disallow it. I think the cost of initialization is so low, that we aren't going to affect code that much. Disallowing =void for value types in safe code would be my second choice.
Comment #3 by uplink.coder — 2017-11-27T15:16:23Z
Checking if a void values escapes a function is as difficult as the scope stuff. void initialized escapes should certainly be disallowed in @safe code.
Comment #4 by bugzilla — 2018-03-04T07:58:04Z
(In reply to Steven Schveighoffer from comment #2) > I'd want Walter's opinion on this. I agree with you. The spec should be fixed to continue to disallow void initialization for reference types, and say that void initialization for value types is implementation defined, not undefined.
Comment #5 by bugzilla — 2018-03-04T08:14:55Z
Comment #6 by turkeyman — 2019-06-05T20:58:33Z
I think that's the wrong solution. Allowing interaction with invalid memory is antithetical to @safe. Why would you want @safe to allow this?
Comment #7 by schveiguy — 2019-06-05T22:14:16Z
It's garbage data, but it's not garbage pointers. As long as the memory is not used to reference anything, it's not going to cause a memory corruption to use it. Why would you want to use this? Because it's more efficient to not initialize stack data before overwriting it with the real value. Can you explain a way that f() is unsafe in the example above? That is, it results in corrupted memory? Or alternatively, show how you can write code that is exploitable or could cause memory corruption? Would you consider this function @safe? int[] allocate(int size) { auto result = cast(int *)malloc(size * int.sizeof); return result[0 .. size]; } It doesn't corrupt any memory, the data is not left dangling, as it's not freed, but it's also not initialized. Is that a big problem?
Comment #8 by turkeyman — 2019-06-05T22:45:38Z
(In reply to Steven Schveighoffer from comment #7) > It's garbage data, but it's not garbage pointers. As long as the memory is > not used to reference anything, it's not going to cause a memory corruption > to use it. You can't know what the memory is going to be used for. You would need astonishingly competent flow-analysis to make judgements of that kind. It could be given as an argument to any operation that references something, perhaps as an offset, or any conceivable thing could be done with that data, and it's 100% guaranteed to be a rubbish operation. It's a varifiably rubbish value, how could that inject valid program flow into any usage context? > Why would you want to use this? Because it's more efficient to not > initialize stack data before overwriting it with the real value. Right, but it requires very special-case handling, and it's error-prone; for instance, you might think you can simply: T x = void; x = T(); For some subset of possible T's that might be fine, but then some T arrives with elaborate assignment semantics and it's a spectacular crash. It would be easy for that to slip through the cracks, or not demonstrate issue on the lib author's projects, but then a customer exposes the issue. @trusted should be a locator for dangerous code, exactly like such an assignment above. That code above is absolutely not @safe, it's making assumptions way outside the language semantics, anything could happen if you're not careful. D has many semantics when operating on objects that make the basic assumption that objects are *valid*. `init` exists for this reason. Every semantic that assumes a valid object is violated by `= void`, which makes every such operation `risky` at best. > Can you explain a way that f() is unsafe in the example above? f() is potentially @safe, assuming that `x` is a type without elaborate assignment (it is `int` above), but it depends on the compiler having powerful flow analysis to determine those facts. So it *could* be @safe, but I don't think DMD has the technology required to prove that at this time? > That is, it > results in corrupted memory? Or alternatively, show how you can write code > that is exploitable or could cause memory corruption? Exposing uninitialised memory is a data leak at best. Many forms of exploit take advantage of leaking private or inaccessible data, but typically it can be used to source or craft values that lead to unexpected or otherwise invalid program flow or improper array offsets. > Would you consider this function @safe? > > int[] allocate(int size) > { > auto result = cast(int *)malloc(size * int.sizeof); > return result[0 .. size]; > } > > It doesn't corrupt any memory, the data is not left dangling, as it's not > freed, but it's also not initialized. Is that a big problem? malloc's not @safe (it's not even D), neither is dynamically slicing a pointer, and the memory is uninitialised. This function is certainly not @safe. @system functions are perfectly fine. There's nothing wrong with @system code, it's just up to the caller to confirm a valid interaction.
Comment #9 by schveiguy — 2019-06-06T13:53:06Z
I'll start by saying, I think the operation is @safe, but I'm not sure it's necessary for @safe code. You can indeed escape @safe with @trusted, so there are likely ways around this. The easiest to prove route here is that we just disable void initialization, and tough, you just have to deal with that. (In reply to Manu from comment #8) > (In reply to Steven Schveighoffer from comment #7) > > It's garbage data, but it's not garbage pointers. As long as the memory is > > not used to reference anything, it's not going to cause a memory corruption > > to use it. > > You can't know what the memory is going to be used for.> You would need > astonishingly competent flow-analysis to make judgements of that kind. > It could be given as an argument to any operation that references something, > perhaps as an offset, or any conceivable thing could be done with that data, > and it's 100% guaranteed to be a rubbish operation. The garbage offset argument is already handled, @safe code will throw an error if you try to escape the bounds of an array. > It's a varifiably rubbish value, how could that inject valid program flow > into any usage context? Only if written incorrectly. The above certainly is useless as is. It's clearly not something you would want to have in your code. But the problem @safe is trying to prevent is corrupting memory. Tailoring @safe to be as narrow as possible allows more leeway in programs that do not corrupt memory. You should be able to say, if you see the @safe tag, this will NOT corrupt memory. > > Why would you want to use this? Because it's more efficient to not > > initialize stack data before overwriting it with the real value. > > Right, but it requires very special-case handling, and it's error-prone; for > instance, you might think you can simply: > T x = void; > x = T(); > > For some subset of possible T's that might be fine, but then some T arrives > with elaborate assignment semantics and it's a spectacular crash. Spectacular crashes can happen in @safe code. This is @safe: int* foo; *foo = 1; // crash However, this raises a good point that =void overrides the expectations of the type itself. If it's expected the type is at least default initialized, setting it to garbage originally can possibly have safety problems, if there is any @trusted code inside the type itself. We could potentially limit =void to POD types that contain no references. > > Can you explain a way that f() is unsafe in the example above? > > f() is potentially @safe, assuming that `x` is a type without elaborate > assignment (it is `int` above), but it depends on the compiler having > powerful flow analysis to determine those facts. > So it *could* be @safe, but I don't think DMD has the technology required to > prove that at this time? The compiler knows the type of an item, it can determine whether it has elaborate assignment without powerful flow analysis. We just "is it OK to =void this type?". We already have it for pointers, maybe we also need it for types that have member functions, or elaborate assignment, or something that determines it's possible to exploit this for memory corruption. > Exposing uninitialised memory is a data leak at best. Many forms of exploit > take advantage of leaking private or inaccessible data, but typically it can > be used to source or craft values that lead to unexpected or otherwise > invalid program flow or improper array offsets. So it comes down to the questions: is it @safe's charter to prevent such things? and can @safe actually guarantee such things don't happen? > > Would you consider this function @safe? > > > > int[] allocate(int size) > > { > > auto result = cast(int *)malloc(size * int.sizeof); > > return result[0 .. size]; > > } > > > > It doesn't corrupt any memory, the data is not left dangling, as it's not > > freed, but it's also not initialized. Is that a big problem? > > malloc's not @safe (it's not even D), neither is dynamically slicing a > pointer, and the memory is uninitialised. This function is certainly not > @safe. I didn't express what I wanted clearly enough. What I meant was, would you consider calling this function to be a safe call? Personally, I would have no problem marking this @trusted, even though none of the integers are initialized. To give you an idea, this is allowed currently in d: https://github.com/dlang/phobos/blob/c5664d4436235cba2606103f8729341ac79a4487/std/array.d#L811-L825
Comment #10 by dfj1esp02 — 2019-06-07T09:01:35Z
AFAIK, Walter's suggestion is not supported by LLVM. Currently LLVM removes code that uses uninitialized value. To work it around LDC will need to initialize variables initialized with void and provide an different way to declare uninitialized variables. Likely not a problem, but results in minor fragmentation of language. I believe LDC way will have a priority, because DMD is not really about performance anyway, so default initialized variables for it are good enough.
Comment #11 by turkeyman — 2019-06-07T17:25:51Z
Please don't do that; allow LDC to continue to remove such code as usual. `= void` initialisation is used specifically to avoid that work. DMD is a reference, we don't want to emulate its had habits.
Comment #12 by schveiguy — 2019-06-10T23:29:52Z
(In reply to anonymous4 from comment #10) > AFAIK, Walter's suggestion is not supported by LLVM. Currently LLVM removes > code that uses uninitialized value. To work it around LDC will need to > initialize variables initialized with void and provide an different way to > declare uninitialized variables. Likely not a problem, but results in minor > fragmentation of language. I believe LDC way will have a priority, because > DMD is not really about performance anyway, so default initialized variables > for it are good enough. I don't understand how the suggestion that the behavior is implementation defined doesn't jive with LLVM's chosen behavior.
Comment #13 by ag0aep6g — 2019-06-11T00:08:21Z
(In reply to Steven Schveighoffer from comment #12) > I don't understand how the suggestion that the behavior is implementation > defined doesn't jive with LLVM's chosen behavior. As a whole, using an uninitialized variable wouldn't be implementation defined. That would be silly. Walter's PR doesn't do that. It only says that the value you get is up to the implementation. Everything else must work as usual. So LLVM would have to give you some value. It wouldn't be allowed to just omit the whole access and everything that depends on it (as it apparently does at the moment).
Comment #14 by turkeyman — 2019-06-11T00:59:21Z
> As a whole, using an uninitialized variable wouldn't be implementation defined. > That would be silly. I don't think it's silly at all. It's perfectly fine. Please don't butcher LDC to accommodate DMD's quirks. Reading from uninitialised memory is an invalid operation, and unnecessarily initialising memory has real cost.
Comment #15 by schveiguy — 2019-06-11T01:16:03Z
(In reply to ag0aep6g from comment #13) > So LLVM would have to give you some value. It wouldn't be allowed to just > omit the whole access and everything that depends on it (as it apparently > does at the moment). How would it omit the return statement? The code above seems to compile with ldc and writeln(f()) seems to print something (0 in fact on my test). Where is the omission? Not saying your assertion is false, I just don't understand how it can omit everything that depends on an access of uninitialized data.
Comment #16 by dfj1esp02 — 2019-06-11T09:07:30Z
(In reply to Steven Schveighoffer from comment #12) > I don't understand how the suggestion that the behavior is implementation > defined doesn't jive with LLVM's chosen behavior. Walter suggests that value should be implementation defined, not behavior. LLVM's optimization is inspired by gcc's understanding of undefined behavior, I suspect it won't provide safety.
Comment #17 by ag0aep6g — 2019-06-11T10:31:36Z
(In reply to Manu from comment #14) > > As a whole, using an uninitialized variable wouldn't be implementation defined. > That would be silly. > > I don't think it's silly at all. It's perfectly fine. > Please don't butcher LDC to accommodate DMD's quirks. > Reading from uninitialised memory is an invalid operation, and unnecessarily > initialising memory has real cost. If we made it implementation defined, the operation would become valid. You would expressly be allowed to write `void main() { int x = void; return x; }`. The spec would say: "I am 100% sure that that's a valid program, and I have no opinion whatsoever on what it should do." I think that'd be silly. As it is, the operation is invalid, because its behavior is undefined. That's not silly. As far as I can tell, you're arguing for this, not for implementation defined behavior. (In reply to Steven Schveighoffer from comment #15) > How would it omit the return statement? > > The code above seems to compile with ldc and writeln(f()) seems to print > something (0 in fact on my test). Where is the omission? Not saying your > assertion is false, I just don't understand how it can omit everything that > depends on an access of uninitialized data. I think I was wrong there. I had tested something like this: ---- int main(string[] args) { int x = void; int y = cast(int) args.length; return x + y; } ---- `ldc2 -O test.d` generates this: ---- 0000000000000000 <_Dmain>: 0: c3 ret ---- Here, everything's gone, including args.length. The result isn't "return args.length plus garbage", it's just "return garbage". That looked to me like LDC omits code that depends on the value of x. But it can be interpreted differently: The value of `x` is arbitrary. LDC can choose it so that whatever `ret` returns is the same as returning x + args.length. Then omitting the args.length line is just normal optimization. So I don't have an example where LDC's current behavior is incompatible with Walter's PR. In fact, when I try to restrict the values that `x` can possibly have, e.g. by returning `x & 1`, LDC catches that and emits code that matches. Maybe LDC already adheres to the "implementation defined value" rule.
Comment #18 by ag0aep6g — 2023-01-05T18:53:41Z
(In reply to ag0aep6g from comment #0) > I see a couple different ways to approach to this: [...] > 2) Give some defined behavior to void initialized values (without > indirections). For example, say that the value of `x` is unpredictable, but > that the program at large is unaffected (no undefined behavior). With this > approach, issue 15542 would become actual. That is, `pure` would need > special consideration. [...] Walter has chosen this option in <https://github.com/dlang/dlang.org/pull/2990> which has been merged. But he didn't bother updating this issue. Closing as fixed. As mentioned before, this revives issue 15542.