Bug 21565 – @safe code allows modification of a scalar that overlaps with a pointer

Status
NEW
Severity
major
Priority
P2
Component
dmd
Product
D
Version
D2
Platform
All
OS
All
Creation time
2021-01-20T19:30:39Z
Last change time
2024-12-13T19:14:10Z
Keywords
safe, spec
Assigned to
No Owner
Creator
Steven Schveighoffer
Blocks
19916
Moved to GitHub: dmd#19858 →

Comments

Comment #0 by schveiguy — 2021-01-20T19:30:39Z
According to https://dlang.org/spec/function.html#safe-aliasing It states: When one memory location is accessible with two different types, that aliasing is considered safe if: 1. both types are const or immutable; or 2. one of the types is mutable while the other is a const-qualified basic data type; or 3. both types are mutable basic data types; or 4. one of the types is a static array type with length zero; or 5. one of the types is a static array type with non-zero length, and aliasing of the array's element type and the other type is safe; or 6. both types are pointer types, and aliasing of the target types is safe, and the target types have the same size. All other cases of aliasing are considered unsafe. However, if you access a scalar overlapping a pointer, the access and even mutation of the scalar is considered @safe by the compiler: union T {int x; int *y;} void main() @safe { T t; t.x = 5; // *t.y = 5; // error in @safe, but not in @trusted } Such access should be considered illegal as it does not fit into any of the categories. 1, 2, 4, 5, and 6 trivially do not apply. Whether int * is considered a "basic type" is possibly open to interpretation, but I would say it is not, considering that if it were, then rule 2 would allow arbitrary pointer usage. This disqualifies 3, or at least suggests an edit is in order. One might suggest that there is no harm in allowing mutating a scalar that overlaps with a pointer if the pointer cannot be accessed. But this is a naive view of code. Not all code is @safe, and if @trusted code cannot be reasoned about without also having to manually verify all @safe code, then there is no point to @trusted code. The access should be disallowed.
Comment #1 by ag0aep6g — 2021-01-20T19:43:43Z
(In reply to Steven Schveighoffer from comment #0) > Such access should be considered illegal as it does not fit into any of the > categories. 1, 2, 4, 5, and 6 trivially do not apply. Whether int * is > considered a "basic type" is possibly open to interpretation, but I would > say it is not, considering that if it were, then rule 2 would allow > arbitrary pointer usage. This disqualifies 3, or at least suggests an edit > is in order. Just clarifying one little thing: `int*` is definitely not a "basic data type". As a pointer, it's a "derived data type". Definitions are here: <https://dlang.org/spec/type.html>.
Comment #2 by snarwin+bugzilla — 2021-01-20T22:47:53Z
> One might suggest that there is no harm in allowing mutating a scalar that > overlaps with a pointer if the pointer cannot be accessed. But this is a > naive view of code. Not all code is @safe, and if @trusted code cannot be > reasoned about without also having to manually verify all @safe code, then > there is no point to @trusted code. I think there are two possible interpretations here. Background: @trusted code is permitted by the language spec to assume its arguments are free from unsafe aliasing and unsafe values [1], so anything that allows unsafe values or unsafe aliasing to be created in @safe code is a bug in @safe, not in any particular piece of @trusted code. The question is: should the value of `t` after `t.x = 5`, in comment 1's example, be considered an unsafe value? If we go by the definition in the spec, the answer is clearly "yes": > A struct/union instance is safe when: > > * the values of its accessible fields are safe, and > * it does not introduce unsafe aliasing with unions. The union type `T` overlaps an integer and a pointer, so *any* value of type `T` is automatically unsafe. It follows from this interpretation that we should not be allowed to use `T` in @safe code *at all*, and that *any* @trusted function that receives an argument of type `T` is allowed to cause undefined behavior, regardless of that argument's value. This is probably not the conclusion we wanted to end up at, so let's try again. If we amend the spec as follows: > A struct/union instance is safe when: > > * the values of its accessible fields are safe, and > * it does not introduce unsafe aliasing with unions **that is accessible > from @safe code**. ...then `t`'s value becomes safe, and we are allowed to use it in @safe and @trusted code as long as we are careful not to let @safe code access `t.x` and `t.y` at the same time. I think this interpretation is much more useful, and almost certainly the intended one, so I suggest that this is really a bug in the spec, not the implementation. [1] https://dlang.org/spec/function.html#safe-interfaces
Comment #3 by schveiguy — 2021-01-20T23:48:32Z
A union between a pointer and integer is most definitely unsafe in all instances. If you never intend to access the int*, in any circumstance, then why have a union? If you do intend to access the int *, then having any safe code anywhere just change the integer ruins the any safety assumptions that the @trusted or @system code can make. Essentially, it means @trusted code can never access such a union reliably except to access just the integer. This means that T is OK to use ONLY in @system code, or ONLY in @safe code, but NEVER in @trusted code (unless you just follow the @safe rules). I don't feel like we should bend the spec over backwards to fit with the implementation, when there isn't really a benefit (other than being able to close a bug report).
Comment #4 by snarwin+bugzilla — 2021-01-21T04:15:41Z
Consider the following example: --- union T { int x; int* y; } @trusted void example(T t) { import std.stdio; t.x = 123; writeln(t.x); t.y = new int; writeln(t.y); } --- This code is memory-safe. It contains no undefined behavior. Any @safe function can call this code with any possible value of `t`, and it will not corrupt memory. It also accesses both members of `t` and would not compile if annotated with @safe (i.e., it does not "follow the @safe rules"). The *intent* of the spec is clearly to allow code like this to be marked as @trusted. If the current wording of the spec does not allow that, then the spec's wording does not match its intent, and the wording should be changed.
Comment #5 by razvan.nitu1305 — 2021-01-21T10:26:30Z
(In reply to Steven Schveighoffer from comment #3) > A union between a pointer and integer is most definitely unsafe in all > instances. If you never intend to access the int*, in any circumstance, then > why have a union? It may be safe if the user sets the integer part with valid memory addresses. However, the compiler cannot know that. > If you do intend to access the int *, then having any safe code anywhere > just change the integer ruins the any safety assumptions that the @trusted > or @system code can make. Essentially, it means @trusted code can never > access such a union reliably except to access just the integer. Trusted does not offer any guarantees. You can do whatever you want there. If you want to access a pointer that is overlapped with an integer that is the users' problem not the typesystems'. You cannot assume anything with regards to that pointer, that is the reason why it is not allowed in @safe code. In case you do use and you have a segfault, then the developer will have to audit the trusted blocks, not the @safe ones. > This means that T is OK to use ONLY in @system code, or ONLY in @safe code, > but NEVER in @trusted code (unless you just follow the @safe rules). > > I don't feel like we should bend the spec over backwards to fit with the > implementation, when there isn't really a benefit (other than being able to > close a bug report). I think that this is an issue were reasonable people may disagree, but the fact is that @safe is checked with regards to operations not data structures. There is no concept of @safe union or @system union in D. It is the way you use it that makes it @safe/@system. From this point of view, setting an integer that is overlapped with a pointer is not unsafe, however accessing a pointer that is overlapped with an integer is.
Comment #6 by ag0aep6g — 2021-01-21T15:51:17Z
(In reply to Paul Backus from comment #2) > The question is: should the value of `t` after `t.x = 5`, in comment 1's > example, be considered an unsafe value? [...] > If we amend the spec as follows: > > > A struct/union instance is safe when: > > > > * the values of its accessible fields are safe, and > > * it does not introduce unsafe aliasing with unions **that is accessible > > from @safe code**. > > ...then `t`'s value becomes safe, and we are allowed to use it in @safe and > @trusted code as long as we are careful not to let @safe code access `t.x` > and `t.y` at the same time. > > I think this interpretation is much more useful, and almost certainly the > intended one, so I suggest that this is really a bug in the spec, not the > implementation. > > [1] https://dlang.org/spec/function.html#safe-interfaces I can say that I did not intend that interpretation when I added "safe aliasing" to the spec. But I might have been overly conservative. Currently, the spec explicitly says that @safe code "Cannot access unions that have pointers or references overlapping with other types."[1] If that's so, then I guess it doesn't matter whether the union's aliasing is safe or not, because @safe code can't access it at all. And I guess it's still okay even if we change that to allow access to the non-pointy bits of the union, because the @safe code just sees another harmless int (or whatever). As far as I can tell, that means all possible union values can be considered safe (just like all ints are safe), and we can remove the parts about "[introducing] unsafe aliasing with unions" from the "save values" section. One must still be careful not to expose the unsafe aliasing to @safe code in other ways, but that's already covered by the spec. I wouldn't have designed it this way, but it seems to be what DMD is going for. [1] https://dlang.org/spec/function.html#safe-functions
Comment #7 by schveiguy — 2021-01-21T16:39:00Z
(In reply to Paul Backus from comment #4) > The *intent* of the spec is clearly to allow code like this to be marked as > @trusted. If the current wording of the spec does not allow that, then the > spec's wording does not match its intent, and the wording should be changed. I'm not disagreeing with the requirement that system/trusted code should be needed to access aliased values. I'm disagreeing with the ability of safe code to access any part of this. And the spec currently says that. Consider this function: void example(ref T t) @trusted; This function has to assume that t only is valid as an integer, never as a pointer. Because safe code can *only* access and/or mutate the integer. In that case, what's the point of the union? Even in your example, you simply ignore the parameter (it might as well not be there). Not only that, but even if it sets the pointer in t, it must be automatically assumed once the function ends that the pointer value is no longer valid (it went back into safe-land where the code can happily mutate anything it wants in t). The union becomes an unnecessarily complicated integer. the current rules are sound, just nonsensical. It makes such unions pointless when writing safe code.
Comment #8 by schveiguy — 2021-01-21T17:12:51Z
(In reply to RazvanN from comment #5) > (In reply to Steven Schveighoffer from comment #3) > > If you do intend to access the int *, then having any safe code anywhere > > just change the integer ruins the any safety assumptions that the @trusted > > or @system code can make. Essentially, it means @trusted code can never > > access such a union reliably except to access just the integer. > > Trusted does not offer any guarantees. This is why I said "assumptions" and not "guarantees". It's in fact impossible for a trusted function to guarantee the union state between calls, because safe code can do anything it wants with the union so long as there is a mutable basic type involved. > You can do whatever you want there. > If you want to access a pointer that is overlapped with an integer that is > the users' problem not the typesystems'. You cannot assume anything with > regards to that pointer, that is the reason why it is not allowed in @safe > code. In case you do use and you have a segfault, then the developer will > have > to audit the trusted blocks, not the @safe ones. Auditing is not possible. The trusted code cannot make any assumptions about the union, it must ALWAYS treat it as an integer (at least when it comes in). Inside the function, sure, it can assign things. But as soon as it leaves the function, it must be treated as an integer again. This means, such unions serve no purpose as parameters to trusted code. Ever. In which case, using such a union in safe code is pointless. It's more productive to follow the rules already set in the spec. > I think that this is an issue were reasonable people may disagree, but the > fact is that @safe is checked with regards to operations not data > structures. There is no concept of @safe union or @system union in D. It is > the way you use it that makes it @safe/@system. From this point of view, > setting an integer that is overlapped with a pointer is not unsafe, however > accessing a pointer that is overlapped with an integer is. This is a misunderstanding. @safe has everything to do with data structures, and the semantics surrounding those data structures. If we don't have data rules for @safe code, then @trusted code cannot make any reasonable assumptions about the incoming parameters. Remember that @trusted code MUST ASSUME it is being called from @safe code. For example, @safe ascribes a semantic meaning to the length field of an array. It means "all the items that are pointed to by the pointer that are up to that length are accessible." Without that, arrays of data could not be passed to a @trusted function. We also have rules for an incoming pointer, which means, it only can point at a single valid value, or null. Using these rules, one can write useful @trusted code. Without such rules, either trusted code would have to assume any incoming parameters are suspect, or we have to review all code including @safe code. With a rule of "Incoming union values can only ever have scalar values set" is unnecessarily limiting. If you can only set the scalar values, why have a union that is usable in safe code that has pointer and scalar values mixed? It's what makes the DIP1035 proposal so promising -- you can ascribe your own semantic rules to types that the compiler doesn't have built in.
Comment #9 by snarwin+bugzilla — 2021-01-21T17:19:16Z
> I'm disagreeing with the ability of safe code to access any part of this. On what grounds? The point of @safe is to prevent undefined behavior, and allowing access to the integer cannot possibly lead to undefined behavior, because all integer values are safe values. > the current rules are sound, just nonsensical. It makes such unions pointless when writing safe code. I agree--which is why I would like to replace them with rules that are both sound *and* sensical. Can we agree that that's a desirable goal?
Comment #10 by schveiguy — 2021-01-21T17:23:01Z
(In reply to Paul Backus from comment #9) > > I'm disagreeing with the ability of safe code to access any part of this. > > On what grounds? The point of @safe is to prevent undefined behavior, and > allowing access to the integer cannot possibly lead to undefined behavior, > because all integer values are safe values. Read-only access is fine. Write access is not. > > > the current rules are sound, just nonsensical. It makes such unions pointless when writing safe code. > > I agree--which is why I would like to replace them with rules that are both > sound *and* sensical. Can we agree that that's a desirable goal? I can't say no to the agreement ;) I just don't know what the definition of "sensical" means, based on your prior messages. What rules do you have in mind?
Comment #11 by snarwin+bugzilla — 2021-01-21T17:45:58Z
> Read-only access is fine. Write access is not. Again, on what grounds do you make this claim? Can writing to the integer member cause undefined behavior in @safe-only code? If so, please provide an example. > I just don't know what the definition of "sensical" means, based on your > prior messages. What rules do you have in mind? What I have in mind is to change the definition of "unsafe value" for unions to the following: > A struct/union instance is safe when: > > * the values of its accessible fields are safe, and > * it does not introduce unsafe aliasing with unions that is accessible > from @safe code. This change does not, as far as I can tell, introduce unsoundness into the language. It does not allow undefined behavior to occur in @safe code. If you believe I am mistaken about this, please correct me. The reason I call this "sensical" is that *unnecessarily* excluding values from the definition of "safe value" makes the language more difficult to use without any benefit to soundness or memory-safety. Ideally, we would like @safe to impose on the programmer only those restrictions that are truly necessary in order to avoid undefined behavior.
Comment #12 by schveiguy — 2021-01-21T18:29:35Z
(In reply to Paul Backus from comment #11) > > Read-only access is fine. Write access is not. > > Again, on what grounds do you make this claim? Can writing to the integer > member cause undefined behavior in @safe-only code? If so, please provide an > example. On the grounds that it's not desirable. It does not cause undefined behavior, just useless behavior. We are better off disallowing it. > What I have in mind is to change the definition of "unsafe value" for unions > to the following: > > > A struct/union instance is safe when: > > > > * the values of its accessible fields are safe, and What does this mean? All individual values are safe according to D. > > * it does not introduce unsafe aliasing with unions that is accessible > > from @safe code. This is not very specific. > This change does not, as far as I can tell, introduce unsoundness into the > language. It does not allow undefined behavior to occur in @safe code. If > you believe I am mistaken about this, please correct me. It's not about being @safe or not. That's why I said the rules are sound. It's just that the rules leave us with the reality that using such unions usable in @safe or @trusted code has no utility. > The reason I call this "sensical" is that *unnecessarily* excluding values > from the definition of "safe value" makes the language more difficult to use > without any benefit to soundness or memory-safety. More difficult than just using an integer instead of a union to represent an integer? > Ideally, we would like > @safe to impose on the programmer only those restrictions that are truly > necessary in order to avoid undefined behavior. There has to be consideration of what semantic meanings the application needs to be able to enforce. Disallowing @safe access to the scalars actually INCREASES the amount of code that is allowed to be marked @safe. That really should be the goal for @safe. Leaving the rules as-is just means such unions, when passed into @trusted code, must treat it the same as @safe code, and therefore they become simply integers. While that's sound, and not allowing undefined behavior, it means writing e.g. a tagged union that has any @safe code is impossible. It must all be @trusted or inferred w/ review.
Comment #13 by schveiguy — 2021-01-21T18:32:59Z
(In reply to Steven Schveighoffer from comment #12) > It's just that the rules leave us with the reality that using such unions > usable in @safe or @trusted code has no utility. I rewrote this several times, and it looks terrible. I mean: It's just that the rules leave us with the reality that such unions in @safe or @trusted code have no utility.
Comment #14 by snarwin+bugzilla — 2021-01-21T19:12:33Z
(In reply to Steven Schveighoffer from comment #12) > > On the grounds that it's not desirable. It does not cause undefined > behavior, just useless behavior. We are better off disallowing it. "I don't like it" is not a technical argument, and should have no place in a technical discussion. > What does this mean? All individual values are safe according to D. If you really believe this, then you do not understand D's memory-safety system well enough to contribute usefully to this discussion, and I am wasting both my time and yours by continuing to respond. > It's not about being @safe or not. That's why I said the rules are sound. > It's just that the rules leave us with the reality that using such unions > usable in @safe or @trusted code has no utility. If it's "not about being @safe or not", then what on Earth *is* it about? Personally, I think @safe should allow all code that the compiler can prove is memory-safe, regardless of whether you, I, or anyone else thinks it "has utility" or not. I am rather surprised that this is a controversial point of view.
Comment #15 by schveiguy — 2021-01-21T22:01:48Z
(In reply to Paul Backus from comment #14) > (In reply to Steven Schveighoffer from comment #12) > > > > On the grounds that it's not desirable. It does not cause undefined > > behavior, just useless behavior. We are better off disallowing it. > > "I don't like it" is not a technical argument, and should have no place in a > technical discussion. That's not my argument. > > What does this mean? All individual values are safe according to D. > > If you really believe this, then you do not understand D's memory-safety > system well enough to contribute usefully to this discussion, and I am > wasting both my time and yours by continuing to respond. Basic types and pointers are all accessible using @safe. I can access int, int * perfectly fine in @safe code. It's the aliasing of the two that is a problem. Frankly I think you are misinterpreting what I'm saying, or I am doing the same for you. Wasting time might definitely be what you are doing. > > > It's not about being @safe or not. That's why I said the rules are sound. > > It's just that the rules leave us with the reality that using such unions > > usable in @safe or @trusted code has no utility. > > If it's "not about being @safe or not", then what on Earth *is* it about? The whole point of @safe is to avoid code review. Otherwise it's a glamorized linter. If you have to review @safe code to make sure things outside the safe code are actually memory safe, then you have lost the battle. Imagine that D does not have builtin slices (and get rid of the rules safe defines around them). Then you need a structure to pass slices into a @trusted function: struct Array(T) { T* ptr; size_T length; } A @trusted function that accepts this type has 2 options: 1. it can't do ANYTHING with the data beyond the one value pointed at by ptr, because @safe code is allowed to set length to anything it wants. 2. It can use the data beyond the first element, but then you have to review all @safe functions that call it. It's the fact that the compiler disallows mutable access to length that we can reason about what this semantically means as a parameter to a @trusted function. Therefore, I don't have to review any array-using safe code for memory safety because I know that the semantic invariant is held by the compiler. Likewise, a union of an int and an int * semantically MUST mean int today in safe code AND TRUSTED CODE. If you access the int * after any safe code has run with it, it must be considered memory unsafe. So for instance: struct S { union X { int x; int *y; } X val; @safe { void a(); void b(); void c(); void d(); } @trusted void e() { /* use val.y */ } } How do you review that the usage of val.y is safe? the answer is: you review a, b, c, d, in addition to e. Now you are reviewing safe code to make sure it's safe in the context of val. This is useless. val might as well be an int, or a, b, c, d might as well be marked trusted. So the logical conclusion is, e cannot use val.y. And if it cannot use it, then what's the point of having it? If we know that a, b, c, and d can never set the value of val.x or val.y, then we don't have to review them at all. Now we are only reviewing e, which is the intent of D's safety system. I'm not arguing that the current implementation is unsafe, just that the current semantic guarantees make using such unions pointless in the context of safe/trusted code. The point of a union is to use all the members in it. If there is one member that cannot be used, then it shouldn't be part of the union. > Personally, I think @safe should allow all code that the compiler can prove > is memory-safe, regardless of whether you, I, or anyone else thinks it "has > utility" or not. The @safe rules provide a framework for proof of memory safety where we can avoid reviewing whole sections of code. The compiler isn't proving safety, it's just enforcing rules. We create the rules to make sure memory safety cannot be violated even without a careful review of certain functions AND that @safe/@trusted code is reasonable to write with those rules. For example, let's say we changed the @safe rules to say only arrays and references are allowed to be dereferenced in @safe code, never pointers. Now, safe code can read and write pointers, even write arbitrary values. That's perfectly safe. And perfectly useless. Isn't that just asking to make trusted code even less safe? How does one use @trusted code with a pointer when you can never know if the safe code that passed it to you has just set arbitrary values? What do we gain as a language by allowing setting pointers up as garbage in @safe code? Such a rule would mean, pointers are safe to use in @trusted functions as long as you don't use them as pointers, only as bits. This is the same rule we are talking about. I don't see why the rule is desirable, and I am surprised that this is a controversial position.
Comment #16 by snarwin+bugzilla — 2021-02-28T02:45:07Z
(In reply to Steven Schveighoffer from comment #15) > > The whole point of @safe is to avoid code review. Otherwise it's a > glamorized linter. If you have to review @safe code to make sure things > outside the safe code are actually memory safe, then you have lost the > battle. I think this is probably the root of this misunderstanding. There is nothing in the spec (or the implementation) that *requires* you to limit your review to @trusted code. If your proof of memory safety is based on manual review of @safe code, then as far as the language is concerned, that's a perfectly valid proof. I agree completely that D should make it *possible* for the programmer to limit their checking to @trusted code (see: DIP 1035), and that @trusted code that relies on manual review of @safe code should be subject to close scrutiny. But I think it would be a step too far for the language spec to outright *forbid* anything that requires any manual review of @safe code--especially since there is no practical way the compiler could enforce such a restriction.
Comment #17 by robert.schadek — 2024-12-13T19:14:10Z
THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/dmd/issues/19858 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB