← Back to index | Original Bugzilla link

Bug 19916 – union member access should be un-@safe

Status: NEW
Severity: enhancement
Priority: P4
Component: dmd
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2019-05-28T22:24:20Z
Last change time: 2024-12-13T19:03:35Z
Keywords: safe, spec
Assigned to: No Owner
Creator: Manu
Depends on: 21561, 21565, 20655

Comments

Comment #0 by turkeyman — 2019-05-28T22:24:20Z

It should be invalid to access members of unions in @safe code. It is necessary that code that interacts with unions implements the runtime checks in a @trusted function. struct S { union { int x; float y; } bool whichOne; } void fun() @safe { S s; int t = s.x; // COMPILE ERROR: NOT SAFE s.x = 10; // COMPILE ERROR: NOT SAFE }

Comment #1 by iamthewilsonator — 2019-05-29T05:12:15Z

This is already the case of the union contains pointers.

Comment #2 by turkeyman — 2019-05-29T07:32:59Z

I have seen such a complaint appear occasionally. I don't know what motivates it, but the truth of the matter is, literally any access to any member of a union is unsafe.

Comment #3 by simen.kjaras — 2019-05-29T09:05:52Z

@safe deals with memory safety. Can you show a case where a union containing no pointers or references causes memory safety issues?

Comment #4 by turkeyman — 2019-05-29T18:01:00Z

union U { int x = 10; float f = void; NotAPointer np = void; } U u; float r = 10.0 + u.f; // <- f is uninitialised, r is junk np.callMethod(); // <- np is not a pointer, but still uninitialised, certain crash etc. Any access to a union member may interact with uninitialised memory. Accessing uninitialised memory is a memory safety issue.

Comment #5 by simen.kjaras — 2019-05-30T00:13:27Z

(In reply to Manu from comment #4) > union U > { > int x = 10; > float f = void; > NotAPointer np = void; > } > U u; > > float r = 10.0 + u.f; // <- f is uninitialised, r is junk This is not a memory safety issue. > u.np.callMethod(); // <- np is not a pointer, but still uninitialised, certain > crash > > Any access to a union member may interact with uninitialised memory. > Accessing uninitialised memory is a memory safety issue. I think I found an example that qualifies: module someothermodule; struct NotAPointer { private size_t n; @disable this(); @trusted this(int* p) { assert(p.isValid); n = cast(size_t)p; } @trusted void callMethod() { *cast(int*)n = 3; } } While NotAPointer's wanton casting looks suspicious, p.isValid ensures the pointer will stay valid as long as NotAPointer exists. We could even have a private constructor and force the use of a factory function to ensure no stupid pointers are passed to the constructor. Also, the use of @disable this() and the private member ensures @safe user code won't be able to set n to something silly. ...except for unions and void initialization. Both of these wreak havoc with the assumptions of NotAPointer. So yeah, I concede - there are clear ways to cause memory safety issues by simple union member access in @safe methods.

Comment #6 by turkeyman — 2019-05-30T04:47:21Z

Glad you agree. I think it's overwhelmingly simple; if the compiler can't confirm that the memory it's accessing is valid, then it's not @safe, and if that's not the definition of @safe, then we need to fix the definition. Accessing uninitialised memory is absolutely a memory safety issue. I don't know where this idea that it has strictly to do with pointers comes from? Why would safety be limited that way? Anyway, let's fix this.

Comment #7 by dkorpel — 2019-05-30T08:56:40Z

(In reply to Manu from comment #6) > Accessing uninitialised memory is absolutely a memory safety issue. Not per se. This compiles, prints a random number, and doesn't corrupt memory. ``` import std; void main() @safe { int a = void; writeln(a); } ``` > I don't know where this idea that it has strictly to do with pointers comes from? > Why would safety be limited that way? Paraphrasing Walter from his DConf 2017 keynote, memory safety is not about 'no memory related bugs', it's "a concern in software development that aims to avoid software bugs that cause security vulnerabilities dealing with random-access memory (RAM) access, such as buffer overflows and dangling pointers". Uninitialized / overlapped pointers can cause such issues, uninitialized integers can not. Disallowing a simple harmless sum-type in @safe invites more use of @trusted giving more opportunities for actual memory corrupting bugs to creep in. Not to mention it would break existing code. Unless there is a way to actually corrupt memory in @safe code using this (without using @trusted) it's not something @safe should prevent.

Comment #8 by simen.kjaras — 2019-05-30T10:54:48Z

(In reply to Dennis from comment #7) > Unless there is a way to actually corrupt memory in @safe code using this > (without using @trusted) it's not something @safe should prevent. Did you see my example code? NotAPointer is perfectly safe to use in @safe code, and presents an interface that encodes that. To reiterate, this may be weird code, but it should be perfectly fine to use in @safe: struct NotAPointer { private static int* p = null; private static int len; private static void initialize() { import core.stdc.stdlib; len = 100; p = cast(int*)malloc(int.sizeof*len); } int idx; @disable this(); @trusted this(int i) { if (p is null) initialize(); assert(i >= 0 && i < len); idx = i; } @trusted void callMethod() { // We know idx can only be set by the constructor, which checks that // it's valid, and initializes p correctly, so no bounds check is // necessary at this point. p[idx] = 3; } } There are currently only two ways to make that code do bad things in @safe code, and that's unions and void initialization: @safe unittest { NotAPointer p = void; p.callMethod(); } @safe unittest { union U { int n; NotAPointer np; } U u = {10}; u.np.callMethod(); }

Comment #9 by dkorpel — 2019-05-30T12:01:17Z

(In reply to Simen Kjaeraas from comment #8) > Did you see my example code? Yes, that's why I added "without using @trusted". @safe disallows uninitialized/overlapping pointers, you explicitly circumvented that by casting to an integer type and made it @trusted, and now it breaks. That means your code couldn't be @trusted because you didn't account for overlap, not that overlap of non-pointers is unsafe. Your second version states: // We know idx can only be set by the constructor That's clearly false, that's even the point of the example. A logic bug causing your @trusted code to break is not a violation of @safe. I'm a bit surprised that with `@disable this();` void and union initialization is still allowed. (n.b. you can also do `NotAPointer.init.callMethod();`). Arguably that is an issue of @disable this. To me the larger issue is that @trusted is too easily slapped on a function without considering all the ways in which your code can be used. You can't expect every programmer to check that their @trusted code is safe in every imaginable setting / with every possible language feature, when even experienced D programmers get it wrong. (See for example https://github.com/atilaneves/automem/issues/25 or https://issues.dlang.org/show_bug.cgi?id=19777). In theory, a trusted function should be memory safe any way you call it from safe code. In practice, some trusted functions have oversights or make assumptions about the call site / context, meaning that a function calling @trusted code is also really @trusted instead of safe. Removing more and more things you can do from @safe code does not seem like a good solution to me. What if I modify idx in your example? You can make it private, but then the entire module is implicitly @trusted. In fact, the whole project becomes @trusted because of __traits(getMember). Shall we make that not @safe too?

Comment #10 by turkeyman — 2019-05-31T21:31:08Z

(In reply to Dennis from comment #7) > (In reply to Manu from comment #6) > > Accessing uninitialised memory is absolutely a memory safety issue. > > Not per se. This compiles, prints a random number, and doesn't corrupt > memory. > > ``` > import std; > > void main() @safe > { > int a = void; > writeln(a); > } > ``` Hah, well that's obviously broken too! > > I don't know where this idea that it has strictly to do with pointers comes from? > > Why would safety be limited that way? > > Paraphrasing Walter from his DConf 2017 keynote, memory safety is not about > 'no memory related bugs', it's "a concern in software development that aims > to avoid software bugs that cause security vulnerabilities dealing with > random-access memory (RAM) access, such as buffer overflows and dangling > pointers". Uninitialized / overlapped pointers can cause such issues, > uninitialized integers can not. Accessing uninitialised int's (as above) is possibly the most accessible form ob buffer overflow I can imagine. > Disallowing a simple harmless sum-type in @safe invites more use of @trusted > giving more opportunities for actual memory corrupting bugs to creep in. Not > to mention it would break existing code. It's not harmless at all, interaction with uninitialised memory is *the most critical form of safety error I can imagine*. If I were searching for exploits, that's where I would start every time. > Unless there is a way to actually corrupt memory in @safe code using this > (without using @trusted) it's not something @safe should prevent. int x = void; array[x]; // boom

Comment #11 by turkeyman — 2019-05-31T21:37:00Z

> In fact, the whole project becomes @trusted because of __traits(getMember). Shall we make that not @safe too? Maybe. I think what you're demonstrating here is that this should work: void func() @safe { // safecode... ... @trusted { trusted code } ... // safeCode... } That way it's easy to contain the one line of trusted code in a surrounding context without over-elevating @trusted.

Comment #12 by dkorpel — 2019-05-31T22:05:33Z

(In reply to Manu from comment #10) > int x = void; > array[x]; // boom In @safe code that either accesses the array within bounds or gives a run-time range violation. No memory corruption there. > Accessing uninitialised int's (as above) is possibly the most accessible form ob > buffer overflow I can imagine. It's not buffer overflow. It can only _lead_ to buffer overflow in @system or poorly written @trusted code. In @safe code it's merely a logic bug. If we're going to prevent any language aspect that commonly causes bugs, then @safe should also disallow classic for-loops, unsigned numbers and null-pointers. The goals and meaning of @safe are currently clear. Let's not change this by subjectively disabling other things that only 'feel' unsafe but really aren't with respect to memory corruption.

Comment #13 by ag0aep6g — 2019-05-31T22:41:48Z

(In reply to Manu from comment #10) > (In reply to Dennis from comment #7) > > (In reply to Manu from comment #6) > > > Accessing uninitialised memory is absolutely a memory safety issue. > > > > Not per se. This compiles, prints a random number, and doesn't corrupt > > memory. > > > > ``` > > import std; > > > > void main() @safe > > { > > int a = void; > > writeln(a); > > } > > ``` > > Hah, well that's obviously broken too! For reference, that's issue 18016. Walter has an open PR to resolve it by making the value of `a` "implementation defined".

Comment #14 by turkeyman — 2019-06-01T07:54:02Z

> If we're going to prevent any language aspect that commonly causes bugs, > then @safe should also disallow classic for-loops, unsigned numbers and > null-pointers. You're seriously going to suggest that allowing functional access to uninitialised memory is comparable to a for-loop... with a straight face? > The goals and meaning of @safe are currently clear. Let's not change this by > subjectively disabling other things that only 'feel' unsafe but really > aren't with respect to memory corruption. How can you argue that feeding uninitialised memory into ANY transformation pipeline is @safe? Or only 'subjectively' broken? Accessing uninitialised memory doesn't 'feel' unsafe, it's unsafe. No valid result can appear from any process where the input is fed uninitialised data, and it's the first and most obvious place any sane security engineer will look for attack surface. Anyway, we're done here.

Comment #15 by dkorpel — 2019-06-01T10:50:32Z

(In reply to Manu from comment #14) > You're seriously going to suggest that allowing functional access to > uninitialised memory is comparable to a for-loop... with a straight face? Note that we are talking about union member access here, i.e. potential access to garbage memory. Not guaranteed access to garbage memory. The constructs I mentioned are all possible to use right, but common sources of bugs in C. for (size_t i=length; i >= 0; i--) //endless loop because of unsigned underflow for (int i=0; i < length; i++) // endless loop if > 2GB array for (p = str; p; p++) // read past null-byte until segfault int *a = tryFindPointer(); *a = 3; // possible null-dereference struct Obj obj; useObj(&obj); // uninitialized object While these bugs are also possible in D, in @safe code they won't result in memory corruption, only in a segfault or range violation. > How can you argue that feeding uninitialised memory into ANY transformation > pipeline is @safe? @safe does not mean no bugs! @safe is not the same as the English word 'safe'! ``` void main() @safe { int* a; *a = 3; } ``` This is a bug. Arguably it should give an error It may not be 'safe', but it is '@safe' with its current definition. Please mentally replace the term @safe with something like 'insufficient to cause memory corruption'. As mentioned by ag0aep6g, an uninitialized array can allow you to pass the guard page that prevents stack overflow. Therefor, using uninitialized arrays on the stack is sufficient to cause memory corruption and should be fixed for @safe code. Can you show that union member access is sufficient for memory corruption? Alternatively, can you give a new, better definition of @safe that includes union member access and excludes the earlier mentioned risky language constructs? > it's the first and most obvious place any sane security engineer > will look for attack surface. In C, definitely. He will also look for suspicious for-loops and non-trivial bounds checks. Here's an example of one: CUTE_PNG_CHECK(s->out - backwards_distance >= s->begin, "Attempted to write before out buffer (invalid backwards distance)."); https://github.com/RandyGaul/cute_headers/blob/master/cute_png.h#L499

Comment #16 by simen.kjaras — 2019-06-05T11:36:14Z

There's code that simply cannot be @trusted under Dennis' @safe regime that would be perfectly safe without void initialization and union member access. For instance, there's C libraries that return opaque handles that may or may not be actual pointers, and that offer no way to determine if such a handle is valid or not, and may summon nasal demons if invalid handles are used. Also, unions are really a low-level trick that is very rarely used. Disallowing their use in @safe code will impact very little code.

Comment #17 by dkorpel — 2019-06-05T13:32:43Z

(In reply to Simen Kjaeraas from comment #16) > There's code that simply cannot be @trusted under Dennis' @safe regime that > would be perfectly safe without void initialization and union member access. If you don't agree with the 'insufficient to corrupt memory' regime, please propose a new one! The spec is pretty informal currently. If you have a better interpretation, I'm eager to hear it. So far I've only heard 'things any sane security engineer is wary of'. > For instance, there's C libraries that return opaque handles that may or may > not be actual pointers, and that offer no way to determine if such a handle > is valid or not, and may summon nasal demons if invalid handles are used. Then store that handle as a void* or make it absolutely inaccessible from the outside. Disabling @safe union access of non-pointers is neither necessary nor sufficient to prevent invalid handles to be passed to your C library. The constraint 'I only want integer members in my struct with @trusted memory sensitive data' is completely arbitrary and only exists in an attempt to validate this issue. > Also, unions are really a low-level trick that is very rarely used. How do you know? I use them in @safe code. GFM (which is downloaded 31739 times at the time of writing) uses them for their vectors [1] among other things. Any formerly safe game using GFM vectors needlessly breaks! And the quick fix will be to slap @trusted labels in every function where vector members are accessed. How is that for memory safety? [1] https://github.com/d-gamedev-team/gfm/blob/5b86c52582de04b8b2277a3b257507083ed31cbf/math/gfm/math/vector.d#L26

Comment #18 by turkeyman — 2019-06-05T18:42:29Z

(In reply to Dennis from comment #17) > (In reply to Simen Kjaeraas from comment #16) > > There's code that simply cannot be @trusted under Dennis' @safe regime that > > would be perfectly safe without void initialization and union member access. > > If you don't agree with the 'insufficient to corrupt memory' regime, please > propose a new one! Uninitialised memory is ALREADY CORRUPTED. There is no world where access to what can be confirmed to be uninitialised memory is @safe, period. Any interaction with uninitialised memory will guarantee unintended consequences, which could be anything. It's very simple; if you're going to use unions, or uninitialised memory, then do so carefully, and @trusted that block of code. > Then store that handle as a void* or make it absolutely inaccessible from > the outside. Disabling @safe union access of non-pointers is neither > necessary nor sufficient to prevent invalid handles to be passed to your C > library. The constraint 'I only want integer members in my struct with > @trusted memory sensitive data' is completely arbitrary and only exists in > an attempt to validate this issue. I don't understand any sentence in this paragraph. > > Also, unions are really a low-level trick that is very rarely used. > > How do you know? I use them in @safe code. GFM (which is downloaded 31739 > times at the time of writing) uses them for their vectors [1] among other > things. Any formerly safe game using GFM vectors needlessly breaks! And the > quick fix will be to slap @trusted labels in every function where vector > members are accessed. How is that for memory safety? It's an explicit statement that you considered and correctly handled the issues of interacting with unsafe low-level machinery. You shouldn't slap it on functions, you should minimise the surface area as much as possible; perhaps higher-level functions should access the unsafe shit through small helpers. Maybe use a tagged-union helper, etc.

Comment #19 by ag0aep6g — 2019-06-05T19:45:26Z

(In reply to Dennis from comment #17) > If you don't agree with the 'insufficient to corrupt memory' regime, please > propose a new one! The spec is pretty informal currently. If you have a > better interpretation, I'm eager to hear it. So far I've only heard 'things > any sane security engineer is wary of'. Per the spec (<https://dlang.org/spec/function.html#function-safety>), @safe functions "are functions that are statically checked to exhibit no possibility of undefined behavior". I suppose that's equivalent to "doesn't corrupt memory". Memory corruption is always a symptom of undefined behavior, and there is no kind of undefined behavior that doesn't potentially lead to memory corruption (or it wouldn't really be UB). I prefer focusing on undefined behavior, though. It feels more more clear cut to me. When the spec says that something has UB, then it can't be in @safe. Period. Otherwise, when the focus is on memory corruption, there's more (perceived) wiggle room. We're tempted to allow anything that doesn't lead to memory corruption in practice, even if has undefined behavior. That quickly gets messy. (In reply to Manu from comment #18) > Uninitialised memory is ALREADY CORRUPTED. There is no world where access to > what can be confirmed to be uninitialised memory is @safe, period. > > Any interaction with uninitialised memory will guarantee unintended > consequences, which could be anything. > > It's very simple; if you're going to use unions, or uninitialised memory, > then do so carefully, and @trusted that block of code. You might want to argue your position in <https://github.com/dlang/dlang.org/pull/2260>. It's an open PR by Walter where he does exactly what you oppose here: he wants to explicitly allow accessing uninitialized memory in @safe code (unless it involves pointers).

Comment #20 by turkeyman — 2019-06-05T20:26:54Z

> Otherwise, when the focus is on memory corruption, there's more (perceived) > wiggle room. We're tempted to allow anything that doesn't lead to memory > corruption in practice, even if has undefined behavior. That quickly gets > messy. T x = void; // <- DOES lead to memory corruption; it's effectively an explicit statement to do memory corruption. How could a more explicit violation of @safe exist, no matter how you squint and interpret it? > (In reply to Manu from comment #18) > > Uninitialised memory is ALREADY CORRUPTED. There is no world where access to > > what can be confirmed to be uninitialised memory is @safe, period. > > > > Any interaction with uninitialised memory will guarantee unintended > > consequences, which could be anything. > > > > It's very simple; if you're going to use unions, or uninitialised memory, > > then do so carefully, and @trusted that block of code. > > You might want to argue your position in > <https://github.com/dlang/dlang.org/pull/2260>. It's an open PR by Walter > where he does exactly what you oppose here: he wants to explicitly allow > accessing uninitialized memory in @safe code (unless it involves pointers). That PR doesn't appear to have anything to do with @safe?

Comment #21 by ag0aep6g — 2019-06-05T20:50:58Z

(In reply to Manu from comment #20) > T x = void; // <- DOES lead to memory corruption; it's effectively an > explicit statement to do memory corruption. How could a more explicit > violation of @safe exist, no matter how you squint and interpret it? > I don't care strongly either way. You'll have to convince Walter. [...] > > You might want to argue your position in > > <https://github.com/dlang/dlang.org/pull/2260>. It's an open PR by Walter > > where he does exactly what you oppose here: he wants to explicitly allow > > accessing uninitialized memory in @safe code (unless it involves pointers). > > That PR doesn't appear to have anything to do with @safe? But it clearly does. It resolves issue 18016 which is about accessing void initialized memory in @safe code. With the PR, the spec will allow that. DMD already allows it (in conflict with the current spec).

Comment #22 by dkorpel — 2019-06-06T18:22:53Z

(In reply to Manu from comment #18) > There is no world where access to > what can be confirmed to be uninitialised memory is @safe, period. This is just proof by assertion at this point. (https://en.wikipedia.org/wiki/Proof_by_assertion) > Any interaction with uninitialised memory will guarantee unintended > consequences, which could be anything. Counterexample: just print an uninitialized integer. I can predict the consequence pretty well. Also: besides the point. @safe != no bugs and union member access != uninitialized memory. (the discussion got a bit derailed, I suggest continuing in issue 18016 about `= void` and keeping this to union member access) > I don't understand any sentence in this paragraph. Simen argued union access must be unsafe or else this doesn't work: ``` struct S { private int myHandle; // no touchy! @disable this(); this(int x) {myHandle = getValidHandle();} void doSomethingSafe() @trusted {unsafeCfunction(myHandle);} } ``` I say: - it's not necessary, just give S a pointer member - it's not sufficient, __traits(getMember, S, "myHandle") = 0xFFFF; will break it anyway. > You shouldn't slap it on functions, you should minimise the surface area as > much as possible; I agree. Therefore, we shouldn't suddenly reject @safe code that has no chance of corrupting memory, because that invites more use of @trusted increasing the surface area.

Comment #23 by turkeyman — 2019-06-07T00:59:01Z

(In reply to Dennis from comment #22) > (In reply to Manu from comment #18) > > There is no world where access to > > what can be confirmed to be uninitialised memory is @safe, period. > > This is just proof by assertion at this point. > (https://en.wikipedia.org/wiki/Proof_by_assertion) Correct, well spotted. And nice deployment of big words, I appreciate the handy link just in case I'm an idiot. > > Any interaction with uninitialised memory will guarantee unintended > > consequences, which could be anything. > > Counterexample: just print an uninitialized integer. I can predict the > consequence pretty well. > Also: besides the point. @safe != no bugs and union member access != > uninitialized memory. (the discussion got a bit derailed, I suggest > continuing in issue 18016 about `= void` and keeping this to union member > access) Union access is void initialisation. All members other than one are invalid in precisely the same way as that other thread discusses. Concerns related to validity and language semantic violations are identical. > > You shouldn't slap it on functions, you should minimise the surface area as > > much as possible; > > I agree. Therefore, we shouldn't suddenly reject @safe code that has no > chance of corrupting memory, because that invites more use of @trusted > increasing the surface area. Huh? I don't think you understood what I said... I said to narrow the surface area; for instance, don't apply it to functions, apply it within functions.

Comment #24 by nick — 2024-06-03T10:43:33Z

(In reply to Simen Kjaeraas from comment #5) > struct NotAPointer { > private size_t n; > @disable this(); > @trusted this(int* p) { > assert(p.isValid); > n = cast(size_t)p; > } > @trusted void callMethod() { > *cast(int*)n = 3; > } > } The answer to this is to mark `n` as `@system`.

Comment #25 by robert.schadek — 2024-12-13T19:03:35Z

THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/dmd/issues/17908 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB