Bug 24411 – [CODEGEN] bad shl codegen

Status
RESOLVED
Resolution
INVALID
Severity
major
Priority
P1
Component
dmd
Product
D
Version
D2
Platform
x86
OS
All
Creation time
2024-02-26T06:19:14Z
Last change time
2024-02-28T09:10:15Z
Assigned to
No Owner
Creator
Manu

Comments

Comment #0 by turkeyman — 2024-02-26T06:19:14Z
I just uncovered a very surprising bug. I have a function like this: bool validCode(ubyte code) { enum validCodes = 0b1111100111001100111111110; return (1 << code) & validCodes; } void test() { assert(validCode(76) == false); // FAILS! } Where `code` is an enum with some sparse values close to zero, and only specified code values are valid. I detect invalid code values by comparing the code value bit against a bitfield. `validCodes` is 32bit, and I made the assumption that `1<<x` where x is greater than 32 would result in 0, and so the function above would return false for `code` values larger than 32. It turns out this is NOT the case, and I get surprising results. This code compiles to: 00007FF6D1FE781B mov eax,1 00007FF6D1FE7820 movzx ecx,byte ptr [code] 00007FF6D1FE7824 shl eax,cl 00007FF6D1FE7826 test eax,1F399FEh In this case, `code` is 76 (an invalid code), and it turns out that the x86 `shl` doesn't shift left by 75 (resulting in 0), what actually happens is that shl takes the lower 5 bits from cl, and shifts by that number, which happens to be 12, so the result is `1 << 12`, which coincides with a 1-bit, and this function returns TRUE in this case! Is the << operator in the language specified to take the lower 5 bits of the operand? I think this is a codegen bug... the language shouldn't assume that the user has clamped the value into the range required by the x86 opcode.
Comment #1 by dfj1esp02 — 2024-02-27T09:41:54Z
Yes, that's how shift normally works: https://dlang.org/spec/expression.html#shift_expressions I believe, the goto solution here is checked int.
Comment #2 by turkeyman — 2024-02-27T11:19:10Z
Okay, my bad. It's in the spec! Surprising; dlang uses prides itself on not having surprise invisible undefined behaviour littered around your code. This seems like a safety concern; it's conceivable an exploit could be written taking advantage of this undefined behaviour.
Comment #3 by dkorpel — 2024-02-27T11:37:09Z
The key here is that it's specified as "implementation defined behavior", not "undefined behavior". It could give a bogus integer and lead to logic bugs, but it can't result in memory corruption in `@safe` code. D's 'safety' is specifically targeting memory safety, not logic bugs in general (e.g. unintentional integer overflow). It's still a systems programming language with similar performance to C. Introducing bounds checks to shift expressions is a big performance hit, especially considering shifts are usually found in bit-twiddling performance sensitive code.
Comment #4 by dfj1esp02 — 2024-02-28T09:10:15Z
Shifts are often hardcoded. If you shift by untrusted amount, then you probably have bit arrays, and if you use bit arrays with untrusted indexes, then you need bound checking, not clear what you try to do, try https://dlang.org/phobos/std_bitmanip.html#BitArray AFAIK most processors simply mask the shift amount. If some processor traps on overflow here, it would be safe, but probably not very useful for you.