
Bug 3850 – Signed/unsigned bytes type name

Status: RESOLVED
Resolution: WONTFIX
Severity: enhancement
Priority: P2
Component: dmd
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2010-02-24T02:33:14Z
Last change time: 2018-05-16T14:53:53Z
Assigned to: No Owner
Creator: bearophile_hugs

Comments

Comment #0 by bearophile_hugs — 2010-02-24T02:33:14Z

While programming in D I have seen that it is easy to forget that "byte" is signed (normally I think of bytes as unsigned entities, and other people share the same idea). It's similar, but not identical, to the situation of signed and unsigned chars in C. There are several ways to solve this small problem. One of the simplest I can think of is to deprecate the "byte" type name and introduce an "sbyte" type name to replace it. With sbyte, it is probably much easier to remember that the value is signed. This introduces an inconsistency in the naming scheme of D integral types (currently they are symmetric: ubyte/byte, uint/int, etc.), but it can help avoid some bugs, especially among D newbies.
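A minimal D sketch of the pitfall, assuming a hypothetical checksum-style helper: the same two bit patterns are summed once as `ubyte` and once as `byte`. Because `byte` is signed, 0xFF widens to -1 instead of 255. The function names are illustrative only; the `unittest` can be run with `dmd -unittest -main -run file.d`.

```d
// Hypothetical helper declared with byte[] by mistake, where ubyte[] was intended.
int sumSigned(const(byte)[] data)
{
    int total = 0;
    foreach (b; data)
        total += b; // b is sign-extended: bit pattern 0xFF becomes -1
    return total;
}

// The same helper with the intended unsigned element type.
int sumUnsigned(const(ubyte)[] data)
{
    int total = 0;
    foreach (b; data)
        total += b; // b is zero-extended: bit pattern 0xFF stays 255
    return total;
}

unittest
{
    static assert(byte.min == -128 && byte.max == 127);
    static assert(ubyte.min == 0 && ubyte.max == 255);

    ubyte[] raw = [0x10, 0xFF];
    assert(sumUnsigned(raw) == 16 + 255);
    // Reinterpreting the same bits as byte gives a different total.
    assert(sumSigned(cast(const(byte)[]) raw) == 16 - 1);
}
```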

Comment #1 by bearophile_hugs — 2010-03-14T18:19:24Z

The signed/unsigned bytes in C# are:
- The sbyte type represents signed 8-bit integers with values between -128 and 127.
- The byte type represents unsigned 8-bit integers with values between 0 and 255.

Choosing ubyte/sbyte is acceptable too.

Comment #2 by andrej.mitrovich — 2012-10-21T19:52:53Z

Although I agree with you, I think it's way too late to fix this without breaking tons of code. You can always use an alias in your own code. Adding it to Phobos would probably be unwise too (people would ask what the difference between byte and sbyte is).
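The alias route mentioned in this comment can be sketched as follows. The name `sbyte` is chosen here for illustration and is not provided by Phobos or druntime; the helper `clampToSByte` is likewise hypothetical. Because an `alias` introduces no new type, `sbyte` and `byte` remain fully interchangeable.

```d
// User-side alias; `sbyte` is this sketch's choice, not a Phobos name.
alias sbyte = byte;

// Hypothetical helper whose signature now spells out signedness.
sbyte clampToSByte(int value)
{
    if (value < sbyte.min) return sbyte.min;
    if (value > sbyte.max) return sbyte.max;
    return cast(sbyte) value;
}

unittest
{
    static assert(is(sbyte == byte)); // an alias, not a distinct type
    assert(clampToSByte(300) == 127);
    assert(clampToSByte(-300) == -128);
    assert(clampToSByte(42) == 42);
}
```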

Comment #3 by clugdbug — 2012-10-22T02:02:18Z

This is not a newbie issue. I make this mistake myself, fairly often. *Walter* made this mistake once, in the header generation tool! My experience is that 90% of uses of "byte" should instead be "ubyte". It is really, really unusual to be using signed bytes. I wish we could change this. (I would do it by changing the type to "sbyte" and then adding "alias byte = sbyte;" to object.d).

Comment #4 by andrej.mitrovich — 2012-10-22T08:58:02Z

(In reply to comment #3)
> This is not a newbie issue. I make this mistake myself, fairly often.

Absolutely, it happens to me all the time as well.

> I wish we could change this. (I would do it by changing the type to "sbyte" and
> then adding "alias byte = sbyte;" to object.d).

That still won't prevent you from making the mistake of typing 'byte' instead of 'ubyte' though. :)

Comment #5 by bearophile_hugs — 2012-10-22T09:52:06Z

(In reply to comment #4)
> That still won't prevent you from making the mistake of typing 'byte' instead
> of 'ubyte' though. :)

If you have sbyte and ubyte, and you use them consistently, I think this alone helps reduce mistakes a little. Once a few years have passed, "byte" is considered a bad idiom, and D programs in the wild use "byte" less and less, we can even consider deprecating it. There is tons of C++ code that represents null as "0", yet C++11 has nullptr, and G++ from version 4.7 has a warning (-Wzero-as-null-pointer-constant) that finds uses of "0" as a null pointer constant. The most important thing is the desire to improve the situation; then some slow deprecation paths exist.

Comment #6 by clugdbug — 2012-10-23T03:15:33Z

>> I wish we could change this. (I would do it by changing the type to "sbyte"
>> and then adding "alias byte = sbyte;" to object.d).
> That still won't prevent you from making the mistake of typing 'byte' instead
> of 'ubyte' though. :)

By itself, no, but anybody can modify their local copy of object.d to remove the alias... A very slow deprecation path is possible.

Comment #7 by kozzi11 — 2012-10-23T07:02:09Z

(In reply to comment #0)
> While programming in D I have seen that you can forget that the "byte" is
> signed. (Because normally I think of bytes as unsigned entities. Other people
> share the same idea). (It's similar but not equal to the situation of signed
> and unsigned chars in C).
>
> There are several ways to solve this small problem. One of the simpler ways I
> can think of is to deprecate the "byte" type name and introduce a "sbyte" type
> name (that replaces the "byte" type name). Using a sbyte it's probably quite
> more easy to not forget that it's a signed value.
>
> This introduces an inconstancy in the naming scheme of D integral values (they
> are now symmetric, ubyte, byte, int, uint, etc), but it can help avoid some
> bugs, especially from D newbies.

I think byte should be unsigned by default. So I am for sbyte (signed byte; is there really anyone who needs it?) and byte (unsigned byte).

Comment #8 by bearophile_hugs — 2012-10-23T09:32:37Z

(In reply to comment #7)
> I think byte should be unsigned by default. So I am for sbyte(signed byte - Is
> there really anyone who need it?) and byte (unsigned byte)

Ideally I agree with you. In practice, D built-in types are prefixed with "u" when unsigned, so a more practical solution is to borrow C#'s "sbyte" and use the "ubyte"/"sbyte" name pair.

Regarding the usefulness of signed bytes: small data types like ubyte, sbyte, short, ushort, and even float are mostly useful in aggregates, such as arrays and arrays of structs. They are not so useful if you need only one of them. Recently I used an array of sbyte values to represent indexes into a short array (statically known to be shorter than 127 items). Using 1 byte instead of an int/uint/size_t saves space when you have many such indexes, and saving space means fewer cache misses. I used sbyte instead of ubyte for those indexes because I used -1 to represent a "missing value". sbyte values are not used often, but it's right to have them in a systems language.
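The use case described in this comment, an array of signed-byte indexes with -1 as a "missing" sentinel, might look roughly like this. The `sbyte` alias, `missing`, and `buildIndex` are hypothetical names for this sketch, not the commenter's actual code. Each slot costs one byte, and the signed type leaves room for the sentinel while still covering positions 0 to 127.

```d
alias sbyte = byte;       // hypothetical alias; D's built-in name is `byte`
enum sbyte missing = -1;  // sentinel meaning "no position"

// Slot k holds the position of key k within `table`, or `missing`.
// Assumes table.length <= 128 so every position fits in a signed byte.
sbyte[] buildIndex(const(int)[] table, size_t numKeys)
{
    assert(table.length <= sbyte.max + 1, "table too long for 8-bit indexes");
    auto slots = new sbyte[numKeys];
    slots[] = missing;
    foreach (pos, key; table)
        if (key >= 0 && key < numKeys)
            slots[key] = cast(sbyte) pos; // later duplicates overwrite earlier ones
    return slots;
}

unittest
{
    auto slots = buildIndex([7, 2, 5], 8);
    assert(slots[7] == 0 && slots[2] == 1 && slots[5] == 2);
    assert(slots[0] == missing);
    static assert(sbyte.sizeof == 1); // one byte per slot, versus 4 or 8 for size_t
}
```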

Comment #9 by bearophile_hugs — 2014-10-09T16:02:27Z

An example of the problems caused by the byte/ubyte name pair: http://forum.dlang.org/thread/[email protected] To most people, a "byte" is not signed.

Comment #10 by mailnew4ster — 2014-10-09T17:28:09Z

Also, BYTE is unsigned in Windows, which adds to the confusion:

typedef unsigned char BYTE;

http://msdn.microsoft.com/en-us/library/windows/desktop/aa383751(v=vs.85).aspx#byte

Comment #11 by dfj1esp02 — 2014-10-16T13:35:39Z

The principle of least surprise is utterly violated here: byte is unsigned everywhere except in D. A symmetric pair could be tiny/utiny (for "tiny int").

(In reply to Andrej Mitrovic from comment #2)
> Although I agree with you I think it's way too late to fix this without
> breaking tons of code.

Another easy task for dfix, BTW.

Comment #12 by samjnaa — 2015-10-17T10:53:26Z

If the default signedness of byte is going to be changed, then I support the request for tiny/utiny (a very nice choice) or some other <name>/u<name> pair. byte would then be aliased to utiny/u<name>, and ubyte slowly deprecated and removed. I personally haven't had much trouble with byte/ubyte, but I can see why it would be a problem for others.

Comment #13 by dmitry.olsh — 2018-05-16T14:53:53Z

Like it or not, changing `byte` to be unsigned and `sbyte` to be signed (or some such) is a ton of _trivial_ breakage that gives us exactly zero benefit. It may appease C# programmers, but I believe the name of the signed byte is the least of their problems. Java has a signed `byte`, though, so it's not without precedent.