Bug 3843 – Signed lengths (and other built-in values)

Comment #0 by bearophile_hugs — 2010-02-23T16:58:38Z

This is Java code coming from Wikipedia: http://en.wikipedia.org/wiki/Selection_Sort#Code void selectionSort(int[] a) { for (int i = 0; i < a.length - 1; i++) { int min = i; for (int j = i + 1; j < a.length; j++) { if (a[j] < a[min]) { min = j; } } if (i != min) { int swap = a[i]; a[i] = a[min]; a[min] = swap; } } } You can translate it in correct Python in a short time (I have added a postcondition too, plus doctests): def selection_sort(arr): """ >>> items = [] >>> selection_sort(items) >>> items [] >>> items = [5, 3, 1] >>> selection_sort(items) >>> items [1, 3, 5] """ for i in xrange(0, len(arr)-1): min = i for j in xrange(i + 1, len(arr)): if arr[j] < arr[min]: min = j if i != min: arr[i], arr[min] = arr[min], arr[i] if __debug__: # postcondition that it's not increasing for i, el in enumerate(arr[:-1]): assert el <= arr[i + 1], "'arr' not correctly sorted." if __name__ == "__main__": import doctest doctest.testmod() print "Doctests done.\n" This is a possible translation in D2 with the postcondition: import std.stdio: writeln; void selectionSort(T)(T[] arr) out { // assert that it's not increasing foreach (i, el; arr[0 .. $-1]) assert (el <= arr[i + 1], "selectionSort: 'arr' not correctly sorted."); } body { for (int i = 0; i < arr.length - 1; i++) { T min = i; for (int j = i + 1; j < arr.length; j++) if (arr[j] < arr[min]) min = j; if (i != min) { T aux = arr[i]; arr[i] = arr[min]; arr[min] = aux; } } } void main() { int[] items; writeln(items); selectionSort(items); writeln(items); } But unlike the Python and Java code, that D2 code contains two bugs (or more), that Python, Java and C# programmers may need some time to spot. The first bug is in this line: for (int i = 0; i < arr.length - 1; i++) { This bug is caused by the combination of the following factors: - arr.length is unsigned, so 0-1 is not -1, it's not less than 0. - Currently with DMD there's no way to ask the code to perform integral overflow tests at runtime, so it can't catch it and give a nice error message at runtime. - The compiler isn't even telling me that i < arr.length performs a comparison between a signed and unsigned number, that can hide a bug (I think GCC can raise a warning there). I can show other examples of D code that contain similar bugs caused by mixing signed and unsigned values. Unsigned integers are useful when: - You need a bit array, for example to implement a bloom filter, a bitvector, a bit set, when you want to do SWAR, when you need bit arrays to deal with hardware, etc. - When you really need the full range of numbers 0 .. 2^n, this happens but it's uncommon for the numbers of 32 or more bits (it's more common for numbers less than 32 bits long). In most other situations using unsigned numbers is unsafe (because D can silently cast signed values to unsigned ones, and because it doesn't perform run time overflow tests) and it's better to use signed values. So for example array indices and lengths are (I think) better to be signed (ptrdiff_t), as almost everything else. If you mix signed and unsigned arithmetic to index an array or to measure its length you will often introduce bugs in the code. Note 1: C# has unsigned numbers, but I think its array lengths are signed. Note 2: on 64 bit operating systems arrays can be longer than 2^31, and even now an int is not able to store the full range of a 32 bit ptrdiff_t. So code like: int len = arr.length; Is unsafe now and will be even more unsafe on 64 bit systems where arrays can and will be longer. It's better to invent ways to avoid such kind bugs as much as possible. Note 3: The second bug in that D2 selectionSort is in this line: foreach (i, el; arr[0 .. $-1]) If arr is empty, that produces an out of bound error in nonrelease mode (a runtime error is better than nothing). In Python a slice of an empty array is empty (this is not a design overlook or mistake, and it's something Python designers wanted).

Comment #1 by andrei — 2011-06-06T06:27:39Z

At this point it's virtually impossible to turn lengths to signed types.

Comment #2 by bearophile_hugs — 2011-06-06T10:17:33Z

(In reply to comment #1) > At this point it's virtually impossible to turn lengths to signed types. I understand. A note: in Bugzilla I have some issues open regarding little D improvements. If they don't happen in something like another year, it will probably be quite harder to perform those changes, even if they are appreciated. I am open for any question about those issues.

Bug 3843 – Signed lengths (and other built-in values)

Comments