Bug 3849 – Compiler should catch incomplete initialisation of an array

Status: NEW
Severity: enhancement
Priority: P4
Component: dmd
Product: D
Version: D2
Platform: All
OS: Windows
Creation time: 2010-02-24T01:56:19Z
Last change time: 2024-12-13T17:51:23Z
Keywords: diagnostic, spec
Assigned to: No Owner
Creator: bearophile_hugs
See also: https://issues.dlang.org/show_bug.cgi?id=5290
Moved to GitHub: dmd#18152

Comments

Comment #0 by bearophile_hugs — 2010-02-24T01:56:19Z
This small program compiles, but I'd like the compiler to raise a compile error, because I think this is often a bug:

    string[4] arr = ["foo", "bar"];
    void main() {}

------------

A related enhancement: when I want to define a fixed-sized array with a literal and the number of its items is high, I may not want to count them. In this situation the following syntax can be adopted:

    int[$] arr = [10,2,15,15,14,12,3,7,13,5,9,9,7,9,9,9,11,15,1,1,12,5,14];
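For comparison, the length inference requested with `int[$]` can be approximated in library code. The sketch below is not part of this report's proposal; it assumes a Phobos release that provides `std.array.staticArray`, and the shortened literal is only illustrative.

```d
import std.array : staticArray;

void main()
{
    // The length is inferred from the literal, so nothing is counted by hand.
    auto arr = [10, 2, 15, 15, 14, 12, 3].staticArray;

    // The inferred type carries the element count.
    static assert(is(typeof(arr) == int[7]));
    assert(arr.length == 7);
}
```

Unlike the proposed syntax, this goes through a function call, so it is a workaround rather than a replacement for a declaration-level `$`.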
Comment #1 by bearophile_hugs — 2010-02-24T02:10:50Z
This is a similar bug, but the causes seem different (I don't know if in this case I have to file a new bug report). This program:

    void main() {
        struct S { int x; }
        S[2] a = [{1}, {2}, {3}];
    }

seems to crash DMD with this error message:

    Assertion failure: 'j < edim' on line 444 in file 'init.c'
Comment #2 by clugdbug — 2010-03-16T01:09:43Z
I've moved the ICE in the comment to bug 3974. The original issue is an enhancement.
Comment #3 by bearophile_hugs — 2010-03-17T12:44:40Z
See related bug 3948 too.
Comment #4 by bearophile_hugs — 2010-04-27T10:09:45Z
Walter doesn't want to add the int[$] arr = [...]; syntax:

> D is full of syntax, at some point adding more and more syntax to deal
> with more and more obscure cases is not a net improvement.
> There's a point of diminishing returns.

I still think that when a static array literal is given, the compiler has to enforce the length of an array literal to be the same as the specified length. In the uncommon situations where a partial array specification is necessary, the programmer can just add leading empty items.
Comment #5 by bearophile_hugs — 2010-04-27T13:00:32Z
Once the length test is in place, to avoid adding the trailing empty items a very simple "..." trailing syntax can be introduced (partially from a suggestion by Michel Fortin):

    immutable ubyte _ctype[256] = [
        _CTL,_CTL,_CTL,_CTL,_CTL,_CTL,_CTL,_CTL,
        _CTL,_CTL|_SPC,_CTL|_SPC,_CTL|_SPC,_CTL|_SPC,_CTL|_SPC,_CTL,_CTL,
        _CTL,_CTL,_CTL,_CTL,_CTL,_CTL,_CTL,_CTL,
        _CTL,_CTL,_CTL,_CTL,_CTL,_CTL,_CTL,_CTL,
        _SPC|_BLK,_PNC,_PNC,_PNC,_PNC,_PNC,_PNC,_PNC,
        _PNC,_PNC,_PNC,_PNC,_PNC,_PNC,_PNC,_PNC,
        _DIG|_HEX,_DIG|_HEX,_DIG|_HEX,_DIG|_HEX,_DIG|_HEX,
        _DIG|_HEX,_DIG|_HEX,_DIG|_HEX,_DIG|_HEX,_DIG|_HEX,
        _PNC,_PNC,_PNC,_PNC,_PNC,_PNC,
        _PNC,_UC|_HEX,_UC|_HEX,_UC|_HEX,_UC|_HEX,_UC|_HEX,_UC|_HEX,_UC,
        _UC,_UC,_UC,_UC,_UC,_UC,_UC,_UC,
        _UC,_UC,_UC,_UC,_UC,_UC,_UC,_UC,
        _UC,_UC,_UC,_PNC,_PNC,_PNC,_PNC,_PNC,
        _PNC,_LC|_HEX,_LC|_HEX,_LC|_HEX,_LC|_HEX,_LC|_HEX,_LC|_HEX,_LC,
        _LC,_LC,_LC,_LC,_LC,_LC,_LC,_LC,
        _LC,_LC,_LC,_LC,_LC,_LC,_LC,_LC,
        _LC,_LC,_LC,_PNC,_PNC,_PNC,_PNC,_CTL,
        ...
    ];

This is first of all explicit, and it doesn't clash with C or C99 syntax, it's easy to understand, short, easy to write, compatible with other D syntax.
Comment #6 by smjg — 2010-05-22T10:12:40Z
It isn't an array literal, it's a static initializer. They look the same, but are distinct entities with distinct rules. See bug 181 and bug 508. This is really a request to change from the fix that was actually applied to the more sensible one.
Comment #7 by bearophile_hugs — 2010-08-01T15:29:25Z
See a consequence of this in bug 4565
Comment #8 by bearophile_hugs — 2010-10-28T12:20:43Z
(In reply to comment #6)
> It isn't an array literal, it's a static initializer. They look the same, but
> are distinct entities with distinct rules.

General design rule: if you want to minimize traps and bugs, then to represent a different entity you need a different syntax.

Currently this program compiles:

    int[4] a = [1, 2, 3];
    void main() {}

While this generates "object.Exception: lengths don't match for array copy":

    void main() {
        int[4] a = [1, 2, 3];
    }

This looks like a corner case that's better to remove from D. In this bug report there are syntaxes that restore the needed flexibility.
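A minimal sketch of what the module-scope case leaves behind, assuming a compiler that still accepts the incomplete static initializer as this report describes. The unspecified trailing element silently takes `int.init`, which is what makes the omission easy to miss.

```d
// Module-scope static initializer with one element missing.
// Per this report it is accepted without any diagnostic.
int[4] a = [1, 2, 3];

void main()
{
    // The element that was never written is default-initialised to int.init (0).
    assert(a == [1, 2, 3, 0]);
    assert(a[3] == int.init);
}
```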
Comment #9 by bearophile_hugs — 2010-11-26T12:25:34Z
See also bug 481
Comment #10 by gide — 2010-11-29T07:39:26Z
Real example where the [$] syntax would have been useful. http://www.dsource.org/projects/phobos/changeset/2204
Comment #11 by bearophile_hugs — 2011-09-10T02:35:35Z
See also:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=144237

From that post:

> The solution is to add some symbol that explicitly marks the array as not
> complete, so both the compiler and the person that later reads the code
> know some items are missing.
>
> If no item is missing the compiler probably has to generate an error again:
>
>     int[2] arr = [1, 2, ...]; // compile-time error
>
> I think that syntax is explicit and readable enough. A problem with this idea is
> this syntax is probably not used often. On the other hand leaving that trap in
> the D language is not good at all.
>
> The idea of the dollar symbol can't be used with the ellipsis symbol:
>
>     int[$] arr = [1, 2, ...]; // compile-time error again
>
> Note: for me this syntax with $ is more commonly useful compared to the "..." syntax.
Comment #12 by smjg — 2011-09-10T18:31:01Z
(In reply to comment #11)
> See also:
> http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=144237
>
> From that post:
>
>> The solution is to add some symbol that explicitly marks the array
>> as not complete, so both the compiler and the person that later
>> reads the code know some items are missing.
>>
>> If no item is missing the compiler probably has to generate an
>> error again:
>>
>>     int[2] arr = [1, 2, ...]; // compile-time error

I'm not sure about this. I can imagine someone wanting it to work when the length is a template parameter, in order to initialise only the first n members where n is fixed.

And should we allow a value to precede the ..., like

    int[100] arr = [1, 2, 42...];

(all elements beyond the first two initialised to 42)?
Comment #13 by bearophile_hugs — 2011-09-11T04:11:11Z
(In reply to comment #12)
> >> If no item is missing the compiler probably has to generate an
> >> error again:
> >>
> >>     int[2] arr = [1, 2, ...]; // compile-time error
>
> I'm not sure about this. I can imagine someone wanting it to work when the
> length is a template parameter, in order to initialise only the first n members
> where n is fixed.

I see.

> And should we allow a value to precede the ..., like
>
>     int[100] arr = [1, 2, 42...];
>
> (all elements beyond the first two initialised to 42)?

D allows floating point literals without decimal digits:

    float[6] arr = [1., 2., 42....];

This is too ugly, so I think it's much better to require a comma before the ellipsis.
Comment #14 by smjg — 2011-09-11T07:50:51Z
(In reply to comment #13)
>     float[6] arr = [1., 2., 42....];
>
> This is too ugly,

You don't have to use it then. You could use

    float[6] arr = [1., 2., 42. ...];

or

    float[6] arr = [1., 2., 42...];

or

    float[6] arr = [1., 2., 42.0...];

instead.

> so I think it's much better to require a comma before the ellipsis.

I took your intention to be that, with the comma, the remaining elements would be initialised to the type's .init. A ... following a value without a comma would, OTOH, initialise all remaining elements to the specified value.
Comment #15 by smjg — 2011-09-11T09:45:07Z
This isn't accepts-invalid, because the current spec allows incomplete initialisation of arrays. Rather, it's a request to stop accepting these, which later became a request to make the means of partially initialising an array explicit.
Comment #16 by bearophile_hugs — 2011-09-13T03:15:30Z
(In reply to comment #14)
> You don't have to use it then. You could use
>     float[6] arr = [1., 2., 42. ...];
> or
>     float[6] arr = [1., 2., 42...];
> or
>     float[6] arr = [1., 2., 42.0...];
>
> instead.

Right, but currently D doesn't require such syntaxes to write floating point values, so people are free to write the bad syntax, or you have to add one or more special cases to D.

> with the comma, the remainder of elements
> would be initialised to the type's .init. A ... following a value without a
> comma would, OTOH, initialise all remaining elements to the specified value.

An engineer usually prefers KISS designs, this also means that language features serve for only one purpose.
The sub-feature you propose is cute, but I think seen from the eyes of an engineer it risks reducing the value of the whole ellipsis feature :-|
Comment #17 by smjg — 2011-09-13T03:38:12Z
(In reply to comment #16)
> An engineer usually prefers KISS designs, this also means that language
> features serve for only one purpose.
> The sub-feature you propose is cute, but I think seen from the eyes of an
> engineer it risks reducing the value of the whole ellipsis feature :-|

For what more valuable purpose do you wish to save the syntax I proposed? :)
Comment #18 by bearophile_hugs — 2011-09-13T04:28:51Z
(In reply to comment #17)

I have suggested introducing the "..." syntax for arrays just because Walter thinks global arrays are often initialized partially. Some evidence shows this is a really uncommon need, so maybe it doesn't deserve a special syntax and it doesn't deserve to leave a trap in D that's a confirmed (by Don too) source of bugs. Don has also suggested a library solution that maybe makes "..." useless or less needed.

> For what more valuable purpose do you wish to save the syntax I proposed? :)

The first and main purpose of the "..." syntax is to denote a global/static fixed-sized array that is underspecified (and all items not specified default to T.init).

You propose to add a secondary purpose to the "..." syntax, one that allows specifying the value of all the unspecified items, to ask for a value different from T.init. I have not seen even one use case for this sub-feature, and this is bad for this idea.

In my note about engineers I tried to explain that engineers often have an aversion to designs that conflate two different purposes into a single "user interface", especially if one of the purposes isn't a confirmed need and if it creates problems when it's used alongside other features (floating point numbers without leading digits). So I fear that Walter's appreciation of this idea is _decreased_ by the idea of adding this sub-feature.
Comment #19 by bearophile_hugs — 2013-03-27T06:28:32Z
The fix for Issue 9712 offers a workaround for this D design mistake:

    T[n] fixed(T, size_t n)(T[n] a) { return a; }
    string[3] colors1 = ["red" "green", "blue"]; // Wrongly accepted.
    string[3] colors2 = ["red" "green", "blue"].fixed; // Refused.
    void main() {}
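The mechanism behind the workaround can be sketched as follows. This is an illustration, not a tested reproduction of the case above: it uses int literals to stay independent of how adjacent string literals are handled, and it assumes a compiler that includes the Issue 9712 fix, so that the length `n` is deduced from the literal.

```d
// Same helper as in the comment above: it types the literal as a static
// array, so the element count becomes part of the type.
T[n] fixed(T, size_t n)(T[n] a) { return a; }

// A three-element literal is deduced as int[3].
static assert(is(typeof([10, 20, 30].fixed) == int[3]));

// A two-element literal becomes int[2], which does not convert to int[3],
// so the short initializer is rejected at compile time.
static assert(!__traits(compiles, { int[3] x = [10, 20].fixed; }));

// Matching lengths are accepted.
static assert(__traits(compiles, { int[3] x = [10, 20, 30].fixed; }));

void main() {}
```

The check works because the length mismatch turns into a type mismatch, which the compiler already diagnoses.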
Comment #20 by bearophile_hugs — 2013-05-28T04:45:25Z
A small example of why enforcing that array lengths match improves the safety of D programs. This part of a program uses strings to define a binary decision table, but it's easy to make mistakes in the strings:

    struct DataPair { string message, truth; }
    immutable DataPair[] solutions = [
        {"Check the power cable", "..#....."},
        {"Check the printer-computer cable", "#.#....."},
        {"Ensure printer software is installed", "#.#.#.#."},
        {"Check/replace ink", "##..##.."},
        {"Check for paper jam", ".#.#...."}];

An improvement is to use fixed-sized arrays so the compiler catches some bugs at compile-time:

    struct DataPair(uint N) {
        string message;
        immutable(char)[N] truth;
    }
    immutable DataPair!8[] solutions = [
        {"Check the power cable", "..#....."},
        {"Check the printer-computer cable", "#.#....."},
        {"Ensure printer software is installed", "#.#.#.#."},
        {"Check/replace ink", "##..##.."},
        {"Check for paper jam", ".#.#...."}];

But currently the compiler only gives an error if you add one more char:

    {"Check the power cable", "..#......"},

And not if you miss one:

    {"Check the power cable", "..#...."},

And this is not a nice solution, also because most D programmers don't write code like this:

    immutable DataPair!8[] solutions = [
        {"Check the power cable", "..#.....".fixed},
        {"Check the printer-computer cable", "#.#.....".fixed},
        {"Ensure printer software is installed", "#.#.#.#.".fixed},
        {"Check/replace ink", "##..##..".fixed},
        {"Check for paper jam", ".#.#....".fixed}];
Comment #21 by samjnaa — 2013-05-31T10:48:29Z
(In reply to comment #16)
> (In reply to comment #14)
> > with the comma, the remainder of elements
> > would be initialised to the type's .init. A ... following a value without a
> > comma would, OTOH, initialise all remaining elements to the specified value.
>
> An engineer usually prefers KISS designs, this also means that language
> features serve for only one purpose.
> The sub-feature you propose is cute, but I think seen from the eyes of an
> engineer it risks reducing the value of the whole ellipsis feature :-|

I support the ... syntax to indicate an incomplete array specification for a fixed-size array. Of course, the T[$]= syntax prescribed by bug 481 should not be used with this syntax since they conflict.

1) IMHO absence of a comma between two items inside an array literal should be treated as an error.

2) However, at the end of the specified elements of an array literal, a comma may or may not be present before the ... and it should NOT make any difference -- all the remaining objects should be initialized to T.init. Making a semantic difference on the small distinction between 3,... and 3... would be a bad decision IMHO.
Comment #22 by robert.schadek — 2024-12-13T17:51:23Z
THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/dmd/issues/18152 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB