Bug 6421 – Require initialization of static arrays with array literals not to allocate

Status
RESOLVED
Resolution
FIXED
Severity
enhancement
Priority
P2
Component
dmd
Product
D
Version
D2
Platform
All
OS
All
Creation time
2011-07-31T16:56:49Z
Last change time
2020-05-15T03:53:33Z
Keywords
performance
Assigned to
No Owner
Creator
bearophile_hugs
Depends on
2356

Comments

Comment #0 by bearophile_hugs — 2011-07-31T16:56:49Z
From a comment by Peter Alexander:

> int[3] a = [1, 2, 3]; // in D, this allocates then copies
> int a[3] = {1, 2, 3}; // in C++, this doesn't allocate
>
> Apparently, to avoid the allocation in D, you must do:
>
> static const int[3] staticA = [1, 2, 3]; // in data segment
> int[3] a = staticA; // non-allocating copy
>
> These little 'behind your back' allocations are good examples of my previous two points.

Memory allocations caused by this, inside an inner loop, have given me performance troubles. I suggest adding an optimization to the DMD front-end to avoid this problem.

Some comments received:

Don:
> Yeah, it's not fundamental, and not even very complicated. The current
> implementation was a quick hack to provide the functionality, that
> hasn't been replaced with a proper implementation yet. All that's
> required to fix it is a bit of code in e2ir.c.

Peter Alexander:
> Also, I think it
> would be worth while adding it to the language definition so that it's
> not merely an implementation detail.

Timon Gehr:
> I think it should be more than an implementation detail, as it can severely affect
> performance.

How do you specify this in the D language definition? What are the corner cases?
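The workaround quoted in this comment can be sketched as a small self-contained program. The function name `makeCopy` is hypothetical and only there for illustration; the `@nogc` attribute is used as a compile-time check that the copy does not touch the GC heap.

```d
// Sketch of the quoted workaround: keep the literal in static data and
// copy it into the local static array.
int[3] makeCopy() @nogc nothrow
{
    static const int[3] staticA = [1, 2, 3]; // lives in the data segment
    int[3] a = staticA;                      // plain copy, no GC allocation
    return a;
}

void main() @nogc
{
    int[3] expected = [1, 2, 3];
    assert(makeCopy() == expected);
}
```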
Comment #1 by bearophile_hugs — 2011-08-01T06:36:11Z
This bug is related to bug 2356; the difference is that this enhancement request asks for a language definition change too.
Comment #2 by andrej.mitrovich — 2014-05-04T08:43:51Z
With recent changes we can now use this syntax for initialization of a variable:

-----
void main()
{
    float x = float(1.0);
}
-----

And in OpenGL, you can use this syntax for array initializers:

-----
float a[5] = float[5](3.4, 4.2, 5.0, 5.2, 1.1);
-----

So this got me thinking, the initializer looks very much like the new initializer in D that we introduced. With a parser fix we could implement this in D:

-----
void main()
{
    float[3] arr = float[3](1.0, 2.0, 3.0);
}
-----

Even though Issue 2356 is fixed the above might help in some other contexts, perhaps in array appends.
Comment #3 by bearophile_hugs — 2014-05-04T09:13:42Z
(In reply to Andrej Mitrovic from comment #2)
> With a parser fix we could implement this in D:
> float[3] arr = float[3](1.0, 2.0, 3.0);

I also like this syntax (composed of two parts usable in different situations):

float[$] arr = [1.0, 2.0, 3.0]s;

Or:

auto arr = [1.0f, 2.0f, 3.0f]s;
Comment #4 by rswhite4 — 2014-05-04T09:27:40Z
(In reply to bearophile_hugs from comment #3)
> (In reply to Andrej Mitrovic from comment #2)
>
> > With a parser fix we could implement this in D:
> > float[3] arr = float[3](1.0, 2.0, 3.0);
>
> I also like this syntax (composed of two parts usable in different
> situations):
>
> float[$] arr = [1.0, 2.0, 3.0]s;
>
> Or:
>
> auto arr = [1.0f, 2.0f, 3.0f]s;

I had PRs for both of them, but they were rejected because no sufficient DIP exists. You could write one.
Comment #5 by andrej.mitrovich — 2014-05-04T10:10:19Z
(In reply to bearophile_hugs from comment #3)
> (In reply to Andrej Mitrovic from comment #2)
>
> > With a parser fix we could implement this in D:
> > float[3] arr = float[3](1.0, 2.0, 3.0);
>
> I also like this syntax (composed of two parts usable in different
> situations):
>
> float[$] arr = [1.0, 2.0, 3.0]s;
>
> Or:
>
> auto arr = [1.0f, 2.0f, 3.0f]s;

I don't like them, they're too much of a special case. Re-using existing syntax is better IMO.
Comment #6 by k.hara.pg — 2014-05-04T10:34:17Z
(In reply to bearophile_hugs from comment #0)
> From a comment by Peter Alexander:
>
> > int[3] a = [1, 2, 3]; // in D, this allocates then copies
> > int a[3] = {1, 2, 3}; // in C++, this doesn't allocate

This was already fixed by issue 2356. And in git-head, more cases will be fixed.

int[3] a = [1, 2, 3]; // not allocated
a = [1, 2, 3]; // not allocated in git-head

> Some comments received:
[snip]
> How do you specify this in the D language definition? What are the corner
> cases?

I think this is the most better definition about the issue.

"If an array literal could be deduced as static array, and it won't escape from its context, it would be allocated on stack."

For example:

int[3] a = [1, 2, 3];
-> OK: The array initializer could be typed as int[3] from the variable type.

a = [1, 2, 3];
-> OK: The assignment rhs should have the same type as the assigned lvalue. Therefore the array literal could be typed as int[3].

void foo(int[3] a);
foo([1, 2, 3]);
-> OK: The required argument type is int[3].

int[] a = [1, 2, 3];
-> Cannot be allocated on stack, because the memory can escape via the indirection 'a'.
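The four cases in this comment can be checked mechanically by compiling them under `@nogc`. The following is a minimal sketch, assuming a reasonably recent DMD; the names `foo` and `stackOnly` are hypothetical, and the `static assert` encodes the expectation that the escaping slice is rejected under `@nogc`.

```d
void foo(int[3] a) @nogc {}

int[3] stackOnly() @nogc
{
    int[3] a = [1, 2, 3]; // literal typed as int[3] from the declaration
    a = [4, 5, 6];        // literal typed as int[3] from the assigned lvalue
    foo([7, 8, 9]);       // literal typed as int[3] from the parameter type
    return a;
}

// Here the literal escapes through a slice, so it would need the GC heap;
// the lambda is therefore expected not to compile as @nogc.
static assert(!__traits(compiles, () @nogc { int[] a = [1, 2, 3]; return a; }));

void main()
{
    int[3] expected = [4, 5, 6];
    assert(stackOnly() == expected);
}
```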
Comment #7 by k.hara.pg — 2014-05-04T10:38:14Z
(In reply to Andrej Mitrovic from comment #5)
> (In reply to bearophile_hugs from comment #3)
> > (In reply to Andrej Mitrovic from comment #2)
> >
> > > With a parser fix we could implement this in D:
> > > float[3] arr = float[3](1.0, 2.0, 3.0);
> >
> > I also like this syntax (composed of two parts usable in different
> > situations):
> >
> > float[$] arr = [1.0, 2.0, 3.0]s;
> >
> > Or:
> >
> > auto arr = [1.0f, 2.0f, 3.0f]s;
>
> I don't like them, they're too much of a special case. Re-using existing
> syntax is better IMO.

I also think that "static array literal syntax" (e.g. DIP34) is not a good feature.

But "length inference" on variable declaration is a useful syntax.

float[$] arr = [1, 2, 3]; // typeof(arr) == float[3]
auto[$] arr = [1.0f, 2.0f, 3.0f]; // ditto
Comment #8 by andrej.mitrovich — 2014-05-04T11:06:45Z
(In reply to Kenji Hara from comment #7)
> But "length inference" on variable declaration is a useful syntax.
>
> float[$] arr = [1, 2, 3]; // typeof(arr) == float[3]
> auto[$] arr = [1.0f, 2.0f, 3.0f]; // ditto

What do you think about my extension to the new type construction syntax?:

float[3] arr = float[3]([1, 2, 3]);

I'm thinking it could be a more generic solution (more composable in template/generic code) since you could do things like:

-----
float[3] arr;
arr = float[3]([1, 2, 3]);
-----

-----
float[3] arr;
arr = float[arr.length]([1, 2, 3]);
-----

-----
float[3] arr;
arr = typeof(arr)([1, 2, 3]);
-----

-----
int[] arr;
arr.length = 3;
arr[] += int[3]([1, 2, 3]);
arr[] += int[3]([1, 2, 3]);
assert(arr == [2, 4, 6]);
-----

-----
void foo(Arr)(ref Arr arr) if ( isStaticArray!Arr) { }
void foo(Arr)(Arr arr) if (!isStaticArray!Arr) { }
foo(int[2]([1, 2])); // explicitly pick overload
-----

And things like that.
Comment #9 by rswhite4 — 2014-05-04T11:10:06Z
(In reply to Andrej Mitrovic from comment #8)
> (In reply to Kenji Hara from comment #7)
> > But "length inference" on variable declaration is a useful syntax.
> >
> > float[$] arr = [1, 2, 3]; // typeof(arr) == float[3]
> > auto[$] arr = [1.0f, 2.0f, 3.0f]; // ditto
>
> What do you think about my extension to the new type construction syntax?:
>
> float[3] arr = float[3]([1, 2, 3]);
>
> I'm thinking it could be a more generic solution (more composable in
> template/generic code) since you could do things like:
>
> -----
> float[3] arr;
> arr = float[3]([1, 2, 3]);
> -----
>
> -----
> float[3] arr;
> arr = float[arr.length]([1, 2, 3]);
> -----
>
> -----
> float[3] arr;
> arr = typeof(arr)([1, 2, 3]);
> -----
>
> -----
> int[] arr;
> arr.length = 3;
> arr[] += int[3]([1, 2, 3]);
> arr[] += int[3]([1, 2, 3]);
> assert(arr == [2, 4, 6]);
> -----
>
> -----
> void foo(Arr)(ref Arr arr) if ( isStaticArray!Arr) { }
> void foo(Arr)(Arr arr) if (!isStaticArray!Arr) { }
> foo(int[2]([1, 2])); // explicitly pick overload
> -----
>
> And things like that.

I would prefer float[3](1, 2, 3) instead of float[3]([1, 2, 3]). The latter has too many parentheses.
Comment #10 by andrej.mitrovich — 2014-05-04T11:14:26Z
(In reply to rswhite4 from comment #9)
> I would prefer float[3](1, 2, 3) instead of float[3]([1, 2, 3]). The latter
> has too many parentheses.

Easier on the eyes, sure. But the latter is simpler to interpret with multidimensional static arrays:

float[2][3] arr = float[2][3]([[1, 2], [3, 4], [5, 6]]);

I'm not sure what this would look like with the former syntax.
Comment #11 by rswhite4 — 2014-05-04T11:16:13Z
(In reply to Andrej Mitrovic from comment #10)
> (In reply to rswhite4 from comment #9)
> > I would prefer float[3](1, 2, 3) instead of float[3]([1, 2, 3]). The latter
> > has too many parentheses.
>
> Easier on the eyes, sure. But the latter is simpler to interpret with
> multidimensional static arrays:
>
> float[2][3] = float[2][3]([[1, 2], [3, 4], [5, 6]]);
>
> I'm not sure what this would look like with the former syntax.

float[2][3]([1, 2], [3, 4], [5, 6]);

Three elements, each of them an array with two elements.
Comment #12 by bearophile_hugs — 2014-05-04T13:18:10Z
(In reply to rswhite4 from comment #9)
> The latter has too many parentheses.

But it's more uniform with the current D array syntax.
Comment #13 by rswhite4 — 2014-05-04T13:20:06Z
(In reply to bearophile_hugs from comment #12)
> (In reply to rswhite4 from comment #9)
>
> > The latter has too many parentheses.
>
> But it's more uniform with the current D array syntax.

It is ugly and redundant.
Comment #14 by bearophile_hugs — 2014-05-05T11:35:57Z
(In reply to Kenji Hara from comment #6)
> I think this is the most better definition about the issue.
>
> "If an array literal could be deduced as static array, and it won't escape
> from its context, it would be allocated on stack."

This rule should become part of the D language, so all conformant D compilers should respect it. Then the functions that contain such cases can become @nogc.

(By the way, "most better" is better written as "best".)
Comment #15 by bearophile_hugs — 2014-05-05T11:42:51Z
(In reply to Andrej Mitrovic from comment #5)
> I don't like them, they're too much of a special case. Re-using existing
> syntax is better IMO.

The $ syntax can't be replaced by the float[3](...) syntax. For longer arrays counting the items is a bug-prone chore:

auto a = ubyte[47]([9,2,6,4,3,3,4,2,3,6,6,4,1,9,1,5,8,0,9,3,2,5,4,
                    4,8,2,2,6,0,1,9,1,1,5,3,9,9,1,6,3,7,4,5,3,0,3,4]);

Vs:

ubyte[$] a = [9,2,6,4,3,3,4,2,3,6,6,4,1,9,1,5,8,0,9,3,2,5,4,
              4,8,2,2,6,0,1,9,1,1,5,3,9,9,1,6,3,7,4,5,3,0,3,4];
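For comparison, length inference of the kind requested here can be approximated without new syntax. This is only a sketch, assuming a Phobos version that ships `std.array.staticArray`; the function name `inferred` is hypothetical. Note that the element type comes from the literal (here `int`), so this is not a drop-in replacement for the proposed `ubyte[$]` declaration.

```d
import std.array : staticArray;

// The length is taken from the literal, so nothing is counted by hand.
auto inferred() @nogc
{
    return [9, 2, 6, 4, 3].staticArray;
}

// The result is a static array whose length matches the literal.
static assert(is(typeof(inferred()) == int[5]));

void main() @nogc
{
    auto a = inferred();
    assert(a[0] == 9 && a[4] == 3);
}
```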
Comment #16 by bearophile_hugs — 2014-05-05T11:56:43Z
(In reply to Kenji Hara from comment #7)
> I also think that "static array literal syntax" (e.g. DIP34) is not a good
> feature.

I still don't know what the best solution is. The $ syntax to infer the number of items seems good enough.

Regarding the []s syntax, if you have a function template like:

    ForeachType!Items sum(Items)(ref Items sequence)
    {
        typeof(return) total = 0;
        foreach (x; sequence)
            total += x;
        return total;
    }

If you call it like this it will allocate an array on the heap (it's the default behaviour, I guess):

    immutable tot = sum([1, 2, 3]);

If you use a fixed-size literal there is no need for heap allocation and you can use @nogc:

    immutable tot = sum([1, 2, 3]s);

An advantage of the []s syntax is that it always allocates the data on the stack, so it's very easy for @nogc to accept such literals in a function.

You can use it in a line of code like:

    auto t1 = tuple([1, 2]s, "values");

That defines a Tuple!(int[2], string). Currently to do it you must specify the type:

    auto t2 = Tuple!(int[2], string)([1, 2], "values");

This is using the syntax suggested elsewhere in this thread:

    auto t3 = tuple(int[2]([1, 2]), "values");

When you have array literals nested in other literals (or nested in other generic function calls), having the []s syntax is a clear way to tell the compiler what you want:

    auto aa1 = ["key": [1, 2]s];

Instead of:

    int[2][string] aa2 = ["key": [1, 2]];

If you have to pass such an associative array literal to a function:

    foo(["key": [1, 2]s]);

Currently you need to use a not nice and bug-prone cast:

    void foo(TK, TV)(TV[TK] aa)
    {
        pragma(msg, TK, " ", TV);
    }

    void main()
    {
        foo(["key": cast(int[2])[1, 2]]);
    }
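The `sum` and `tuple` cases from this comment can be tried today with a library helper instead of the proposed `[]s` suffix. The sketch below assumes `std.array.staticArray` is available in the installed Phobos; `sumOnStack` is a hypothetical name, and `sum` is changed to take `auto ref const` (an assumption made here, not part of the comment) so that an rvalue static array can be passed directly.

```d
import std.array : staticArray;
import std.traits : ForeachType;
import std.typecons : Tuple, tuple;

// Same shape as the sum template in the comment, but with `auto ref const`
// so that temporaries are accepted.
ForeachType!Items sum(Items)(auto ref const Items sequence)
{
    typeof(return) total = 0;
    foreach (x; sequence)
        total += x;
    return total;
}

int sumOnStack() @nogc
{
    // staticArray turns the literal into an int[3] value, so the call
    // compiles under @nogc.
    return sum([1, 2, 3].staticArray);
}

void main()
{
    assert(sumOnStack() == 6);

    // The element type of the tuple is a static array, not a slice.
    auto t1 = tuple([1, 2].staticArray, "values");
    static assert(is(typeof(t1) == Tuple!(int[2], string)));
}
```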
Comment #17 by andrej.mitrovich — 2014-05-05T12:30:50Z
(In reply to bearophile_hugs from comment #15)
> For longer arrays counting the items is a bug-prone chore.

You have another bug report opened for exactly that. I'd like to have both options on the table though. Sometimes count inference is nice, other times you may want a diagnostic if you miss the count.
Comment #18 by pro.mathias.lang — 2020-05-15T03:53:33Z
```
void main () @nogc
{
    int[3] a = [1, 2, 3];
    a = [4, 5, 6];
}
```

This does not allocate anymore. The static array literal syntax would be nice, but Walter has vetoed it IIRC. Closing as fixed.