
Bug 14912 – Move initialisation of GC'd struct and class data from the callee to the caller

Status: NEW
Severity: enhancement
Priority: P4
Component: dmd
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2015-08-12T21:53:35Z
Last change time: 2024-12-13T18:44:09Z
Keywords: performance
Assigned to: No Owner
Creator: Iain Buclaw
See also: https://issues.dlang.org/show_bug.cgi?id=24368

Moved to GitHub: dmd#19026 →

Comments

Comment #0 by ibuclaw — 2015-08-12T21:53:35Z

Currently, druntime initialises all GC'd data in the callee. Examples:

    _d_newclass():   p[0 .. ci.init.length] = ci.init[];
    _d_newitemT():   memset(p, 0, _ti.tsize);
    _d_newitemiT():  memcpy(p, init.ptr, init.length);

In each example, this results in a system call. And because the implementation is always hidden away, the optimizer (or an optimizing backend) cannot assume anything about the contents of the pointer returned by these calls.

For instance, in a very simple case:

    class A
    {
        int foo () { return 42; }
    }

    int test()
    {
        A a = new A(), b = a;
        return b.foo();
    }

If the contents of 'a' were set by the caller, in the compiler, we would have the following codegen (pseudo-code):

    int test()
    {
        struct A *a;
        struct A *b;

        a = new A();
        *a = A.init;
        b = a;
        return b.__vptr.foo(b);
    }

From that, an optimizer can break down and inline the default initializer without the need for memset/memcpy:

    // ...
    a = new A();
    a.__vptr = &typeid(A).vtbl;
    a.__monitor = null;
    // ...

Perform constant propagation to replace all occurrences of b with a:

    // ...
    return *(a.__vptr + 40)(a);
    // ...

Global value numbering to resolve the lookup in the vtable, and de-virtualize the call:

    // ...
    return A.foo(a);
    // ...

After some dead code removal, the inliner now sees the direct call and is ready to inline A.foo:

    int test()
    {
        struct A *a = new A();
        a.__vptr = typeid(A).vtbl.ptr;
        a.__monitor = null;
        return 42;
    }

There is another challenge here to remove the dead GC allocation (that will have to wait for another bug report). But I think that this simple change is justified by the opportunity to produce much better code when classes are used in at least simple ways, and that is before even considering the possibilities with LTO.

If there are no objections, I suggest that we make a push for this. It will require dmd to update its own NewExp::toElem, and to remove the memcpy/memset parts from druntime.
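As a rough illustration of the caller-side idea, here is a minimal sketch for a plain struct. The names `Point`, `allocRaw` and `sumOfDefaults` are hypothetical and do not exist in druntime; `allocRaw` only stands in for an opaque allocation hook that returns uninitialised memory. Because the `Point.init` store sits in the caller, an optimizer could in principle see the field values directly.

```d
import core.memory : GC;

struct Point
{
    int x = 7;
    int y = 9;
}

// Hypothetical stand-in for an opaque runtime hook: it hands back raw
// GC memory and performs no initialisation of its own.
void* allocRaw(size_t size)
{
    return GC.malloc(size);
}

int sumOfDefaults()
{
    auto p = cast(Point*) allocRaw(Point.sizeof);
    *p = Point.init; // initialisation happens here, in the caller
    return p.x + p.y;
}

void main()
{
    assert(sumOfDefaults() == 16);
}
```

Whether a given backend actually folds `sumOfDefaults` down to a constant depends on the compiler and flags; the sketch only shows where the initialising store would live.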

Comment #1 by kinke — 2015-08-12T23:11:45Z

Great find. I didn't like it either, but didn't realize the actual implementation isn't available for the optimizer!

Comment #2 by ibuclaw — 2015-08-13T05:06:36Z

I don't think this would be particularly difficult to change in dmd either, as this kind of caller-side initialization is already done for scoped classes and for classes that have their own new(size_t) allocator.
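For reference, a scoped class instance looks like this in source. This is only a sketch of the language feature mentioned above, reusing the class `A` from comment #0; what code a particular compiler emits for it is not shown here.

```d
class A
{
    int foo() { return 42; }
}

int test()
{
    // `scope` asks for the instance to be placed on the stack, so the
    // compiler itself has to emit the code that initialises it.
    scope a = new A();
    return a.foo();
}

void main()
{
    assert(test() == 42);
}
```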

Comment #3 by schveiguy — 2015-08-13T14:54:12Z

Wouldn't it be enough to simply change the call to the opaque function _d_newitemT(TypeInfo ti) to a template _d_newitem!(T)() ? I don't want to put more special code in the compiler if possible.

Comment #4 by rsw0x — 2015-08-13T17:59:41Z

@Steven Schveighoffer Unrelated, but this would help a lot in making a precise GC which is a reason to prefer going that route.

Comment #5 by ibuclaw — 2015-08-13T21:02:47Z

(In reply to Steven Schveighoffer from comment #3)
> Wouldn't it be enough to simply change the call to the opaque function
> _d_newitemT(TypeInfo ti) to a template _d_newitem!(T)() ?
>
> I don't want to put more special code in the compiler if possible.

Not really, because any potential optimization would stop at the memcpy, and not go any further.

I don't see what the complaint is? The compiler has a much better idea of what is going on when it comes to initializing structures efficiently vs. a memcpy which is non-inlineable, and almost always falls into the slow, unaligned code path.

Comment #6 by schveiguy — 2015-08-14T12:44:57Z

Why would a template need memcpy?

    T *_d_newitem(T)()
    {
        // could eliminate typeid here
        T *result = cast(T *)GC.malloc(T.sizeof, typeid(T).flags);
        *result = T.init;
        return result;
    }
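A compilable variant of that template might look like the sketch below. It is not druntime's actual hook: the name `newItem`, the `Pair` struct, the restriction to plain-old-data structs, and the choice to derive the GC block attribute from `std.traits.hasIndirections` instead of `typeid(T).flags` are all assumptions made for illustration.

```d
import core.memory : GC;
import std.traits : hasIndirections;

struct Pair
{
    int a = 1;
    double b = 2.5;
}

// Hypothetical templated allocation helper, limited to POD structs so
// that the plain assignment below is a simple bit copy.
T* newItem(T)() if (is(T == struct) && __traits(isPOD, T))
{
    // Assumption: pointer-free blocks do not need to be scanned by the GC.
    enum uint attr = hasIndirections!T ? 0 : GC.BlkAttr.NO_SCAN;
    auto result = cast(T*) GC.malloc(T.sizeof, attr);
    *result = T.init; // visible to the optimizer once the template is inlined
    return result;
}

void main()
{
    auto p = newItem!Pair();
    assert(p.a == 1 && p.b == 2.5);
}
```

Since the template body is instantiated in the calling module, the `T.init` store is no longer hidden behind an opaque runtime call.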

Comment #7 by ibuclaw — 2015-08-14T13:10:50Z

(In reply to Steven Schveighoffer from comment #6)
> Why would a template need memcpy?
>
> T *_d_newitem(T)()
> {
>     // could eliminate typeid here
>     T *result = cast(T *)GC.malloc(T.sizeof, typeid(T).flags);
>     *result = T.init;
>     return result;
> }

That's fine for anything except classes...

Comment #8 by schveiguy — 2015-08-14T13:18:45Z

(In reply to Iain Buclaw from comment #7)
> That's fine for anything except classes...

Sure, but with compiler visibility, and ability to inline (and ability to alter the implementation based on the type), we can do whatever makes the optimizer happy.

All we need from the compiler is the components that make up the ci.init.

Comment #9 by ibuclaw — 2015-08-14T17:43:17Z

(In reply to Steven Schveighoffer from comment #8)
> (In reply to Iain Buclaw from comment #7)
> > That's fine for anything except classes...
>
> Sure, but with compiler visibility, and ability to inline (and ability to
> alter the implementation based on the type), we can do whatever makes the
> optimizer happy.
>
> All we need from the compiler is the components that make up the ci.init.

Still need the ability to *deref a class to assign in bulk to its underlying structure in the language. Currently only the compiler/codegen is allowed to do that.
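To make the limitation concrete: library code cannot write `*obj = C.init` for a class, so the closest it can get is an untyped byte copy of the class's init image. The sketch below assumes a current druntime, where the data the thread calls `ci.init` is exposed as `typeid(C).initializer`; the `Counter` class and `makeCounter` helper are hypothetical.

```d
import core.memory : GC;

class Counter
{
    int start = 3;
    int get() { return start; }
}

// Hypothetical helper: builds an instance without `new` by copying the
// class's static init image (vptr, monitor, field defaults) byte by byte.
Counter makeCounter()
{
    enum size = __traits(classInstanceSize, Counter);
    void* p = GC.malloc(size);
    auto dst = (cast(ubyte*) p)[0 .. size];
    dst[] = cast(const(ubyte)[]) typeid(Counter).initializer;
    return cast(Counter) p; // no constructor runs; Counter declares none
}

void main()
{
    auto c = makeCounter();
    assert(c.get() == 3);
}
```

This is the same kind of opaque block copy the report wants to avoid, which is why a typed, field-by-field form would have to come from the compiler or from some new language feature.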

Comment #10 by ibuclaw — 2015-08-14T17:45:48Z

(In reply to Iain Buclaw from comment #9)
> (In reply to Steven Schveighoffer from comment #8)
> > (In reply to Iain Buclaw from comment #7)
> > > That's fine for anything except classes...
> >
> > Sure, but with compiler visibility, and ability to inline (and ability to
> > alter the implementation based on the type), we can do whatever makes the
> > optimizer happy.
> >
> > All we need from the compiler is the components that make up the ci.init.
>
> Still need the ability to *deref a class to assign in bulk to its underlying
> structure in the language. Currently only the compiler/codegen is allowed
> to do that.

And I read "from the compiler is the components that make up the ci.init" as a proposal for adding yet another property.

Comment #11 by schveiguy — 2015-08-14T18:10:57Z

(In reply to Iain Buclaw from comment #9)
>
> Still need the ability to *deref a class to assign in bulk to its underlying
> structure in the language. Currently only the compiler/codegen is allowed
> to do that.

We can do tupleof right now, but it just doesn't include the vtable and monitor. That is easily figured out via casting.

(In reply to Iain Buclaw from comment #10)
> And I read "from the compiler is the components that make up the ci.init" as
> a proposal for adding yet another property.

At the moment, we already HAVE that via some magic compiler knowledge of the TypeInfo_Class object (where it stashes the ci.init). If we remove the compiler knowledge of this, and expose the c.init via a __traits for instance, then we have all we need to build the runtime in a way that is expandable, customizable, and optimizable.

I really think the future includes a completely druntime-generated druntime, based on compile-time introspection.

It may be this is too far off in the distance, and perhaps we could put more magic in the compiler now to save a bit of performance, but I'd rather wait until we do it right, and not add any *more* magic there. Just my opinion.
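The tupleof point can be checked with a small sketch. The `Node` class is hypothetical, and the slot layout assumed here (`__vptr` first, `__monitor` second) is the one described by the D ABI for ordinary D classes.

```d
class Node
{
    int id = 7;
    long weight = 11;
}

void main()
{
    // tupleof only covers the declared fields...
    static assert(typeof(Node.tupleof).length == 2);

    // ...while the instance also carries the hidden __vptr and __monitor.
    static assert(__traits(classInstanceSize, Node)
        >= 2 * (void*).sizeof + int.sizeof + long.sizeof);

    auto n = new Node();
    auto slots = cast(void**) cast(void*) n;       // view the object as raw pointer slots
    assert(slots[0] is cast(void*) typeid(Node).vtbl.ptr); // __vptr
    assert(slots[1] is null);                      // __monitor, unused so far
    assert(n.id == 7 && n.weight == 11);
}
```

Reading the hidden slots through a cast works, but as the following comment argues, letting user code write them is a different matter.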

Comment #12 by ibuclaw — 2015-08-14T21:32:24Z

(In reply to Steven Schveighoffer from comment #11)
> (In reply to Iain Buclaw from comment #9)
> >
> > Still need the ability to *deref a class to assign in bulk to its underlying
> > structure in the language. Currently only the compiler/codegen is allowed
> > to do that.
>
> We can do tupleof right now, but it just doesn't include the vtable and
> monitor. That is easily figured out via casting.
>
> (In reply to Iain Buclaw from comment #10)
> > And I read "from the compiler is the components that make up the ci.init" as
> > a proposal for adding yet another property.
>
> At the moment, we already HAVE that via some magic compiler knowledge of the
> TypeInfo_Class object (where it stashes the ci.init). If we remove the
> compiler knowledge of this, and expose the c.init via a __traits for
> instance, then we have all we need to build the runtime in a way that is
> expandable, customizable, and optimizable.
>
> I really think the future includes a completely druntime-generated druntime,
> based on compile-time introspection.
>
> It may be this is too far off in the distance, and perhaps we could put more
> magic in the compiler now to save a bit of performance, but I'd rather wait
> until we do it right, and not add any *more* magic there. Just my opinion.

There's nothing really to add because the compiler already does this - see initialization of classes that supply their own new operator, for instance.

And I wouldn't call it magic. In fact, I'd say that it's always expected that the compiler should initialize at least the vtable (as in C++, for instance), rather than inventing some trait that gives the user a gateway to initialize hidden immutable data with anything they wish.

Comment #13 by robert.schadek — 2024-12-13T18:44:09Z

THIS ISSUE HAS BEEN MOVED TO GITHUB

https://github.com/dlang/dmd/issues/19026

DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT.