Bug 16479 – Missing substitution while mangling C++ template parameter for functions

Status
RESOLVED
Resolution
FIXED
Severity
normal
Priority
P1
Component
dmd
Product
D
Version
D2
Platform
All
OS
Linux
Creation time
2016-09-09T10:12:28Z
Last change time
2020-08-04T04:06:50Z
Keywords
C++, mangling, pull
Assigned to
No Owner
Creator
Thomas Brix Larsen

Comments

Comment #0 by brix — 2016-09-09T10:12:28Z
testcase_cpp.cpp:

class StructReader
{
public:
    template <typename T>
    T getDataField(unsigned int offset) const { return 0; }
};

void initializeTemplates()
{
    StructReader reader;
    reader.getDataField<signed char>(0);
}

testcase.d:

extern(C++) class StructReader
{
public:
    byte getDataField(T)(uint offset) const;
}

void main()
{
    new StructReader().getDataField!byte(0);
}

g++ -c testcase_cpp.cpp
dmd testcase.d testcase_cpp.o

testcase.o: In function `_Dmain':
testcase.d:(.text._Dmain+0x20): undefined reference to `StructReader::getDataField<signed char>::getDataField(unsigned int) const'
collect2: error: ld returned 1 exit status
--- errorlevel 1

D mangles as:
_ZNK12StructReader12getDataFieldIaE12getDataFieldEj

Expected C++ symbol:
_ZNK12StructReader12getDataFieldIaEET_j
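The two symbols above can be compared in readable form by demangling them. The following is a minimal sketch, assuming a GCC or Clang toolchain that ships `<cxxabi.h>`; the helper name `demangle` is hypothetical and not part of the original report.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-mangled
// symbol, or an empty string if the demangler rejects it.
std::string demangle(const char* symbol)
{
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with malloc.
    char* text = abi::__cxa_demangle(symbol, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr)
        return std::string();
    std::string result(text);
    std::free(text);
    return result;
}
```

Calling `demangle` on the symbol emitted by DMD should give a name with a doubled `getDataField` component, as in the linker error, while the expected symbol should demangle to a member function template of `StructReader`.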
Comment #1 by brix — 2016-09-09T10:17:08Z
gcc version 6.2.1 20160830 (GCC)
Comment #2 by brix — 2016-09-09T10:18:05Z
DMD64 D Compiler v2.071.1
Comment #3 by pro.mathias.lang — 2018-06-12T06:34:05Z
Edited the title to make it a bit clearer. I was hit by this today. DMD does not respect this part of the spec:
```
When function and member function template instantiations reference the template parameters in their parameter or result types, the template parameter number is encoded, with the sequence T_, T0_, ...
```
Source: https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangle.template-param
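The sequence quoted from the spec can be sketched as a small function. This is only an illustration of the encoding, assuming 0-based parameter positions; `templateParamCode` is a hypothetical name, not something taken from DMD.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: encodes the 0-based position of a template parameter
// using the sequence T_, T0_, T1_, ... described in the Itanium C++ ABI.
std::string templateParamCode(std::size_t position)
{
    if (position == 0)
        return "T_";                                  // first parameter
    return "T" + std::to_string(position - 1) + "_";  // second is T0_, third is T1_, ...
}
```

With this, the first parameter maps to `T_` and the second to `T0_`, which is what the expected symbol in the original report uses for its return type.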
Comment #4 by pro.mathias.lang — 2018-06-12T10:30:18Z
This is actually quite a non-trivial problem. Take the following code in C++:
```
#include <array>

template<size_t S, class T>
std::array<T, S>* getArray(const T* data)
{
    auto ret = new std::array<T, S>;
    for (size_t idx = 0; idx < S; ++idx)
        (*ret)[idx] = data[idx];
    return ret;
}

void unused ()
{
    getArray<5, bool>(nullptr);
    getArray<3, int>(nullptr);
    getArray<5, char>(nullptr);
}
```
This gives the following symbols on OSX:
```
0000000000000000 T __Z6unusedv
00000000000000c0 T __Z8getArrayILm3EiEPNSt3__15arrayIT0_XT_EEEPKS2_
0000000000000040 T __Z8getArrayILm5EbEPNSt3__15arrayIT0_XT_EEEPKS2_
0000000000000140 T __Z8getArrayILm5EcEPNSt3__15arrayIT0_XT_EEEPKS2_
                 U __Znwm
```
I mention OSX because on Linux the inline namespace `__1` might not be present, so the symbol (and its substitutions) will differ, but the bug is still there on Linux. The equivalent D code is as follows:
```
extern(C++, std) extern (C++, __1) {
    public struct array (T, /*size_t*/ cpp_ulong N) {
        private T[N > 0 ? N : 1] __elems_;
    }
}

extern (C++) array!(T, S)* getArray (cpp_ulong S, T) (const(T)* data);

void main ()
{
    const d1 = [true, false, true, false, true];
    const d2 = [42, 84, 1992];
    const d3 = ['a', 'b', 'c', 'd', 'e', 'f', 'g'];
    getArray!5(d1.ptr);
    getArray!3(d2.ptr);
    getArray!6(d3.ptr); // Not 7 on purpose
}
```
This will produce the following symbols with DMD master:
```
nm types.o | grep getArray
                 U __Z8getArrayILm3EiEPNSt3__15arrayIiLm3EEEPKi
                 U __Z8getArrayILm5EbEPNSt3__15arrayIbLm5EEEPKb
                 U __Z8getArrayILm6EcEPNSt3__15arrayIcLm6EEEPKc
```
There are 2 issues here:
- We don't do template parameter substitution, so we end up with the string "[...]arrayI{i,b,c}Lm{3,5,6}E" to represent `array<{int,bool,char}, {3,5,6}>` instead of `arrayIT0_XT_E` (which uses substitution for `getArray`'s template parameters).
- We don't do substitution for the function parameter. clang++ will use `S2_` as substitution and g++ `S1_` (because there's no inline namespace) instead of `{i,b,c}`. It's surprising because substitution does not normally happen for basic types, but I suppose template parameters are special.

Note that this is non-trivial to solve because of the following case:
```
template <int A, int B>
struct Bar {};

template <int A, int B>
Bar<B,A> foo () { return Bar<B, A>{}; }

void unused ()
{
    foo<1, 2>();
    foo<1, 1>();
}
```
This generates the following symbols:
```
0000000000000030 T __Z3fooILi1ELi1EE3BarIXT0_EXT_EEv
0000000000000020 T __Z3fooILi1ELi2EE3BarIXT0_EXT_EEv
0000000000000000 T __Z6unusedv
```
This means we cannot rely solely on the values of the template parameters: we have to track which parameter is used where, but I don't think we have this information in the frontend at the moment...
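The `foo<1, 1>` case can be made concrete with a sketch in which each template argument of the return type remembers whether it was written as a reference to one of the function's template parameters or as a literal. All names here (`TemplateArg`, `mangleInstance`, `templateParamCode`) are hypothetical, and the sketch only covers non-negative `int` arguments; it is not how DMD represents these types.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One template argument of a type such as Bar<...>.
struct TemplateArg
{
    bool isParamRef; // true: refers to the enclosing function's template parameter
    int value;       // 0-based parameter position, or the literal int value
};

// Encodes a 0-based template parameter position as T_, T0_, T1_, ...
std::string templateParamCode(std::size_t position)
{
    return position == 0 ? "T_" : "T" + std::to_string(position - 1) + "_";
}

// Mangles `name<args...>` for int non-type arguments only (an assumption of this sketch).
std::string mangleInstance(const std::string& name, const std::vector<TemplateArg>& args)
{
    std::string out = std::to_string(name.size()) + name + "I";
    for (const TemplateArg& arg : args)
    {
        assert(arg.value >= 0); // negative literals need an 'n' prefix, not handled here
        if (arg.isParamRef)
            out += "X" + templateParamCode(static_cast<std::size_t>(arg.value)) + "E";
        else
            out += "Li" + std::to_string(arg.value) + "E";
    }
    return out + "E";
}
```

For `Bar<B, A>` the arguments are references to positions 1 and 0, which gives `3BarIXT0_EXT_EE` regardless of the instantiation values. A mangler that only sees the resolved values `1, 1` would produce `3BarILi1ELi1EE` and has no way to tell which parameter each `1` came from.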
Comment #5 by pro.mathias.lang — 2018-07-05T11:28:08Z
Comment #6 by github-bugzilla — 2018-08-02T21:42:41Z
Commit pushed to master at https://github.com/dlang/dmd

https://github.com/dlang/dmd/commit/beb2a889124e53a6a6cc5218ffc596177b157086
Correct definition of foo15372 in cppa.d

In the C++ file it is defined as `template<typename T> int foo15372(int)`, but in the D file the argument was the template type. As template arguments are substituted in mangling, this was not correct and only compiled thanks to bug 16479
Comment #7 by pro.mathias.lang — 2018-10-22T18:07:29Z
*** Issue 15970 has been marked as a duplicate of this issue. ***
Comment #8 by pro.mathias.lang — 2018-10-22T18:09:32Z
*** Issue 16944 has been marked as a duplicate of this issue. ***
Comment #9 by github-bugzilla — 2018-11-01T09:23:36Z
Commits pushed to master at https://github.com/dlang/dmd

https://github.com/dlang/dmd/commit/b75c9f110795109aebe610cfbf8f814cd1d6afee
Fix issue 16479: No namespace substitution for C++ mangling on POSIX

The C++ ABI used by POSIX is the Itanium C++ ABI. Reference document available here: https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling

One important (and tricky) part of the ABI is the substitutions being done in order to reduce the bloat introduced by long symbol names, a typical issue when using templates heavily (of which D was not exempt).

There are 2 kinds of substitutions: component substitution and template parameter substitution. Component substitution replaces repeated parts of the symbol with `S[X]_`, template parameter substitution replaces occurrences of template parameters with `T[X]_`. `X` represents a base36 index into the array of components or template parameters already encountered so far, respectively.

This substitution is done on an identity basis, which means that the templated function `template<typename T> int foo()` instantiated with `int` will be mangled as `_Z3fooIiE*i*v` (asterisks are emphasis and not part of the mangling) while it would be mangled as `_Z3fooIiE*T_*v` if the definition was `template<typename T> T foo()`.

Moreover, experience with C++ compilers shows that component substitution is preferred over template parameter substitution, such that `template<typename T> T foo(T)` is mangled as `_Z3fooIiET_*S0_*` when instantiated with `int` and not `_Z3fooIiET_*T_*` as would be the case if template substitution was preferred.

This is just brushing the surface of the problem, since only template type parameters have been mentioned so far, but other kinds (aliases, values) are also affected. Substitution also needs to happen if a template parameter is part of another type, such as `template<typename T> array<T>* foo (T, int)`, which, when instantiated with `int`, is mangled as `_Z3fooIiEP5arrayIT_ES1_i`. For more detailed test cases, see `test/compilable/cppmangle.d`.

The main issue encountered while implementing this in DMD is that there's no easy way to know if a type (which is part of the function's type, e.g. parameters and return value) was a template parameter or not, as DMD merges types, so in the previously mentioned `template<typename T> int foo()` vs `template<typename T> T foo()` the template instantiation will come with the exact same pointer to the singleton `int` type in both cases. Moreover, DMD does destructive semantic analysis, meaning that objects get mutated, pointers get replaced, aliases get resolved, and information gets lost.

After different approaches were taken, the most practical and reliable approach devised was to provide a `visit` overload for the non-resolved AST types `TypeIdentifier` and `TypeInstance`, and compare the identifier to that of the template definition. It falls back to the post-semantic type when the identifier isn't found.

Note that no attempt has been made whatsoever to handle the mess that would result from expressions themselves being mangled. The reference doc for the ABI mentions that "[...] this mangling is quite similar to the source token stream. (C++ Standard reference 14.5.5.1p5.)". Original quote: https://itanium-cxx-abi.github.io/cxx-abi/abi.html#expressions (5.1.6 Expressions)

https://github.com/dlang/dmd/commit/06d45325331a4c14a099da1d45fa4216c07ab2f6
Merge pull request #8455 from Geod24/cppmangle-fix-16479

Fix issue 16479: No namespace substitution for C++ mangling on POSIX
merged-on-behalf-of: Iain Buclaw <[email protected]>
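The identifier-based approach described in the commit message can be illustrated roughly as follows. This is a simplified sketch, not DMD's code: it assumes the type is available as the name written in the source, that the template's parameter names are known as a list of strings, and that the post-semantic mangling is supplied by the caller. The names `mangleWrittenType` and `templateParamCode` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Encodes a 0-based template parameter position as T_, T0_, T1_, ...
std::string templateParamCode(std::size_t position)
{
    return position == 0 ? "T_" : "T" + std::to_string(position - 1) + "_";
}

// Compares the identifier as written in the declaration against the template's
// parameter names. On a match, the parameter code is used; otherwise the
// mangling of the resolved (post-semantic) type is used as a fallback.
std::string mangleWrittenType(const std::string& writtenName,
                              const std::vector<std::string>& templateParams,
                              const std::string& resolvedMangling)
{
    for (std::size_t i = 0; i < templateParams.size(); ++i)
    {
        if (templateParams[i] == writtenName)
            return templateParamCode(i);
    }
    return resolvedMangling;
}
```

For `template<typename T> T foo()` instantiated with `int`, the return type was written as `T` and so yields `T_`, giving `_Z3fooIiET_v`. For `template<typename T> int foo()`, the written name `int` matches no parameter, so the fallback `i` is used and the symbol is `_Z3fooIiEiv`, even though both return types resolve to the same `int`.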
Comment #10 by sahmi.soulaimane — 2019-06-08T13:48:33Z
The substitution still fails with qualified types. Example: https://run.dlang.io/is/5BEekf
---
extern(C++, N) { struct S(T) {} }

extern(C++):

S!T func(T)();
pragma(msg, func!int.mangleof);

N.S!T funq(T)();
pragma(msg, funq!int.mangleof);
---
output:
_Z4funcIiEN1N1SIT_EEv
_Z4funqIiEN1N1SIiEEv

S!T works, but N.S!T doesn't.
Comment #11 by kinke — 2020-01-06T18:51:58Z
Possibly related:
```
extern (C++, std) struct pair(T1, T2) {}
extern (C++) void func_20413(pair!(int, float), pair!(float, int));
```
actual:   _Z10func_20413St4pairIifEStS_IfiE
expected: _Z10func_20413St4pairIifES_IfiE

A C++ string namespace `extern (C++, "std") struct pair(T1, T2) {}` works as expected (testcase from https://issues.dlang.org/show_bug.cgi?id=20413).
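The `S_` in the expected symbol is a component substitution: once `St4pair` has been emitted, later uses refer back to it by index. A minimal sketch of that bookkeeping follows, assuming components are treated as opaque strings (a real mangler tracks them structurally); `substitutionCode` and `SubstitutionTable` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Encodes the 0-based index of a previously seen component as
// S_, S0_, ..., S9_, SA_, ..., SZ_, S10_, ... (base-36 sequence id).
std::string substitutionCode(std::size_t index)
{
    if (index == 0)
        return "S_";
    static const char digits[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    std::size_t n = index - 1;
    std::string seq;
    do
    {
        seq.insert(seq.begin(), digits[n % 36]);
        n /= 36;
    } while (n != 0);
    return "S" + seq + "_";
}

// Remembers components in the order they were first emitted.
class SubstitutionTable
{
public:
    // Returns the substitution code if the component was seen before;
    // otherwise records it and returns the component unchanged.
    std::string emit(const std::string& component)
    {
        for (std::size_t i = 0; i < seen_.size(); ++i)
        {
            if (seen_[i] == component)
                return substitutionCode(i);
        }
        seen_.push_back(component);
        return component;
    }

private:
    std::vector<std::string> seen_;
};
```

Emitting `St4pair` a second time yields `S_` on its own, which matches the expected `S_IfiE` for the second parameter; the actual output instead prefixes the substitution with another `St`.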
Comment #12 by pro.mathias.lang — 2020-01-06T23:27:44Z
I really wish we could deprecate 'extern(C++, ident)' and just use 'extern(C++, "string")', as supporting the two is an absolute mess, but Walter is against it: https://github.com/dlang/dmd/pull/10031
Comment #13 by pro.mathias.lang — 2020-08-04T04:06:50Z
Moved to https://issues.dlang.org/show_bug.cgi?id=21108 as the original issue has been mostly fixed and only a few test cases remain, which can be worked around.