Bug 23903 – Demangling produces exponentially long output

Status
NEW
Severity
major
Priority
P1
Component
druntime
Product
D
Version
D2
Platform
All
OS
All
Creation time
2023-05-07T23:35:13Z
Last change time
2024-12-07T13:42:43Z
Keywords
mangling
Assigned to
No Owner
Creator
Vladimir Panteleev
Moved to GitHub: dmd#17461 →

Comments

Comment #0 by dlang-bugzilla — 2023-05-07T23:35:13Z
Consider the following D program: ///////////////// test.d //////////////// auto fun(T)(T t = T.init) { struct S { T t; } return S(t); } import core.demangle; alias f1 = fun!int; pragma(msg, f1.mangleof.demangle.length); alias f2 = fun!(typeof(f1())); pragma(msg, f2.mangleof.demangle.length); alias f3 = fun!(typeof(f2())); pragma(msg, f3.mangleof.demangle.length); alias f4 = fun!(typeof(f3())); pragma(msg, f4.mangleof.demangle.length); alias f5 = fun!(typeof(f4())); pragma(msg, f5.mangleof.demangle.length); ///////////////////////////////////////// Note that although the amount of code grows linearly, the length of the demangled string grows exponentially. Some problems which results from this: 1. When the program crashes, it is very slow to print its stack trace (easily in the order of minutes). This makes debugging difficult. 2. Debuggers (at least, GDB) will run out of memory and crash when attempting to debug a program using such constructs. This also makes debugging difficult. Chaining IFTI functions and voldemort types, which I think is idiomatic D code, is one way for this problem to be manifested. One practical use case where I have constructed such a program by accident is a functor-based text formatting package (which encapsulates values to be formatted into self-contained objects and allows building simple expression trees of them without memory allocations). However, it is not difficult to run into it in other ways. The underlying problem is that the relationship between language constructs is a DAG; however, the demangled string version is essentially a walk of the DAG through every possible path, which is exponential in complexity. Here is a more explicit example which illustrates this: //////////////// test.d //////////////// struct T(alias A, alias B) {} struct T0 {} alias T1 = T!(T0, T0); alias T2 = T!(T1, T1); alias T3 = T!(T2, T2); alias T4 = T!(T3, T3); alias T5 = T!(T4, T4); alias T6 = T!(T5, T5); alias T7 = T!(T6, T6); alias T8 = T!(T7, T7); alias T9 = T!(T8, T8); pragma(msg, T9.mangleof.length); // 174 pragma(msg, T9.stringof.length); // 4090 //////////////////////////////////////// The mangled version of the symbols does not have this problem, and grows linearly; it seems to be using backreferences to avoid this problem there. Ignoring the DAG nature of the relationships of the mangled objects is a mis-design of the demangling algorithm, and is fundamentally incorrect! Not only does it cause issues such as the above, it makes debugging non-trivial template code difficult due to the verbosity of the output, and it allows the equivalent of ZIP-bombs (https://en.wikipedia.org/wiki/Zip_bomb). To fix this problem, the demangling algorithm (in Druntime and in debuggers) should not expand long back-references, emitting the DAG as it is. For example, instead of: pure nothrow @nogc @safe test.fun!(test.fun!(int).fun(int).S).fun(test.fun!(int).fun(int).S).S test.fun!(test.fun!(int).fun(int).S).fun(test.fun!(int).fun(int).S) it could write e.g.: pure nothrow @nogc @safe test.fun!(T1).fun(__1).S test.fun!(__1).fun(T1); __1=test.fun!(int).fun(int).S
Comment #1 by dlang-bugzilla — 2023-05-08T18:13:41Z
This also applies to C++, where it seems to be a known problem. Google's Abseil limits demangling recursion with a "ComplexityGuard": https://chromium.googlesource.com/external/github.com/abseil/abseil-cpp/+/HEAD/absl/debugging/internal/demangle.cc#187 Some discussion in a blog post about implementing a C++ demangler in Rust: https://fitzgeraldnick.com/2017/02/22/cpp-demangle.html#what-makes-demangling-c-symbols-difficult
Comment #2 by robert.schadek — 2024-12-07T13:42:43Z
THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/dmd/issues/17461 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB