Bug 23179 – Unicode in symbol names in DLLs breaks MSVC linker

Status
RESOLVED
Resolution
WONTFIX
Severity
blocker
Priority
P1
Component
dmd
Product
D
Version
D2
Platform
All
OS
Windows
Creation time
2022-06-12T13:51:53Z
Last change time
2024-02-07T14:30:07Z
Keywords
pull
Assigned to
No Owner
Creator
Richard Cattermole
See also
https://issues.dlang.org/show_bug.cgi?id=19418

Attachments

ID    Filename                     Summary                 Content-Type  Size
1854  fix23179_patch_attempt.diff  Attempted fix as patch  text/plain    4937

Comments

Comment #0 by alphaglosined — 2022-06-12T13:51:53Z
The MSVC linker does not support Unicode characters in symbol names when creating import/export files. This was not discovered earlier because other DLL-related blockers masked it. For executables, we do have a test (runnable/testmodule.d), but I had to disable it for Windows to fix https://issues.dlang.org/show_bug.cgi?id=23177
Comment #1 by kinke — 2022-06-12T16:13:09Z
To be clear, we're talking about linker directives (cmdline option strings) embedded in COFF object files. LDC uses UTF-8 encoding for these (IIRC), and those work with the LLD linker but not with the MS linker. So I *guess* the MS linker expects some other encoding.
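The encoding mismatch guessed at above can be illustrated with a short sketch. It only shows how the same name turns into different bytes under different encodings. The thread does not establish which code page the MS linker actually expects, so the use of cp1252 here is an assumption, and the name `_µ` is just an example.

```python
# Illustrative only: cp1252 is an assumed "ANSI" code page, not a confirmed
# description of what the MS linker reads.
name = "_µ"

utf8_bytes = name.encode("utf-8")     # what a UTF-8 emitting compiler would write
ansi_bytes = name.encode("cp1252")    # what a cp1252 consumer would expect

print(utf8_bytes.hex(" "))   # 5f c2 b5
print(ansi_bytes.hex(" "))   # 5f b5

# A consumer that decodes the UTF-8 bytes as cp1252 sees a different name,
# so a lookup for the original symbol would not match.
misread = utf8_bytes.decode("cp1252")
print(misread == name)       # False
```

If the two sides disagree on the encoding, the byte strings differ even though both were produced from the same source-level identifier.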
Comment #2 by alphaglosined — 2022-06-12T16:40:00Z
After a bunch of hunting wrt. GetProcAddress, it seems Microsoft does not intend for exports to support anything other than ANSI. There are no A/W variants of this function, which, going by the usual WinAPI convention, means it only takes ANSI strings. That brings us back to the fact that we will probably need to sanitize mangling to not include Unicode, at least on Windows.
Comment #3 by dlang-bot — 2022-06-12T17:35:51Z
@rikkimax created dlang/dmd pull request #14207 "[DO NOT MERGE] Fix Issue 23179 - Unicode in symbol names in DLLs breaks MSVC linker" fixing this issue:

- Fix Issue 23179 - Unicode in symbol names in DLLs breaks MSVC linker

https://github.com/dlang/dmd/pull/14207
Comment #4 by alphaglosined — 2022-06-13T19:55:06Z
Created attachment 1854
Attempted fix as patch

After talking with kinke, we have decided to wait for this to appear in the wild before fixing. I've attached my proposed fix as a patch, in case something happens to my fork with the branch containing it.

If you experience this please do reply!
Comment #5 by bugzilla — 2023-01-26T07:53:34Z
There are other limitations on names we accept on Windows, such as the file names being insensitive to case. This has tripped up a handful of people, but people do accept it for what it is. It's not an onerous limitation. If the Microsoft linker fails at Unicode characters, so be it. Turning them into hex makes the mangled names even uglier and longer. Demangling them also becomes another problem. I suggest we just let Microsoft worry about this issue. They'll probably eventually fix their linker anyway. It's not worth us fixing it, then unfixing it when MS updates their linker. So WONTFIX.
Comment #6 by alphaglosined — 2023-01-26T09:26:42Z
They won't eventually fix this. It permeates the kernel and WinAPI as well. It is an intentional limitation that occasionally becomes an issue on other platforms as well. Other languages like Rust use Punycode for encoding Unicode. I picked hex for my implementation because it's easy to encode and also decode. So making this WONTFIX not only prevents statically binding against C/C++ code, it also leaves people who have Unicode names in symbols with no option to compile their existing codebases as DLLs.
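To make the hex idea concrete, here is a minimal sketch of a reversible, ASCII-only escaping of symbol names. The scheme and the function names `escape_symbol` and `unescape_symbol` are hypothetical and chosen for illustration; they are not taken from the attached patch or from dmd's mangler.

```python
# Hypothetical escaping scheme, for illustration only:
#   "_"            -> "__"        (the escape character is doubled)
#   non-ASCII char -> "_xHHHHHH"  (six hex digits of the code point)

def escape_symbol(name: str) -> str:
    """Return an ASCII-only form of `name`."""
    out = []
    for ch in name:
        if ch == "_":
            out.append("__")
        elif ord(ch) < 0x80:
            out.append(ch)
        else:
            out.append(f"_x{ord(ch):06X}")
    return "".join(out)


def unescape_symbol(escaped: str) -> str:
    """Invert escape_symbol; assumes well-formed input."""
    out = []
    i = 0
    while i < len(escaped):
        ch = escaped[i]
        if ch != "_":
            out.append(ch)
            i += 1
        elif escaped[i + 1] == "_":
            out.append("_")
            i += 2
        else:
            # "_x" followed by exactly six hex digits
            out.append(chr(int(escaped[i + 2:i + 8], 16)))
            i += 8
    return "".join(out)


escaped = escape_symbol("_µ")
print(escaped)                            # ___x0000B5
print(unescape_symbol(escaped) == "_µ")   # True
```

The sketch also shows the cost raised in Comment #5: the escaped name is longer than the original, and a demangler has to undo the escaping before it can show the identifier.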
Comment #7 by alphaglosined — 2023-01-28T14:39:24Z
Okay, I may end up eating my words on this one. I can't reproduce on VS 2022.

But what I can get with dmd and ldc rather than VC is:

```
   Creating library test.lib and object test.exp
test.exp : error LNK2001: unresolved external symbol _µ
  Hint on symbols that are defined and could potentially match:
    _µ
test.exe : fatal error LNK1120: 1 unresolved externals
```

So something isn't right. I will need to review this at some other point in time and file a different bug report if I can figure out what is going on there.