Bug 23906 – Unicode file names are not properly handled

Status: NEW
Severity: normal
Priority: P1
Component: dmd
Product: D
Version: D2
Platform: All
OS: Windows
Creation time: 2023-05-08T15:49:17Z
Last change time: 2024-12-13T19:28:45Z
Assigned to: No Owner
Creator: Richard (Rikki) Andrew Cattermole
Moved to GitHub: dmd#20275

Comments

Comment #0 by alphaglosined — 2023-05-08T15:49:17Z
Unicode file names for D source code are apparently not handled correctly. This was reported by a user with non-Latin file names. https://forum.dlang.org/post/[email protected]

Proposed steps to fix this:

toWStringz should be converting from CP_UTF8, not CP_ACP (checked, this looks to be correct). https://github.com/dlang/dmd/blob/be151e6d854c0df8af7ee88b6f380b6283ea824f/compiler/src/dmd/common/string.d#L136

In addition to this proposal, I am suggesting the conversion of CreateProcessA to instead be CreateProcessW with the help of toWStringZ. https://github.com/dlang/dmd/blob/master/compiler/src/dmd/link.d#L892
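To illustrate why the source code page matters, here is a minimal, portable sketch. It is not dmd's code: `decodeAsUtf8` and `decodeAsLatin1` are hypothetical helpers, and Latin-1 is only a stand-in for a single-byte ANSI code page (the real CP_ACP depends on the system locale). It shows that widening UTF-8 bytes one at a time produces a different, longer UTF-16 string than decoding them as UTF-8.

```d
import std.utf : toUTF16;

/// Hypothetical helper: decode the bytes as UTF-8, which is the
/// interpretation CP_UTF8 asks for.
wstring decodeAsUtf8(string name)
{
    return toUTF16(name);
}

/// Hypothetical helper: widen every byte to one UTF-16 code unit.
/// Assumption: Latin-1 stands in for a single-byte ANSI code page here;
/// actual CP_ACP mappings differ per locale, but the failure is the same kind.
wstring decodeAsLatin1(string name)
{
    wchar[] result;
    foreach (ubyte b; cast(const(ubyte)[]) name)
        result ~= cast(wchar) b;
    return result.idup;
}

unittest
{
    string name = "файл.d"; // D string literals are UTF-8

    wstring good = decodeAsUtf8(name);
    wstring bad = decodeAsLatin1(name);

    assert(good == "файл.d"w);
    assert(good.length == 6);  // 4 Cyrillic letters + ".d"
    assert(bad.length == 10);  // 4 letters * 2 UTF-8 bytes + ".d"
    assert(bad != good);       // the path no longer names the same file
}
```

The unittest block can be run with `dmd -unittest -main` on any platform, since it does not call the Windows API.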
Comment #1 by kinke — 2023-05-08T16:46:03Z
I've fixed this for LDC (AFAIK :D), by IIRC:
* Switching the main() C entry point on Windows to wmain(), so that it gets the cmdline params (source files, import dirs...) in UTF16 encoding, *not* the current 8-bit code page (CP_ACP). `_d_wrun_main` in druntime then converts those to proper UTF8 strings for _Dmain(). See: https://github.com/dlang/dmd/blob/be151e6d854c0df8af7ee88b6f380b6283ea824f/compiler/src/dmd/mars.d#L872-L931
* Then redefining the `CodePage` enum in https://github.com/dlang/dmd/blob/b87b011e0c91596b9722187192416a5a6534b16f/compiler/src/dmd/root/filename.d#L46 from `CP_ACP` to `CP_UTF8`.

That https://github.com/dlang/dmd/blob/be151e6d854c0df8af7ee88b6f380b6283ea824f/compiler/src/dmd/common/string.d#L140 is new to me (and missed by LDC! - thx for the link) - it should definitely use `dmd.root.filename.CodePage` instead (is currently a *private* enum).

> suggesting the conversion of CreateProcessA to instead be CreateProcessW with the help of toWStringZ

Yes, all child process invocations on Windows should use the wide API.
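As a rough sketch of what a wide-API child process invocation could look like, assuming the command line is already a UTF-8 D string: `runWide` below is a hypothetical helper, not the code in dmd's link.d, and it uses Phobos' `std.utf.toUTF16` rather than dmd's own toWStringz. It is guarded by `version (Windows)` because it calls the Win32 API directly.

```d
version (Windows)
{
    import core.sys.windows.windows;
    import std.utf : toUTF16;

    /// Hypothetical helper: run a UTF-8 command line through CreateProcessW
    /// and return the child's exit code, or -1 if anything fails.
    int runWide(string cmdline)
    {
        // CreateProcessW may modify the command-line buffer, so pass a
        // mutable, NUL-terminated UTF-16 copy.
        wchar[] buf = (toUTF16(cmdline) ~ "\0"w).dup;

        STARTUPINFOW si;
        si.cb = cast(DWORD) STARTUPINFOW.sizeof;
        PROCESS_INFORMATION pi;

        if (!CreateProcessW(null, buf.ptr, null, null, FALSE, 0,
                            null, null, &si, &pi))
            return -1;

        // Release both handles however the function exits from here on.
        scope (exit)
        {
            CloseHandle(pi.hThread);
            CloseHandle(pi.hProcess);
        }

        WaitForSingleObject(pi.hProcess, INFINITE);

        DWORD code;
        if (!GetExitCodeProcess(pi.hProcess, &code))
            return -1;
        return cast(int) code;
    }
}
```

Because the conversion goes from UTF-8 straight to UTF-16, the active ANSI code page never enters the picture, which is the point of preferring the W variant over CreateProcessA.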
Comment #2 by kinke — 2023-05-08T16:57:03Z
[Oh, switching to wmain() shouldn't be required; _d_run_main in druntime ignores the narrow cmdline args anyway and properly converts the UTF16 ones to UTF8 for _Dmain.]
Comment #3 by robert.schadek — 2024-12-13T19:28:45Z
THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/dmd/issues/20275 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB