This is wrong on Windows. One *can* set the console codepage to UTF-8 and the font to Lucida Console, but this is an unusual configuration, and console programs can't work out of the box. This leaves std.stdio useless. As far as I know, this also applies to Phobos1. If this is not going to be fixed, it should be documented.
>But how many DOS or Windows console apps in the real world output UTF-8?
>Presumably not many, considering that no versions of DOS and only a few
>versions of Windows support it. There's also a causal loop in that even
>modern Windows versions don't come with the console code page set to 65001
>by default. I don't know what is likely to break this loop, but I doubt
>that the restrictiveness of one language's standard library is going to do
>it.
There is PoshConsole http://poshconsole.codeplex.com/
It's all .NET and WPF, therefore UTF-16, but it's a very different architecture and interface.
BTW, cmd has a /u switch for (redirected) Unicode output; I use it sometimes.
Comment #3 by smjg — 2010-07-28T10:42:40Z
*** Issue 4522 has been marked as a duplicate of this issue. ***
Comment #4 by andrei — 2010-09-26T14:24:54Z
Any fresh ideas on how to fix this?
Comment #5 by smjg — 2010-09-26T16:01:10Z
I suppose the way to go about it is to create wrapper stream classes that provide encoding conversion. And have ready-made instances for stdin/out/err, with the codepage detected at launch.
The difficulty I can see is seekability, but this probably isn't needed given that it'll be primarily for stdio (which are inherently not seekable) and text files (for which seeking isn't particularly useful).
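A minimal sketch of the conversion step such a wrapper would need, assuming the Win32 calls GetConsoleOutputCP and WideCharToMultiByte are used for detection and transcoding. The helper name toConsoleBytes is hypothetical and not part of Phobos; on non-Windows systems the sketch simply passes the UTF-8 bytes through.

```d
import std.utf : toUTF16;

version (Windows) import core.sys.windows.windows; // GetConsoleOutputCP, WideCharToMultiByte

/// Hypothetical helper (not a Phobos API): converts UTF-8 text to the bytes
/// of the active console output code page.
ubyte[] toConsoleBytes(string s)
{
    version (Windows)
    {
        if (s.length == 0)
            return null;
        // Returns 0 when no console is attached; 0 is also CP_ACP,
        // so the conversion then falls back to the ANSI code page.
        immutable cp = GetConsoleOutputCP();
        wstring w = s.toUTF16;
        // First call: ask how many bytes the converted text needs.
        immutable n = WideCharToMultiByte(cp, 0, w.ptr, cast(int) w.length,
                                          null, 0, null, null);
        if (n <= 0)
            return null;
        auto buf = new ubyte[n];
        // Second call: perform the conversion into the buffer.
        WideCharToMultiByte(cp, 0, w.ptr, cast(int) w.length,
                            cast(char*) buf.ptr, n, null, null);
        return buf;
    }
    else
    {
        // Assumption: elsewhere the terminal accepts UTF-8 as-is.
        return cast(ubyte[]) s.dup;
    }
}
```

A wrapper stream could call this on each chunk before handing the bytes to the underlying file. Here the code page is queried on every call rather than cached at launch, which is a simplification.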
This can be a good test for dchar[]-looking ranges.
Comment #8 by dfj1esp02 — 2010-09-29T11:39:42Z
Looking at std.stdio, an easy fix would be to make sure all IO goes through File.write, which calls LockingTextWriter.put, which actually tries to do the correct transcoding. You just need to store the target codepage in File and use it in LockingTextWriter.put. The first step is to statically import core.stdc.stdio to minimize and control its usage.
A nicer design, though, would be correctly implemented .NET-style Streams/TextStreams, however you want them to work in D.
Comment #9 by andrej.mitrovich — 2011-05-24T20:01:07Z
According to this page http://codesnippets.joyent.com/posts/show/414
you can get and set the codepage via the [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage] key's OEMCP value.
Setting the codepage requires a restart though. Also, changing the codepage has other effects, e.g. using ALT+Numpad keys is handled differently (with codepage 1252 you don't have to prepend a zero when using ALT+Numkey apparently).
Here's how to fetch the value:
import std.stdio;
import std.windows.registry;

void main()
{
    // Open the NLS code page key under HKEY_LOCAL_MACHINE.
    Key HKLM = Registry.localMachine;
    Key SFW = HKLM.getKey(r"SYSTEM\CurrentControlSet\Control\Nls\CodePage");
    // OEMCP is stored as a string (REG_SZ), e.g. "850".
    auto codePage = SFW.getValue("OEMCP").value_SZ();
    writeln(codePage);
}
Note that the value type is REG_SZ, a string, not a binary value. So if you want to set the code page programmatically, you have to call:
SFW.setValue("OEMCP", "1252");
One more thing, there was this comment:
"Change the code page in your registry and you may not be able to reboot your windows anymore."
That sounds kind of scary. Perhaps all of this should be left to the user to do and just document it somewhere in the docs.
Comment #10 by smjg — 2011-05-25T04:59:10Z
(In reply to comment #9)
> According to this page http://codesnippets.joyent.com/posts/show/414
> you can get and set the codepage via the
> [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage] key's OEMCP
> value.
>
> Setting the codepage requires a restart though.
Not if you do it using chcp on the command line, or (presumably) SetConsoleCP in the Windows API.
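A minimal sketch of the API route, assuming the goal is UTF-8 output. In the Win32 API, SetConsoleCP sets the console's input code page, while its counterpart SetConsoleOutputCP sets the output code page, so the sketch uses the latter. The helper name withUtf8Output is hypothetical; it restores the previous code page on exit so that other programs sharing the console are not affected.

```d
version (Windows) import core.sys.windows.windows; // GetConsoleOutputCP, SetConsoleOutputCP

/// Hypothetical helper: runs `dg` with the console output code page set to
/// UTF-8 (65001), then restores the code page that was active before.
void withUtf8Output(scope void delegate() dg)
{
    version (Windows)
    {
        immutable old = GetConsoleOutputCP(); // 0 if no console is attached
        immutable changed = old != 0 && SetConsoleOutputCP(65001) != 0;
        // Restore even if dg throws.
        scope (exit) if (changed) SetConsoleOutputCP(old);
        dg();
    }
    else
    {
        // Assumption: nothing to switch outside Windows.
        dg();
    }
}
```

Usage would look like `withUtf8Output({ writeln("£úœ£"); });` with std.stdio imported. Whether the characters display correctly still depends on the console font.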
> Also, changing the codepage has other effects, e.g. using ALT+Numpad
> keys is handled differently (with codepage 1252 you don't have to
> prepend a zero when using ALT+Numkey apparently).
<snip>
I don't have to prepend a zero anyway. It just produces a different character if I do. Traditionally at least, with a 0 it types a character from the ANSI set, and without a 0 it types a character from the OEM set.
But as I test it (Win7), it depends on what font the command prompt is set to.
----- Lucida Console or Consolas -----
C:\Users\StewartGordon>chcp 850
Active code page: 850
C:\Users\StewartGordon>£úœ£
'£úœ£' is not recognized as an internal or external command,
operable program or batch file.
C:\Users\StewartGordon>chcp 1252
Active code page: 1252
C:\Users\StewartGordon>£úœ£
----- Raster Fonts -----
C:\Users\StewartGordon>chcp 850
Active code page: 850
C:\Users\StewartGordon>£úo£
'£úo£' is not recognized as an internal or external command,
operable program or batch file.
C:\Users\StewartGordon>chcp 1252
Active code page: 1252
C:\Users\StewartGordon>ú·£ú
----------
The sequence of strange characters is Alt+0163, Alt+163, Alt+0156, Alt+156 in each case.
Comment #11 by dlang-bugzilla — 2011-05-25T05:02:20Z
Comment #14 by dlang-bugzilla — 2016-10-14T04:06:03Z
I think we should let this one go.
1. To see international characters in the first place, you have to change the console font from a raster one.
2. Setting the output console CP to 65001 is not an option because it breaks spawned programs. In particular, batch files stop working. Problems also occur if the console isn't changed back.
3. Changing the data's output encoding according to the user's locale cannot be done if the output is a file or pipe, as it would be a breaking change.
4. As a result, the only way to do this is to check if the output is the console. However, because we do output via the C standard library, whatever stdout points to may change at any moment, so we cannot cache the check.
5. Since all output is done via the C standard library, handling this is the library's responsibility, but the MS standard C library does not implement this check, and we have no control over it.
I think this is unactionable unless either we move away from using C for input/output (see: std.io), or someone presents a C example program that produces correct Unicode output to both console and file and which works with all C runtimes that D uses (AFAIU, this is impossible).
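For illustration only, the console check from point 4 can be sketched if the C runtime is bypassed entirely, which is the "move away from C" route and not the C example asked for above. The sketch assumes direct use of the Win32 calls GetStdHandle, GetConsoleMode, WriteConsoleW, and WriteFile; the helper name putUnicode is hypothetical. It repeats the check on every call instead of caching it, and it does not coordinate with anything buffered in C stdio.

```d
import std.utf : toUTF16;

version (Windows) import core.sys.windows.windows;
else import core.stdc.stdio : fwrite, stdout;

/// Hypothetical helper: writes `s` as UTF-16 when standard output is a
/// console, or as raw UTF-8 when it is a file or pipe.
/// Returns true if the underlying write call reported success.
bool putUnicode(string s)
{
    version (Windows)
    {
        HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
        DWORD mode, written;
        // GetConsoleMode succeeds only for console handles,
        // so it doubles as the "is this the console?" check.
        if (GetConsoleMode(h, &mode))
        {
            wstring w = s.toUTF16;
            return WriteConsoleW(h, w.ptr, cast(DWORD) w.length,
                                 &written, null) != 0;
        }
        // Redirected output: keep the bytes as UTF-8, unchanged.
        return WriteFile(h, s.ptr, cast(DWORD) s.length,
                         &written, null) != 0;
    }
    else
    {
        // Assumption: elsewhere stdout accepts UTF-8 directly.
        return fwrite(s.ptr, 1, s.length, stdout) == s.length;
    }
}
```

Because the handle is queried per call, the sketch tolerates stdout being redirected between calls, at the cost of one extra system call per write.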
> If this is not going to be fixed, it should be documented.
The problem is with Windows and the C libraries, not D.
Comment #15 by bugzilla — 2016-10-14T08:19:53Z
When I start a command prompt in Windows, I run the command:
chcp 65001
which sets it to Unicode (code page 65001 is UTF-8).