Bug 15845 – Windows console cannot read properly UTF-8 lines

Status: NEW
Severity: normal
Priority: P3
Component: phobos
Product: D
Version: D2
Platform: x86_64
OS: Windows
Creation time: 2016-03-29T00:06:58Z
Last change time: 2024-12-01T16:26:23Z
Assigned to: No Owner
Creator: JVortex
See also: https://issues.dlang.org/show_bug.cgi?id=1448, https://issues.dlang.org/show_bug.cgi?id=15761
Moved to GitHub: phobos#9677

Comments

Comment #0 by jv_vortex — 2016-03-29T00:06:58Z
module runnable;

import std.stdio;
import std.string : chomp;
import std.experimental.logger;

void doSomethingElse(char[] data)
{
    writeln("hello!");
}

int main(string[] args)
{
    /* Some fix I found in UTF-8 related problems, I'm using Windows 10 */
    version(Windows)
    {
        import core.sys.windows.windows;
        if (SetConsoleCP(65001) == 0)
            throw new Exception("failure");
        if (SetConsoleOutputCP(65001) == 0)
            throw new Exception("failure");
    }

    FileLogger fl = new FileLogger("log.log");
    char[] readerBuffer;

    readln(readerBuffer);
    readerBuffer = chomp(readerBuffer);
    /* If the string that was read contains at least one non-ASCII UTF-8 char
       this logs 0, else it logs the string's length. */
    fl.info(readerBuffer.length);

    if (readerBuffer != "exit")
        doSomethingElse(readerBuffer);

    /* Also, all the following code doesn't run as expected: the program
       doesn't wait for you, it executes readln() even without
       pressing/sending a key. */
    readln(readerBuffer);
    fl.info(readerBuffer.length);
    readln(readerBuffer);
    fl.info(readerBuffer.length);
    readln(readerBuffer);
    fl.info(readerBuffer.length);
    readln(readerBuffer);
    fl.info(readerBuffer.length);
    readln(readerBuffer);
    fl.info(readerBuffer.length);

    return 0;
}

The code above doesn't work properly on Windows if you input at least one of the following chars: á, é, í, ó, ú, ñ, à, è, ì, ò, ù (I haven't tried with others).

This behaviour is reproducible ONLY on Windows. It has been tested on Debian and Mac OS X, where it works correctly.

The behaviour also differs between the two modes, 32-bit (DMC stdlib) and 64-bit (MSVC stdlib). In both, the line is not read properly (I get a length of 0). On 32-bit, the program exits immediately, indicating it cannot read any more data. On 64-bit, the program continues to allow input.
Comment #1 by schveiguy — 2016-03-29T00:22:28Z
*** Issue 15846 has been marked as a duplicate of this issue. ***
Comment #2 by ag0aep6g — 2016-03-29T21:48:53Z
For -m32 (DIGITAL_MARS_STDIO) it seems to come down to this (with `chcp 65001` in the console):

----
import std.stdio;
void main()
{
    FILE* fps = core.stdc.stdio.stdin;
    FLOCK(fps);
    scope(exit) FUNLOCK(fps);
    auto fp = cast(_iobuf*)fps;
    assert(!(__fhnd_info[fp._file] & FHND_WCHAR)); /* passes; no wide characters */
    assert(!(fp._flag & _IOTRAN)); /* passes; no translated mode */
    int c = FGETC(fp);
    assert(c != -1); /* passes with 'a'; fails with 'ä' */
}
----

That is, Digital Mars's FGETC (_fgetc_nlock) returns -1 for non-ASCII characters.

The issue does not manifest with a pipe: `echo ä | test` works.
Comment #3 by ag0aep6g — 2016-03-30T21:49:25Z
(In reply to ag0aep6g from comment #2)
> For -m32 (DIGITAL_MARS_STDIO) it seems to come down to this (with `chcp
> 65001` in the console):
[...]
> That is, Digital Mars's FGETC (_fgetc_nlock) returns -1 for non-ASCII
> characters.
>
> The issue does not manifest with a pipe: `echo ä | test` works.

The same happens with -m64, and with a simple C++ program (just `printf("%d\n", fgetc(stdin));`). So apparently `chcp 65001` is not enough to make UTF-8 input work from the console. I'm not much of a Windows programmer, though, so I have no idea what's missing.
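
For reference, the C++ one-liner mentioned in this comment can be mirrored in D by calling the C runtime directly through `core.stdc.stdio`. This is only a minimal sketch for reproducing the report; the helper name `firstByte` is hypothetical and not part of Phobos. Per the comments above, with `chcp 65001` active and `ä` typed at the console, the expectation is that it prints -1 (EOF), while piped input should print the first byte of the UTF-8 sequence.

```d
import core.stdc.stdio : FILE, fgetc, printf, stdin;

/// Hypothetical helper: returns the first byte available on `f`,
/// or -1 (EOF) if the C runtime cannot deliver one.
int firstByte(FILE* f)
{
    return fgetc(f);
}

void main()
{
    // Same check as the C++ test: print the raw result of fgetc(stdin).
    printf("%d\n", firstByte(stdin));
}
```

Run it once interactively and once as `echo ä | test` to compare the console and pipe cases.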
Comment #4 by ag0aep6g — 2016-04-03T14:45:43Z
(In reply to ag0aep6g from comment #2)
> The issue does not manifest with a pipe: `echo ä | test` works.

It seems to be a problem with input from a TTY. Normal stdin is a TTY, but when a pipe is used it's not. According to _isatty [1], stdin is also not a TTY when I run things in the bash from Git for Windows. And indeed: no UTF-8 problems there.

[1] https://msdn.microsoft.com/en-us/library/f4s0ddew.aspx
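
Since the problem appears only when stdin is a real console, one possible workaround is to bypass the C runtime in that case: read UTF-16 from the console with the Win32 `ReadConsoleW` call and convert it to UTF-8, and fall back to `readln()` for pipes and files. The sketch below is an assumption about how that could look in D, not a fix adopted by Phobos. The names `readLineUtf8` and `decodeConsoleLine` and the 512-unit buffer are hypothetical, and `GetConsoleMode` is used as the console check in place of `_isatty`.

```d
import std.stdio : readln, writeln;
import std.string : chomp;
import std.utf : toUTF8;

/// Hypothetical helper: converts one UTF-16 console line to UTF-8
/// and strips the trailing line terminator.
string decodeConsoleLine(const(wchar)[] raw)
{
    return chomp(toUTF8(raw));
}

/// Hypothetical helper: reads one line from stdin as UTF-8.
string readLineUtf8()
{
    version (Windows)
    {
        import core.sys.windows.windows;

        HANDLE h = GetStdHandle(STD_INPUT_HANDLE);
        DWORD mode;
        // GetConsoleMode succeeds only for console handles, so it also
        // tells us whether stdin is a TTY (it fails for pipes and files).
        if (h != INVALID_HANDLE_VALUE && GetConsoleMode(h, &mode))
        {
            wchar[512] buf; // assumed size; longer lines are truncated
            DWORD nRead;
            if (!ReadConsoleW(h, buf.ptr, cast(DWORD) buf.length, &nRead, null))
                throw new Exception("ReadConsoleW failed");
            return decodeConsoleLine(buf[0 .. nRead]);
        }
    }
    // Not a Windows console: the byte-oriented path works as usual.
    return chomp(readln());
}

void main()
{
    string line = readLineUtf8();
    writeln(line.length); // number of UTF-8 code units in the line
}
```

Keeping the conversion in `decodeConsoleLine` separates the platform-specific read from the encoding step, so the latter can be checked on any OS.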
Comment #5 by robert.schadek — 2024-12-01T16:26:23Z
THIS ISSUE HAS BEEN MOVED TO GITHUB

https://github.com/dlang/phobos/issues/9677

DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB