The following code does not correctly handle Unicode strings.
-----
import std.stdio;

void main()
{
    string s;
    readf("%s", &s);
    writeln(s.length);
    write(s);
}
-----
Example input ("Test." in Cyrillic):
-----
Тест.
-----
(hex: D0 A2 D0 B5 D1 81 D1 82 2E 0D 0A)
That is 11 bytes (the '\n' line ending is CR/LF, i.e. two bytes, on Windows).
Example output:
-----
18
ТеÑÑ.
-----
(hex: C3 90 C2 A2 C3 90 C2 B5 C3 91 C2 81 C3 91 C2 82 2E 0D 0A)
The second line is 19 bytes (again with the CR/LF line ending counting as two bytes on Windows).
The reported length of 18 (counting '\n' as one character), instead of the expected 10, shows that the problem is in reading, not in writing.
Here, each input byte is treated as a separate code point and re-encoded: D0 -> C3 90, A2 -> C2 A2, etc.
On the bright side, reading the file with readln works properly.
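A minimal sketch of that readln-based workaround (my illustration, not an attachment from the report; the length comment assumes '\n' counts as one character, as in the report):
-----
import std.stdio;

void main()
{
    // readln passes the UTF-8 code units through unchanged,
    // unlike readf("%s", &s), which re-encodes each byte.
    string s = readln();
    writeln(s.length); // 10 for the example input: 9 code units of "Тест." plus '\n'
    write(s);
}
-----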
Relevant discussion: http://forum.dlang.org/thread/[email protected]
Comment #1 by gassa — 2014-11-04T18:21:43Z
Created attachment 1449
example program
Comment #2 by gassa — 2014-11-04T18:22:00Z
Created attachment 1450
example input
Comment #3 by gassa — 2014-11-04T18:22:13Z
Created attachment 1451
example output
Comment #4 by ag0aep6g — 2014-11-05T18:46:42Z
Copying my comment from the forum thread:
std.stdio.LockingTextReader is to blame:
-----
void main()
{
    import std.stdio;
    auto ltr = LockingTextReader(std.stdio.stdin);
    write(ltr);
}
-----
Running it:
-----
$ echo Тест | rdmd test.d
ТеÑÑ
-----
LockingTextReader has a dchar front, but it doesn't do any
decoding: the dchar front is really a char front.
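A minimal sketch of the effect described above (my reading of the analysis, not the actual Phobos code): each raw byte is taken as a code point in its own right and re-encoded with std.utf.encode, which reproduces the garbled bytes from the report.
-----
import std.utf : encode;

// Re-encode each raw byte as if it were a complete code point,
// mimicking a dchar front that never decodes its char input.
string doubleEncode(const(ubyte)[] raw)
{
    string result;
    foreach (b; raw)
    {
        char[4] buf;
        // Byte 0xD0 becomes code point U+00D0, which UTF-8
        // encodes as the two bytes C3 90 -- the pattern seen
        // in the hex dump of the garbled output.
        immutable len = encode(buf, cast(dchar) b);
        result ~= buf[0 .. len];
    }
    return result;
}

unittest
{
    // 0xD0 0xA2 is 'Т' in UTF-8; double-encoding yields C3 90 C2 A2.
    const(ubyte)[] raw = [0xD0, 0xA2];
    assert(doubleEncode(raw) == "\xC3\x90\xC2\xA2");
}
-----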