
Bug 13686 – Reading unicode string with readf ("%s") produces a wrong string

Status: RESOLVED
Resolution: FIXED
Severity: normal
Priority: P1
Component: phobos
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2014-11-04T18:20:00Z
Last change time: 2015-02-18T03:39:49Z
Assigned to: nobody
Creator: gassa

Attachments

ID	Filename	Summary	Content-Type	Size
1449	test.d	example program	text/plain	108
1450	test.in	example input	text/plain	11
1451	test.out	example output	text/plain	23

Comments

Comment #0 by gassa — 2014-11-04T18:20:56Z

The following code does not correctly handle Unicode strings.

-----
import std.stdio;

void main ()
{
    string s;
    readf ("%s", &s);
    writeln (s.length);
    write (s);
}
-----

Example input ("Test." in Cyrillic):

-----
Тест.
-----

(hex: D0 A2 D0 B5 D1 81 D1 82 2E 0D 0A)

That is 11 bytes (with '\n' = CR/LF being two bytes on Windows).

Example output:

-----
18
Ð¢ÐµÑÑ.
-----

(hex: C3 90 C2 A2 C3 90 C2 B5 C3 91 C2 81 C3 91 C2 82 2E 0D 0A)

The second line is 19 bytes (again with '\n' = CR/LF being two bytes on Windows).

The reported length (18, counting '\n' as one character, instead of the expected length of 10) shows that the problem is in reading, not in writing. Here, each input byte is handled separately: D0 -> C3 90, A2 -> C2 A2, etc.

On the bright side, reading the file with readln works properly.

Relevant discussion: http://forum.dlang.org/thread/[email protected]
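The byte-by-byte mapping described above can be reproduced without any input file. The following is only a sketch of the suspected mechanism, not the Phobos code: `misdecode` is a hypothetical helper that widens every byte to a `dchar` without decoding and then encodes the result as UTF-8.

```d
import std.stdio : writefln, writeln;
import std.utf : toUTF8;

// Hypothetical helper (not part of Phobos): treat each input byte as a
// complete code point, then encode the resulting code points as UTF-8.
string misdecode(const(ubyte)[] input)
{
    dstring widened;
    foreach (b; input)
        widened ~= cast(dchar) b; // D0 becomes U+00D0, A2 becomes U+00A2, ...
    return toUTF8(widened);
}

void main()
{
    // "Тест." followed by '\n' (CR/LF already collapsed to a single '\n').
    immutable ubyte[] input =
        [0xD0, 0xA2, 0xD0, 0xB5, 0xD1, 0x81, 0xD1, 0x82, 0x2E, 0x0A];

    auto s = misdecode(input);
    writeln(s.length); // prints 18, the length reported above
    writefln("%(%02X %)", cast(const(ubyte)[]) s); // hex dump of the result
}
```

Each of the eight bytes at or above 0x80 turns into a two-byte UTF-8 sequence, which accounts for 16 of the 18 code units; the '.' and '\n' pass through unchanged.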

Comment #1 by gassa — 2014-11-04T18:21:43Z

Created attachment 1449 example program

Comment #2 by gassa — 2014-11-04T18:22:00Z

Created attachment 1450 example input

Comment #3 by gassa — 2014-11-04T18:22:13Z

Created attachment 1451 example output

Comment #4 by ag0aep6g — 2014-11-05T18:46:42Z

Copying my comment from the forum thread:

std.stdio.LockingTextReader is to blame:

void main()
{
    import std.stdio;
    auto ltr = LockingTextReader(std.stdio.stdin);
    write(ltr);
}
----
$ echo Тест | rdmd test.d
Ð¢ÐµÑÑ

LockingTextReader has a dchar front. But it doesn't do any decoding. The dchar front is really a char front.

Comment #5 by ag0aep6g — 2014-11-07T14:29:44Z

https://github.com/D-Programming-Language/phobos/pull/2663

Comment #6 by github-bugzilla — 2014-11-13T19:53:41Z

Commits pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/3d058e285352998a445acc30c401a13ab1ba0b4d
fix LockingTextReader: issues 13686 and 12320

Issue 13686 (Reading unicode string with readf ("%s") produces a wrong string) is fixed by reading all chars of a multibyte sequence and decoding. Before, each char was mistaken for a dchar.

Issue 12320 (std.stdio.LockingTextReader populates .front in .empty) is fixed by moving the work from empty to popFront.

https://github.com/D-Programming-Language/phobos/commit/a0ca85550a74cd20b435db6ac6da2f9e4902ab96
Merge pull request #2663 from aG0aep6G/lockingtextreader

fix LockingTextReader: issues 13686 and 12320
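The idea of "reading all chars of a multibyte sequence and decoding" can be illustrated on an in-memory buffer. This is a minimal sketch under stated assumptions, not the code from the commit: `takeCodePoint` is a hypothetical helper, and the real LockingTextReader reads from a file rather than a slice.

```d
import std.stdio : writeln;
import std.utf : decode;

// Hypothetical helper (not the Phobos implementation): decode one code point
// from the front of a UTF-8 buffer and drop every code unit that belongs to it.
dchar takeCodePoint(ref const(char)[] buf)
{
    size_t index = 0;
    immutable dchar c = decode(buf, index); // throws UTFException on invalid UTF-8
    buf = buf[index .. $];                  // index is now past the whole sequence
    return c;
}

void main()
{
    // The bytes D0 A2 D0 B5 D1 81 D1 82 2E, i.e. "Тест."
    const(char)[] input = "\xD0\xA2\xD0\xB5\xD1\x81\xD1\x82.";

    size_t count;
    while (input.length)
    {
        takeCodePoint(input);
        ++count;
    }
    writeln(count); // prints 5: four Cyrillic letters and the period
}
```

With this approach a range's `front` holds a real code point, so D0 A2 yields the single `dchar` U+0422 instead of the two values U+00D0 and U+00A2.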

Comment #7 by github-bugzilla — 2015-02-18T03:39:49Z

Commits pushed to 2.067 at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/3d058e285352998a445acc30c401a13ab1ba0b4d
fix LockingTextReader: issues 13686 and 12320

https://github.com/D-Programming-Language/phobos/commit/a0ca85550a74cd20b435db6ac6da2f9e4902ab96
Merge pull request #2663 from aG0aep6G/lockingtextreader