Bug 10125 – readln!dchar misdecodes Unicode non-BMP

Status: RESOLVED
Resolution: FIXED
Severity: normal
Priority: P2
Component: phobos
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2013-05-20T05:26:00Z
Last change time: 2014-08-29T14:29:08Z
Assigned to: nobody
Creator: fw..vdijk

Comments

Comment #0 by fw..vdijk — 2013-05-20T05:26:14Z

readln!dchar decodes a Unicode code point above U+FFFF into two surrogates instead of one dchar containing the code point. For example, U+10001 becomes [0xd800, 0xdc01] instead of [0x10001].
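
For reference, the two values in the example are the UTF-16 surrogate pair for U+10001. The following is a minimal sketch of that arithmetic, written in Python purely for illustration; the helper names `to_surrogates` and `from_surrogates` are hypothetical and are not Phobos functions.

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    v = cp - 0x10000                      # 20-bit offset into the supplementary planes
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def from_surrogates(high, low):
    """Combine a (high, low) surrogate pair back into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x10001)
print(hex(high), hex(low))                # the pair reported in this bug
print(hex(from_surrogates(high, low)))    # the single value a dchar should hold
```

A dchar is a full 32-bit code unit, so the expected result is the combined value rather than the two halves.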

Comment #1 by fw..vdijk — 2013-05-20T05:33:52Z

queued pull request #1296

Comment #2 by monarchdodra — 2013-07-06T07:16:40Z

Concurrently fixed in: https://github.com/D-Programming-Language/phobos/pull/1381

Comment #3 by github-bugzilla — 2013-08-04T23:59:44Z

Commits pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/8f401a9199b441f941717cda5ab551c4e1a86a40
fix Issue 10125 Unicode non-BMP decoding to dchar in stdio readln

strings were first decoded to wchars, each wchar was then separately decoded to dchar, resulting in 2 dchars in the surrogate block instead of 1 correct dchar.
added unit test to verify readln decoding of non-ASCII characters

https://github.com/D-Programming-Language/phobos/commit/1086f2955418a4effd7e815d906460e1b137eb2d
Merge pull request #1296 from M-frankied/readln-nonbmp

fix Issue 10125 Unicode non-BMP decoding to dchar in stdio readln

Merged.
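
The failure mode described in the commit message can be modelled roughly as follows. This is an illustrative Python sketch, not the Phobos code: `utf16_units`, `widen_each_unit`, and `decode_pairs` are hypothetical names, and the actual fix in the pull request may be structured differently (for example, by decoding the input directly to dchar).

```python
import struct


def utf16_units(text):
    """Return the UTF-16 code units of text (stands in for the wchar step)."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def widen_each_unit(units):
    """Model of the bug: every 16-bit unit is widened to 32 bits on its own."""
    return [u for u in units]


def decode_pairs(units):
    """Combine each high/low surrogate pair into one code point."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_pair = (
            0xD800 <= u <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        if is_pair:
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            out.append(u)  # BMP code unit (or a lone surrogate, passed through)
            i += 1
    return out


units = utf16_units("a\U00010001\n")
print([hex(u) for u in widen_each_unit(units)])  # surrogates survive as two values
print([hex(c) for c in decode_pairs(units)])     # one value for the non-BMP character
```

Characters inside the BMP come out the same on both paths, which is why the problem only shows up for code points above U+FFFF.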

Comment #4 by hsteoh — 2014-08-29T14:29:08Z

Fixed, according to latest bug notes. If it still occurs, please reopen. Thanks!