Bug 10125 – readln!dchar misdecodes Unicode non-BMP

Status
RESOLVED
Resolution
FIXED
Severity
normal
Priority
P2
Component
phobos
Product
D
Version
D2
Platform
All
OS
All
Creation time
2013-05-20T05:26:00Z
Last change time
2014-08-29T14:29:08Z
Assigned to
nobody
Creator
fw..vdijk

Comments

Comment #0 by fw..vdijk — 2013-05-20T05:26:14Z
readln!dchar decodes a Unicode code point above U+FFFF into 2 surrogates instead of 1 dchar containing the code point. For example, U+10001 becomes [0xd800, 0xdc01] instead of [0x10001].
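For reference, the two values in the report are the UTF-16 surrogate pair for U+10001. The following is a minimal sketch of that arithmetic, not code taken from Phobos; the helper names toSurrogates and fromSurrogates are hypothetical and exist only for illustration.

```d
/// Splits a code point above U+FFFF into its UTF-16 surrogate pair.
/// (Hypothetical helper, not a Phobos function.)
wchar[2] toSurrogates(dchar cp)
{
    assert(cp > 0xFFFF && cp <= 0x10FFFF);
    immutable uint v = cp - 0x10000;             // 20-bit offset into the supplementary planes
    wchar[2] pair;
    pair[0] = cast(wchar)(0xD800 + (v >> 10));   // high surrogate carries the top 10 bits
    pair[1] = cast(wchar)(0xDC00 + (v & 0x3FF)); // low surrogate carries the bottom 10 bits
    return pair;
}

/// Combines a surrogate pair back into the single code point it encodes.
/// (Hypothetical helper, not a Phobos function.)
dchar fromSurrogates(wchar hi, wchar lo)
{
    assert(hi >= 0xD800 && hi <= 0xDBFF);
    assert(lo >= 0xDC00 && lo <= 0xDFFF);
    return cast(dchar)(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00));
}

unittest
{
    // U+10001 is the example from the report.
    auto pair = toSurrogates('\U00010001');
    assert(pair[0] == 0xD800 && pair[1] == 0xDC01);
    assert(fromSurrogates(pair[0], pair[1]) == 0x10001);
}
```

A correct readln into a dchar buffer should yield the single value that fromSurrogates returns, not the two halves. The sketch can be checked with `rdmd -unittest -main` on a file containing it.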
Comment #1 by fw..vdijk — 2013-05-20T05:33:52Z
queued pull request #1296
Comment #2 by monarchdodra — 2013-07-06T07:16:40Z
Comment #3 by github-bugzilla — 2013-08-04T23:59:44Z
Commits pushed to master at https://github.com/D-Programming-Language/phobos
https://github.com/D-Programming-Language/phobos/commit/8f401a9199b441f941717cda5ab551c4e1a86a40
fix Issue 10125 Unicode non-BMP decoding to dchar in stdio readln
strings were first decoded to wchars, each wchar was then separately decoded to dchar, resulting in 2 dchars in the surrogate block instead of 1 correct dchar.
added unit test to verify readln decoding of non-ASCII characters
https://github.com/D-Programming-Language/phobos/commit/1086f2955418a4effd7e815d906460e1b137eb2d
Merge pull request #1296 from M-frankied/readln-nonbmp
fix Issue 10125 Unicode non-BMP decoding to dchar in stdio readln
Merged.
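The failure mode described in the commit message can be reproduced outside of readln. The sketch below is only an illustration under stated assumptions: widenEachUnit and decodeCodePoints are hypothetical names, and neither function is the actual Phobos implementation before or after the fix. The first widens each UTF-16 code unit on its own, which is what the commit message describes; the second decodes whole code points using std.utf.toUTF32.

```d
import std.utf : toUTF16, toUTF32;

/// Mimics the reported behaviour: convert to UTF-16 first, then widen
/// every code unit separately. (Hypothetical helper, not Phobos code.)
dstring widenEachUnit(string line)
{
    dstring result;
    // Iterating a wstring with an explicit wchar loop variable visits
    // code units, so a surrogate pair is seen as two separate values.
    foreach (wchar unit; line.toUTF16)
        result ~= cast(dchar) unit;
    return result;
}

/// Decodes whole code points, so a non-BMP character stays one dchar.
/// (Hypothetical helper, not Phobos code.)
dstring decodeCodePoints(string line)
{
    return line.toUTF32;
}

unittest
{
    string line = "\U00010001";

    // Per-unit widening leaves two values from the surrogate block.
    auto wrong = widenEachUnit(line);
    assert(wrong.length == 2);
    assert(wrong[0] == 0xD800 && wrong[1] == 0xDC01);

    // Whole-code-point decoding yields the single expected dchar.
    auto right = decodeCodePoints(line);
    assert(right.length == 1);
    assert(right[0] == 0x10001);
}
```

For text that stays inside the BMP the two functions agree, which is presumably why the problem only showed up for code points above U+FFFF.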
Comment #4 by hsteoh — 2014-08-29T14:29:08Z
Fixed, according to latest bug notes. If it still occurs, please reopen. Thanks!