Bug 5904 – std.json parseString doesn't handle chars outside the BMP

Status: RESOLVED
Resolution: FIXED
Severity: normal
Priority: P2
Component: phobos
Product: D
Version: D2
Platform: Other
OS: All
Creation time: 2011-04-28T12:24:48Z
Last change time: 2018-01-05T13:29:31Z
Keywords: pull
Assigned to: No Owner
Creator: Sean Kelly
See also: https://issues.dlang.org/show_bug.cgi?id=17556

Comments

Comment #0 by sean — 2011-04-28T12:24:48Z
According to RFC 4627, characters outside the Basic Multilingual Plane (i.e. code points above U+FFFF, which cannot be represented by a single 16-bit UTF-16 code unit) are escaped in JSON strings as a UTF-16 surrogate pair. In effect, what you have to do is test whether a "\uXXXX" value is >= 0xD800 and <= 0xDBFF (a high surrogate). If so, the next value should be another "\uXXXX" escape representing the low surrogate; to verify this, check that it is >= 0xDC00 and <= 0xDFFF. If it isn't, skip the preceding "\uXXXX" value (the high surrogate) as invalid and decode the following "\uXXXX" value as a standalone Unicode code point (the RFC is actually unclear on this point, but this seems the most reasonable failure mode). Assuming you have a valid high and low surrogate, put them into a wchar[2] and convert to UTF-8.
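A minimal D sketch of the decoding policy described above, assuming the "\uXXXX" escapes have already been parsed into 16-bit wchar values. The names isHighSurrogate, isLowSurrogate, and decodeEscapes are hypothetical and are not part of std.json. Following the comment, an unpaired high surrogate is dropped and the next value is handled on its own; dropping a lone low surrogate is an extra assumption that the comment does not cover.

```d
import std.utf : decode;

// Hypothetical helper: true for a UTF-16 high (lead) surrogate, 0xD800..0xDBFF.
bool isHighSurrogate(wchar c) { return c >= 0xD800 && c <= 0xDBFF; }

// Hypothetical helper: true for a UTF-16 low (trail) surrogate, 0xDC00..0xDFFF.
bool isLowSurrogate(wchar c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Hypothetical helper (not part of std.json): converts the values of a run of
// already-parsed "\uXXXX" escapes into a UTF-8 string.
string decodeEscapes(const(wchar)[] units)
{
    string result;
    size_t i = 0;
    while (i < units.length)
    {
        immutable wchar u = units[i];
        if (isHighSurrogate(u))
        {
            if (i + 1 < units.length && isLowSurrogate(units[i + 1]))
            {
                // Valid pair: put both halves into a wchar[2] and let
                // std.utf.decode combine them into one code point.
                wchar[2] pair;
                pair[0] = u;
                pair[1] = units[i + 1];
                size_t index = 0;
                result ~= decode(pair[], index); // appending a dchar encodes it as UTF-8
                i += 2;
            }
            else
            {
                // High surrogate not followed by a low surrogate: skip it and
                // process the next value on its own in the next iteration.
                i += 1;
            }
        }
        else if (isLowSurrogate(u))
        {
            // Assumption: a lone low surrogate is skipped as well.
            i += 1;
        }
        else
        {
            // Ordinary BMP character: one code unit is one code point.
            result ~= cast(dchar) u;
            i += 1;
        }
    }
    return result;
}

unittest
{
    // The escape pair "\uD834\uDD1E" decodes to U+1D11E.
    assert(decodeEscapes([cast(wchar) 0xD834, cast(wchar) 0xDD1E]) == "\U0001D11E");
}
```

A stricter parser could instead report an error for an unpaired surrogate; the skip-and-continue behavior here only mirrors the failure mode proposed in the comment.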
Comment #1 by dlang-bugzilla — 2017-06-25T16:42:53Z
Test case:

///////////// test.d /////////////
import std.json;

void main()
{
    string s = `"\uD834\uDD1E"`;
    auto j = parseJSON(s);
    assert(j.str == "\U0001D11E");
}
//////////////////////////////////
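For reference, the escape pair in this test case (high surrogate H = 0xD834, low surrogate L = 0xDD1E) maps to U+1D11E (MUSICAL SYMBOL G CLEF) by the standard UTF-16 surrogate arithmetic, which is the value the assertion expects:

$$
\begin{aligned}
\mathrm{cp} &= \mathtt{0x10000} + (H - \mathtt{0xD800}) \cdot 2^{10} + (L - \mathtt{0xDC00}) \\
&= \mathtt{0x10000} + \mathtt{0x34} \cdot \mathtt{0x400} + \mathtt{0x11E} \\
&= \mathtt{0x10000} + \mathtt{0xD000} + \mathtt{0x11E} = \mathtt{0x1D11E}
\end{aligned}
$$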
Comment #2 by dlang-bugzilla — 2017-06-26T10:03:48Z
Comment #3 by github-bugzilla — 2017-07-03T09:07:45Z
Commit pushed to stable at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/b23e7a4107cc2eb3275e022cb46f7270e586ca29
Fix Issue 5904 - std.json parseString doesn't handle chars outside the BMP
Comment #4 by github-bugzilla — 2017-07-08T17:09:24Z
Commit pushed to master at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/b23e7a4107cc2eb3275e022cb46f7270e586ca29
Fix Issue 5904 - std.json parseString doesn't handle chars outside the BMP
Comment #5 by github-bugzilla — 2018-01-05T13:29:31Z
Commit pushed to dmd-cxx at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/b23e7a4107cc2eb3275e022cb46f7270e586ca29
Fix Issue 5904 - std.json parseString doesn't handle chars outside the BMP