According to RFC 4627, characters outside the Basic Multilingual Plane (i.e. those above U+FFFF, which do not fit in a single 16-bit UTF-16 code unit) are encoded in JSON strings as a surrogate pair: two consecutive "\uXXXX" escapes. In effect, you have to test whether a "\uXXXX" value is >= 0xD800 and <= 0xDBFF. If so, it is a high surrogate, and the next value should be another "\uXXXX" escape representing the low surrogate. To verify this, check that the second value is >= 0xDC00 and <= 0xDFFF. If it isn't, skip the preceding "\uXXXX" value (the high surrogate) as invalid and decode the following "\uXXXX" value as a standalone Unicode code point (the RFC is actually unclear on this point, but this seems the most reasonable failure mode). Once you have a valid high and low surrogate, put them into a wchar[2] and convert it to UTF-8.
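The steps above could be sketched roughly as follows. This is only an illustration, not the actual std.json code: `decodeEscapes`, `isHighSurrogate` and `isLowSurrogate` are hypothetical helpers, and the input is assumed to be the already-parsed 16-bit values of consecutive "\uXXXX" escapes. Dropping a lone low surrogate is an extra assumption that the description above does not cover.

```d
import std.utf : toUTF8;

/// True if `c` is a UTF-16 high (leading) surrogate.
bool isHighSurrogate(wchar c) { return c >= 0xD800 && c <= 0xDBFF; }

/// True if `c` is a UTF-16 low (trailing) surrogate.
bool isLowSurrogate(wchar c) { return c >= 0xDC00 && c <= 0xDFFF; }

/// Hypothetical helper: converts the values of consecutive "\uXXXX"
/// escapes to UTF-8, combining valid surrogate pairs.
string decodeEscapes(const(wchar)[] units)
{
    string result;
    size_t i = 0;
    while (i < units.length)
    {
        wchar c = units[i];
        if (isHighSurrogate(c))
        {
            if (i + 1 < units.length && isLowSurrogate(units[i + 1]))
            {
                // Valid pair: put both halves into a wchar[2] and convert.
                wchar[2] pair;
                pair[0] = c;
                pair[1] = units[i + 1];
                result ~= toUTF8(pair[]);
                i += 2;
            }
            else
            {
                // Unpaired high surrogate: skip it; the next value is
                // then decoded on its own in the following iteration.
                i += 1;
            }
        }
        else if (isLowSurrogate(c))
        {
            // Assumption: a lone low surrogate is dropped as invalid.
            i += 1;
        }
        else
        {
            // Ordinary BMP code unit: convert it directly.
            wchar[1] single;
            single[0] = c;
            result ~= toUTF8(single[]);
            i += 1;
        }
    }
    return result;
}

void main()
{
    // "\uD834\uDD1E" is U+1D11E (MUSICAL SYMBOL G CLEF).
    assert(decodeEscapes([cast(wchar) 0xD834, cast(wchar) 0xDD1E]) == "\U0001D11E");
    // An unpaired high surrogate is skipped; the 'A' that follows survives.
    assert(decodeEscapes([cast(wchar) 0xD834, cast(wchar) 0x0041]) == "A");
}
```

Only well-formed input is ever passed to `toUTF8` here, so its behaviour on invalid UTF-16 does not come into play.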
Comment #1 by dlang-bugzilla — 2017-06-25T16:42:53Z
Test case:
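///////////// test.d /////////////
import std.json;

void main()
{
    // JSON string holding the surrogate pair for U+1D11E
    string s = `"\uD834\uDD1E"`;
    auto j = parseJSON(s);
    // The pair should decode to a single code point
    assert(j.str == "\U0001D11E");
}
//////////////////////////////////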
Comment #2 by dlang-bugzilla — 2017-06-26T10:03:48Z