Bug 12897 – std.json.toJSON doesn't translate unicode chars(>=0x80) to "\uXXXX"

Status: RESOLVED
Resolution: FIXED
Severity: critical
Priority: P1
Component: phobos
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2014-06-12T11:48:10Z
Last change time: 2020-03-21T03:56:36Z
Assigned to: Basile-z
Creator: egustc

Attachments

ID    Filename  Summary   Content-Type  Size
1362  foo.d     json bug  text/x-csrc   175

Comments

Comment #0 by egustc — 2014-06-12T11:48:10Z
Created attachment 1362
json bug

As the attachment shows, Unicode characters greater than or equal to 0x80 (for example, Chinese or Japanese) should be converted to "\uXXXX" in JSON, but Phobos doesn't do this. This causes problems when transmitting JSON from D to other languages. Implementing std.json.appendJSONChar as follows can fix this bug:

    private void appendJSONChar(Appender!string* dst, wchar c)
    {
        if (isControl(c) || c >= 0x80)
            dst.put("\\u%04x".format(c));
        else
            dst.put(c);
    }
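For illustration, the escaping idea proposed above can be sketched as a standalone function. This is a minimal sketch, not the code that was merged into Phobos: the name escapeNonAscii is hypothetical, it escapes only control and non-ASCII characters (it does not handle '"' or '\'), and it assumes that code points above U+FFFF should be written as a UTF-16 surrogate pair, since a single "\uXXXX" escape holds only four hex digits.

```d
import std.array : appender;
import std.ascii : isControl;
import std.format : format;
import std.utf : encode;

/// Hypothetical helper: returns `s` with every control character and
/// every code point >= 0x80 replaced by one or two "\uXXXX" escapes.
string escapeNonAscii(string s)
{
    auto dst = appender!string();
    foreach (dchar c; s) // decode the UTF-8 input one code point at a time
    {
        if (isControl(c) || c >= 0x80)
        {
            // Re-encode as UTF-16 so that characters outside the BMP
            // become a surrogate pair (two escapes).
            wchar[2] units;
            immutable n = encode(units, c);
            foreach (u; units[0 .. n])
                dst.put(format("\\u%04x", cast(uint) u));
        }
        else
        {
            dst.put(c); // plain ASCII is copied through unchanged
        }
    }
    return dst.data;
}
```

Under these assumptions, escapeNonAscii("a中") yields a\u4e2d, and a character such as U+1F600 is written as two escapes.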
Comment #1 by justin — 2014-07-11T18:13:16Z
Looking at the spec (http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf) it appears that while strings _may_ encode characters using the escape sequence, they are not _required_ to for any range of characters. On the face of it it seems that std.json is conformant and other languages are not. Which parsers are unable to handle the raw UTF-8?
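The point about the spec can be checked from the parsing side: a conforming parser should decode the raw UTF-8 form and the escaped form of a string to the same value. The sketch below uses std.json.parseJSON for this; the helper name sameDecodedString is hypothetical and only serves the example.

```d
import std.json : parseJSON;

/// Hypothetical helper: parses two JSON string literals and reports
/// whether they decode to the same D string.
bool sameDecodedString(string jsonA, string jsonB)
{
    auto a = parseJSON(jsonA);
    auto b = parseJSON(jsonB);
    return a.str == b.str;
}
```

For example, sameDecodedString(`"中文"`, `"\u4e2d\u6587"`) compares the raw and the escaped spelling of the same two characters.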
Comment #2 by egustc — 2014-07-12T01:13:01Z
OK... I used Python but didn't decode first and got a problem.

(In reply to Justin Whear from comment #1)
> Looking at the spec
> (http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf)
> it appears that while strings _may_ encode characters using the escape
> sequence, they are not _required_ to for any range of characters. On the
> face of it it seems that std.json is conformant and other languages are not.
> Which parsers are unable to handle the raw UTF-8?
Comment #3 by b2.temp — 2016-03-22T17:22:50Z
(In reply to egustc from comment #2)
> OK... I used Python but didn't decode first and got a problem.
>
> (In reply to Justin Whear from comment #1)
> > Looking at the spec
> > (http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf)
> > it appears that while strings _may_ encode characters using the escape
> > sequence, they are not _required_ to for any range of characters. On the
> > face of it it seems that std.json is conformant and other languages are not.
> > Which parsers are unable to handle the raw UTF-8?

I propose a PR for this (https://github.com/D-Programming-Language/phobos/pull/4106), but it was not clear if you considered the problem as fixed or not. Maybe it can even be closed without any modification. Let's see what people say.
Comment #4 by github-bugzilla — 2016-04-10T08:23:38Z
Commit pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/b5cd354a05033ade13ae376377bec590bef62212
Merge pull request #4106 from BBasile/issue-12897

fix issue 12897 - toJSON, add the escapeNonAsciiChars option
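A short sketch of how the option added by this commit might be used, assuming a Phobos release that includes JSONOptions.escapeNonAsciiChars. The wrapper names toRawJson and toAsciiJson are hypothetical and exist only for this example; toJSON, JSONValue and parseJSON are from std.json.

```d
import std.json : JSONOptions, JSONValue, parseJSON, toJSON;

/// Hypothetical wrapper: serializes `s` with the default options,
/// which leave non-ASCII characters as raw UTF-8.
string toRawJson(string s)
{
    auto v = JSONValue(s);
    return toJSON(v);
}

/// Hypothetical wrapper: serializes `s` with escapeNonAsciiChars,
/// so that characters >= 0x80 are emitted as "\uXXXX" escapes.
string toAsciiJson(string s)
{
    auto v = JSONValue(s);
    return toJSON(v, false, JSONOptions.escapeNonAsciiChars);
}
```

With the default options the output is unchanged from before the fix, so existing callers keep getting raw UTF-8. Only callers that pass the option get ASCII-only output, and both forms should parse back to the same string.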