Bug 9621 – support html named entities in std.conv.parseEscape

Status
NEW
Severity
enhancement
Priority
P4
Component
phobos
Product
D
Version
D2
Platform
All
OS
All
Creation time
2013-03-01T02:27:46Z
Last change time
2024-12-01T16:16:44Z
Assigned to
No Owner
Creator
monarchdodra
Moved to GitHub: phobos#9600

Comments

Comment #0 by monarchdodra — 2013-03-01T02:27:46Z
D allows this:

void main()
{
    string s1 = "\&amp;";
    string s2 = "\141";
    assert(s1 == "&");
    assert(s2 == "a");
}

But parse doesn't allow it (not supported in parseEscape).

//----
import std.conv, std.stdio;

void main()
{
    string s1 = `[ "\&amp;", "\141" ]`;
    writeln(parse!(string[])(s1));
}
//----

Can't parse string: Unknown escape character &
Can't parse string: Unknown escape character 1
Comment #1 by dmitry.olsh — 2013-03-01T02:59:43Z
Is it documented anywhere that std.conv.parse should follow D lexer conventions on parsing?? If not I guess we shouldn't pretend it does and pull the whole freaking table of HTML4/5 entities in *every* program that uses parse to read a couple of ints.
Comment #2 by monarchdodra — 2013-03-01T03:27:10Z
(In reply to comment #1)
> Is it documented anywhere that std.conv.parse should follow D lexer conventions
> on parsing??

Well, it's kind of implied, isn't it? Why would parse follow a convention other than D's? No, it's not documented, but I do remember somewhere in the threads that Jonathan (I think it was him) specifically said that the idea is that it allows parsing pretty much anything that's valid D.

> If not I guess we shouldn't pretend it does and pull the whole freaking table
> of HTML4/5 entities in *every* program that uses parse to read a couple of
> ints.

I disagree, because the function *is* named parse, and is capable of parsing a string and returning the object parsed (in this case a string). If "\"" is a valid D string, then I'd expect parse to not choke on it. As long as the user is parsing string to int, then no, he shouldn't need it, but if the parse outcome is a string, there is no excuse to not do it right.

Shouldn't the fact that the table would only ever be used in a template function (parse) mean the compiler should be able to know whether or not to link with said table? Or would importing std.conv immediately link the table into the final executable?
Comment #3 by monarchdodra — 2013-03-01T03:30:39Z
(In reply to comment #1)
> If not I guess we shouldn't pretend it does and pull the whole freaking table
> of HTML4/5 entities in *every* program that uses parse to read a couple of
> ints.

How does std.uni do it? I mean, if I want to know whether a unicode character is white, does it mean I'll have to pull in the entire unicode tables for isUpper etc. etc. etc.?

I'm not trying to justify by comparison, but trying to see how other modules work with this "problem".
Comment #4 by dmitry.olsh — 2013-03-01T04:12:34Z
(In reply to comment #3)
> (In reply to comment #1)
> > If not I guess we shouldn't pretend it does and pull the whole freaking table
> > of HTML4/5 entities in *every* program that uses parse to read a couple of
> > ints.
>
> How does std.uni do it?
>

That's why I'm increasingly against adding tables that are hidden behind an opaque interface. I feel uneasy about it. That's why I exposed all I could about tables & predefined sets in std.uni. For instance, any set is usable not only for std.uni purposes. I also took tremendous effort to not include tables unless user code needs them, and will seek new ways to avoid it.

Having a dead HTML5 entity table buried beneath an innocent-looking function is NOT good enough. If we do it, there HAS to be a way to tap into HTML entities so that people wouldn't have to include the VERY SAME table twice should they need full access to HTML5 entities.

> I mean, in the case I want to know if unicode character is white, does it mean
> I'll have to pull the entire unicode tables for isUpper etc. etc. etc.

Something I'm going to change. Technically there is no reason to pull these tables. Also in case of parse the cost to benefit is far greater since if you use isXXX you surely need the table, period. In case of parse you may easily never hit an escape sequence, or even mean to unescape it in your data, but you'd pay all the same.

> I'm not trying to justify by comparison, but trying to see how other modules
> work with this "problem".

I thought std.conv.parse's goal was closer to sscanf of C. In other words, that it's a backbone behind formattedRead, readf etc. If the goal is to parse whatever D strings are, I fail to see the use case, as e.g. std.d.lexer would 100% likely use its own tricks to process escapes etc. to be more efficient.
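The two requirements above (the table must be reachable by user code, and it should not be paid for unless used) can be sketched roughly as follows. This is only an illustration: `NamedEntity`, `sampleEntities` and `lookupEntity` are hypothetical names, not Phobos symbols, and the four-entry table stands in for a full HTML entity list. Making the functions templates means the table is only instantiated when something calls them; whether that actually keeps it out of a given binary depends on the compiler and linker, so treat that part as an assumption.

```d
import std.stdio : writeln;

/// Hypothetical entry type; not part of Phobos.
struct NamedEntity
{
    string name;
    dchar value;
}

/// Hypothetical public accessor. The empty template parameter list makes
/// this a template, so the table is only instantiated when a caller uses it.
immutable(NamedEntity)[] sampleEntities()()
{
    // Deliberately tiny sample; a real table would list every HTML entity.
    static immutable NamedEntity[] table = [
        NamedEntity("amp", '&'),
        NamedEntity("gt", '>'),
        NamedEntity("lt", '<'),
        NamedEntity("quot", '"'),
    ];
    return table;
}

/// Hypothetical lookup: returns true and sets `result` if `name` is known.
/// Also a template, so it does not instantiate the table by itself.
bool lookupEntity()(const(char)[] name, out dchar result)
{
    foreach (entry; sampleEntities())
    {
        if (entry.name == name)
        {
            result = entry.value;
            return true;
        }
    }
    return false;
}

void main()
{
    dchar c;
    assert(lookupEntity("amp", c) && c == '&');
    assert(!lookupEntity("nosuchentity", c));
    // The same table is reachable directly, so user code need not duplicate it.
    writeln(sampleEntities().length);
}
```

A parser could call `lookupEntity` from its escape handling while other code reads `sampleEntities()` directly, which is the "no second copy of the VERY SAME table" property asked for above.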
Comment #5 by dmitry.olsh — 2013-03-01T04:13:40Z
> Something I'm going to change. Technically there is no reason to pull these
> tables. Also in case of parse the cost to benefit is far

I meant lower, obviously.

> since if you
> use isXXX you surely need the table, period. In case of parse you may easily
> never hit escape sequence or even mean to unescape it in your data but you'd
> pay all the same.
Comment #6 by dmitry.olsh — 2013-03-01T04:33:15Z
(In reply to comment #5)
> > Something I'm going to change. Technically there is no reason to pull these
> > tables. Also in case of parse the cost to benefit is far
>
> I've meant lower, obviously.

Looks like I'm on a streak... for std.conv.parse it's a *higher* cost to benefit ratio after all. Sorry for the confusion.
Comment #7 by monarchdodra — 2013-03-01T04:50:56Z
(In reply to comment #4)
> I thought std.conv.parse goal was closer to sscanf of C. In other words that
> it's a backbone behind the formattedRead, readf etc.

I guess the whole discussion then boils down to "what should/does formattedRead accept?" Given that it is "higher order" and capable of parsing arrays of stuff, what happens when it parses a string that represents an array of strings?

I mean, imagine this program:

    string s1 = ...
    string[] s2;
    formattedRead(s1, "%s", &s2);

The question is: what are legal s1 values?

    s1 = `["a", "b"]`;           => ["a", "b"]
    s1 = `["a", "b", ]`;         => ["a", "b"]      (1)
    s1 = `["ab", ['a', 'b']]`;   => ["ab", "ab"]
    s1 = `["\t", "\n"]`;         => ["\t", "\n"]
    s1 = `["\0"]`;               => ["\0"]          (2)
    s1 = `["\141"]`;             => ["a"]
    s1 = `["\x61"]`;             => ["a"]
    s1 = `["\u0061"]`;           => ["a"]
    s1 = `["\U00000061"]`;       => ["a"]
    s1 = `["\&amp;"]`;           => ["&"]           (3)

(1) //Not currently supported
(2) //Not currently supported
(3) //Not currently supported

Unless formattedRead can document what it can (or should) and doesn't support, we'll just run around in circles...
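One way to answer the "what is legal?" question for a given Phobos release is to probe it directly. The sketch below assumes only `std.conv.parse` and `ConvException`; `parsesAsStringArray` is a hypothetical helper written for this illustration. No expected results are given because they depend on the Phobos version: later comments in this issue report that octal escapes were fixed after this discussion, while named entities such as `\&amp;` remained open.

```d
import std.conv : ConvException, parse;
import std.stdio : writefln;

/// Hypothetical helper: true if `input` parses as a string[] literal.
bool parsesAsStringArray(string input)
{
    try
    {
        auto rest = input;          // parse takes its source by ref and consumes it
        parse!(string[])(rest);
        return true;
    }
    catch (ConvException)
    {
        // std.conv reports unsupported escapes and malformed input this way.
        return false;
    }
}

void main()
{
    // Raw (backquoted) strings, so the escapes reach parse unprocessed.
    foreach (candidate; [`["a", "b"]`, `["\t", "\n"]`, `["\x61"]`,
                         `["\u0061"]`, `["\141"]`, `["\&amp;"]`])
    {
        writefln("%-14s parses: %s", candidate, parsesAsStringArray(candidate));
    }
}
```

Running this against several compiler releases gives a concrete table of supported inputs that the formattedRead documentation could then state explicitly.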
Comment #8 by dlang-bot — 2019-11-13T12:08:47Z
@berni44 created dlang/phobos pull request #7273 "Fix partially Issue 9621 - std.conv.parseEscape fails on octals and named" mentioning this issue:

- Fix partially Issue 9621 - std.conv.parseEscape fails on octals and named

https://github.com/dlang/phobos/pull/7273
Comment #9 by dlang-bot — 2019-11-13T19:20:20Z
@berni44 created dlang/phobos pull request #7274 "Fix Issue 9621 - std.conv.parseEscape fails on octals and named" fixing this issue:

- Fix Issue 9621 - std.conv.parseEscape fails on octals and named

https://github.com/dlang/phobos/pull/7274
Comment #10 by dlang-bot — 2019-11-14T04:41:31Z
dlang/phobos pull request #7273 "Fix partially Issue 9621 - std.conv.parseEscape fails on octals and named" was merged into master:

- 932d49b2178c52ebc6c74f11f9797d6ff85c0ab0 by Bernhard Seckinger: Fix partially Issue 9621 - std.conv.parseEscape fails on octals and named

https://github.com/dlang/phobos/pull/7273
Comment #11 by dkorpel — 2022-11-06T16:37:36Z
The octal part has been fixed, so I changed the title accordingly
Comment #12 by robert.schadek — 2024-12-01T16:16:44Z
THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/phobos/issues/9600 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB