Bug 7519 – std.xml cannot manage single quoted attribute values
Status
RESOLVED
Resolution
INVALID
Severity
normal
Priority
P2
Component
phobos
Product
D
Version
D2
Platform
All
OS
All
Creation time
2012-02-16T07:29:15Z
Last change time
2019-12-24T08:23:48Z
Assigned to
No Owner
Creator
Michael Rynn
Comments
Comment #0 by michaelrynn — 2012-02-16T07:29:15Z
Search for std.xml on Google, and you will get a "top answer: Don't use std.xml".
Nevertheless, I've put up some candidates for an XML tool set for D on the review list.
Yesterday, drawing on my experience of building an XML parser the hard way, I took another look at the old std.xml, and remembered my first efforts to understand it, and how I got lost and gave up trying to make a few changes to it.
Now I have backported a few efficiencies and a bug fix or two to make a std.xml1 from std.xml (my own project is currently labelled std.xml2).
XML probably belongs in a separate library, given the code size a proper implementation needs.
But I just made a first different new version by editing your Phobos toy xml.
It's nearly 50% faster in the release build, due to a number of obvious optimizations. The main Element parse loop was given a more reasonable arrangement, and is more efficient with some custom munchers.
On my now more educated code review, I found and fixed a most amazing bug, that current std.xml does not support single quoted attribute values. This almost certainly proves no one is using it.
I haven't yet started to replace its monomaniacal error checking debug code, which slows debug version execution down to a snail's pace. I put a version tag in, to suggest throwing away those crazy category arrays in the Element class.
std.xml1 is still a toy parser. I have added it to my std.xmlp project, to see how far a tiny phobos parser can go, because I've certainly got more code invested in other versions. Still, there are some interesting approaches in it. So I did a day's coding work and replaced some of its toy pieces, from my experience of what works better. Further improvements without radical change will be more difficult. I think I might just also change it to use my versions of common code that I borrowed and developed, originally from std.encode and std.xml.
std.xml1 has an added module dependency of my own creation, currently called alt.zstring. This includes an Array!(T), which is meant to be an efficient array struct that also tracks its capacity. It's nice to have this in a class, so it's always available on callback. I know there are Appender thingos in std.array, but I wanted my own hand tuned, hard tested version, used in the std.xmlp xml tools. I suppose I should look hard at std.array for a substitute, or for ways to improve this one. But the Array!T can be easily removed or substituted in std.xml1. I threw it in to find out its limitations and where it needs improvement.
This means that it's not currently a drop-in replacement for std.xml.
But I think I could do one in a few days, given encouragement, and some access please.
Here is the URL of d2-xml project:
https://launchpad.net/d2-xml
It's now on DigMars review list, to attract attention and comment.
The code can be viewed at http://bazaar.launchpad.net/~michael-rynn-500/d2-xml/d2-xml-dev/view/head:/std/xml1.d
Here is the original offending attribute parsing code, in the constructor of class Tag:
string key = munch(s,"^="~whitespace);
munch(s,whitespace);
reqc(s,'=');
munch(s,whitespace);
reqc(s,'"');
string val = decode(munch(s,"^\""), DecodeMode.LOOSE);
reqc(s,'"');
munch(s,whitespace);
attr[key] = val;
Note that reqc expects only double quotes.
Here is part of my replacement. It uses some new munchers, each a simple loop that switches on the character, finds the end and slices, in place of the generic pattern muncher. The decode function, which looks for entities, was changed as well.
string key = munchAttribute(s);
eatWhiteSpace(s);
reqc(s,'=');
eatWhiteSpace(s);
if (s.length == 0)
    badParseEnd();
dchar quoteMe = s[0];
if ((quoteMe != '\'') && (quoteMe != '\"'))
    badAttributeQuote(quoteMe);
s = s[1..$];
string val = decode(munchTillNext(s,quoteMe), DecodeMode.LOOSE);
// Check for end of input first, so that s[0] is never read from an empty slice.
if (s.length == 0)
    badParseEnd();
if (s[0] != quoteMe)
    badAttributeQuote(s[0]);
s = s[1..$];
eatWhiteSpace(s);
attr[key] = val;
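For readers without the d2-xml sources at hand, the quote handling described above can be tried in isolation with a minimal sketch like the following. It is not the std.xml1 code: the helper bodies are assumptions written for illustration (only the names eatWhiteSpace and munchTillNext come from the snippet above), parseAttribute is a hypothetical wrapper, errors are reported with enforce instead of the project's own error functions, and entity decoding is left out.

```d
import std.exception : enforce;

/// Assumed helper body: true for the four XML whitespace characters.
bool isXmlSpace(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/// Assumed helper body: drops leading whitespace from s.
void eatWhiteSpace(ref string s)
{
    size_t i = 0;
    while (i < s.length && isXmlSpace(s[i]))
        ++i;
    s = s[i .. $];
}

/// Assumed helper body: returns the slice before the next `stop` and
/// advances s to that character (or to the end if it never occurs).
string munchTillNext(ref string s, char stop)
{
    size_t i = 0;
    while (i < s.length && s[i] != stop)
        ++i;
    string result = s[0 .. i];
    s = s[i .. $];
    return result;
}

/// Hypothetical wrapper: parses one name="value" or name='value' pair
/// from the front of s and stores it in attr. No entity decoding.
void parseAttribute(ref string s, ref string[string] attr)
{
    // Attribute name: everything up to '=' or whitespace.
    size_t i = 0;
    while (i < s.length && s[i] != '=' && !isXmlSpace(s[i]))
        ++i;
    enforce(i > 0, "missing attribute name");
    string key = s[0 .. i];
    s = s[i .. $];

    eatWhiteSpace(s);
    enforce(s.length > 0 && s[0] == '=', "expected '='");
    s = s[1 .. $];
    eatWhiteSpace(s);

    // Accept either quote character, then require the same one to close.
    enforce(s.length > 0, "unexpected end of input");
    immutable char quote = s[0];
    enforce(quote == '"' || quote == '\'', "attribute value must be quoted");
    s = s[1 .. $];

    string val = munchTillNext(s, quote);
    enforce(s.length > 0, "unterminated attribute value");
    s = s[1 .. $]; // skip the closing quote
    eatWhiteSpace(s);
    attr[key] = val;
}

unittest
{
    string s = "id='a\"b' name=\"x'y\" ";
    string[string] attr;
    parseAttribute(s, attr); // single-quoted value containing a double quote
    parseAttribute(s, attr); // double-quoted value containing a single quote
    assert(attr["id"] == "a\"b");
    assert(attr["name"] == "x'y");
    assert(s.length == 0);
}
```

Remembering the opening quote and searching only for that same character is what lets a single-quoted value contain a double quote and vice versa, which is what the AttValue production in the XML 1.0 specification allows. The unittest block runs with `dmd -unittest -main`.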
Comment #1 by webby — 2012-02-17T08:09:00Z
For what it's worth, a simple
string s = cast(string)std.file.read(filePath);
auto doc = new Document(s);
using the xml file attached to https://bugs.launchpad.net/d2-xml/+bug/933594
takes ~18.6 seconds using std.xml and ~16.4 seconds using std.xml1.
The parse time drops substantially if the GC is disabled while the Document is being created.
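One way to repeat that measurement is sketched below, using core.memory.GC and std.datetime.stopwatch. The helper name timeWithoutGC is made up for this illustration, and the sketch deliberately does not depend on std.xml: the delegate passed in stands for whatever builds the Document.

```d
import core.memory : GC;
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;

/// Hypothetical helper: runs `work` with automatic collections suspended
/// and returns how long it took.
Duration timeWithoutGC(scope void delegate() work)
{
    GC.disable();             // suspend automatic collections
    scope (exit) GC.enable(); // re-enable even if work throws
    auto sw = StopWatch(AutoStart.yes);
    work();
    return sw.peek();
}

unittest
{
    int[] data;
    // Stand-in for the allocation-heavy step, e.g. `new Document(s)`.
    auto elapsed = timeWithoutGC({
        foreach (i; 0 .. 1000)
            data ~= i;
    });
    assert(data.length == 1000);
    assert(elapsed >= Duration.zero);
}
```

Memory still grows while collections are suspended, and the runtime may still collect if it runs out of memory, so this is best kept to a bounded step such as a single parse.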
Comment #2 by bugzilla — 2019-12-24T08:23:48Z
I think this is pretty much outdated by now. I could not find any munch() function in the current implementation... As this report is quite old, I'm closing it. Feel free to reopen it if you think the issue still exists. (And in that case I would be happy if you could provide an example showing what's going wrong.)