Bug 18378 – std.regex causes major slowdown in compilation times

Status: NEW
Severity: regression
Priority: P1
Component: phobos
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2018-02-05T21:18:46Z
Last change time: 2024-12-01T16:32:28Z
Assigned to: No Owner
Creator: hsteoh
Moved to GitHub: phobos#9742

Comments

Comment #0 by hsteoh — 2018-02-05T21:18:46Z
Seems to be related to issue #17161, issue #14431, and probably others, but I'm filing this separately as this seems to have been introduced just recently (merged Aug 2017, though the offending commit itself was earlier).

Code:
------
import std.regex;
void main() {
    string s = `blahblahblah`;
    auto re = regex(s);
}
------

Compilation command:
------
time dmd -c test.d
------

On git master, the timing output is:
------
real    0m3.171s
user    0m2.936s
sys     0m0.233s
------

Which is ridiculously slow for just the mere act of compiling a single regex.

I've managed to isolate the problematic commit to: 905788a65a4b7833f52ee0701dc919ee54f0d35b, which is part of Phobos PR #5337 (https://github.com/dlang/phobos/pull/5337). There may be other culprits as well, but this is the major one.

Compiling the above code on this specific commit gives:
------
real    0m2.791s
user    0m2.572s
sys     0m0.218s
------

whereas doing so on the ancestor commit gives:
------
real    0m1.004s
user    0m0.892s
sys     0m0.111s
------

Which is not great, but still 2-3 times faster.
Comment #1 by dfj1esp02 — 2019-01-19T13:45:38Z
Would a non-templated wrapper be good enough? How big an API do you want?
Comment #2 by dfj1esp02 — 2019-01-19T14:54:49Z
Proof of concept for Adam's code:

// https://github.com/adamdruppe/arsd/blob/ff68e1cf004861dcf256fce996bec851c7c0e208/cgi.d
struct Uri {
    import std.conv, std.string;
    // scheme//userinfo@host:port/path?query#fragment

    string scheme; /// e.g. "http" in "http://example.com/"
    string userinfo; /// the username (and possibly a password) in the uri
    string host; /// the domain name
    int port; /// port number, if given. Will be zero if a port was not explicitly given
    string path; /// e.g. "/folder/file.html" in "http://example.com/folder/file.html"
    string query; /// the stuff after the ? in a uri
    string fragment; /// the stuff after the # in a uri.

    /// Breaks down a uri string to its components
    this(string uri) {
        reparse(uri);
    }

    private void reparse(string uri) {
        //import std.regex;
        // from RFC 3986

        // the ctRegex triples the compile time and makes ugly errors for no real benefit
        // it was a nice experiment but just not worth it.
        // enum ctr = ctRegex!r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?";
        auto ctr = regex(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?");

        auto m = match(uri, ctr);
        if(m) {
            scheme = m.captures[2];
            auto authority = m.captures[4];

            auto idx = authority.indexOf("@");
            if(idx != -1) {
                userinfo = authority[0 .. idx];
                authority = authority[idx + 1 .. $];
            }

            idx = authority.indexOf(":");
            if(idx == -1) {
                port = 0; // 0 means not specified; we should use the default for the scheme
                host = authority;
            } else {
                host = authority[0 .. idx];
                port = to!int(authority[idx + 1 .. $]);
            }

            path = m.captures[5];
            query = m.captures[7];
            fragment = m.captures[9];
        }
        // uriInvalidated = false;
    }
}

import std=std.regex;
StringRegex regex(string pattern, const char[] flags=null)
{
    return StringRegex(std.regexImpl(pattern,flags));
}
struct StringRegex
{
    alias typeof(std.regexImpl("")) Type;
    Type re;
}
RegexMatch match(string input, StringRegex re)
{
    return RegexMatch(std.match(input,re.re));
}
struct RegexMatch
{
    alias std.RegexMatch!string Type;
    Type mre;
    this(this){}
    ~this(){}
    bool opCast() const { return !mre.empty; }
    inout(Captures) captures() inout { return inout Captures(mre.captures); }
}
struct Captures
{
    alias std.Captures!string Type;
    Type cre;
    string opIndex(size_t i) const { return cre[i]; }
}

As a non-templated interface it can be provided with an interface file.
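
A minimal sketch of how such an interface file might be arranged, under stated assumptions. The module name `stringregex`, the function `firstCaptures`, and the opaque `impl` pointer are all hypothetical and are not part of Phobos or of the proof of concept above. The wrapper structs above hold std.regex types as fields, so declaring them in a .di file would presumably still require importing std.regex; this sketch instead hides the compiled regex behind a `void*` so that the declarations mention no template types. Whether this actually recovers the lost compile time has not been measured here.

```d
// Hypothetical implementation module (stringregex.d); compiled once, separately.
module stringregex;

import stdre = std.regex; // renamed import: std.regex symbols need the stdre. prefix

// Assumed alias for the run-time regex type over char patterns.
private alias Impl = stdre.Regex!char;

/// Opaque handle. Its layout names no template type, so an interface file
/// can declare it without importing std.regex.
struct StringRegex
{
    private void* impl; // points at a heap-allocated Impl
}

/// Compile a pattern at run time and wrap it in the opaque handle.
StringRegex regex(string pattern, const(char)[] flags = null)
{
    auto p = new Impl;
    *p = stdre.regex(pattern, flags);
    return StringRegex(p);
}

/// Captures of the first match; index 0 is the whole match.
/// Returns an empty array when the input does not match.
string[] firstCaptures(string input, StringRegex re)
{
    auto caps = stdre.matchFirst(input, *cast(Impl*) re.impl);
    string[] result;
    foreach (c; caps)
        result ~= c;
    return result;
}

/*
Hypothetical interface file (stringregex.di) that client code would import
instead of the module above:

    module stringregex;
    struct StringRegex { private void* impl; }
    StringRegex regex(string pattern, const(char)[] flags = null);
    string[] firstCaptures(string input, StringRegex re);
*/
```

With this arrangement, client code such as `Uri.reparse` would call `regex(...)` and index the returned `string[]` rather than going through `m.captures`, and only the implementation module would instantiate the std.regex templates.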
Comment #3 by robert.schadek — 2024-12-01T16:32:28Z
THIS ISSUE HAS BEEN MOVED TO GITHUB

https://github.com/dlang/phobos/issues/9742

DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB