This seems related to issue #17161, issue #14431, and probably others, but I'm filing it separately because the regression appears to have been introduced only recently (merged Aug 2017, though the offending commit itself was written earlier).
Code:
------
import std.regex;
void main() {
string s = `blahblahblah`;
auto re = regex(s);
}
------
Compilation command:
------
time dmd -c test.d
------
On git master, the timing output is:
------
real 0m3.171s
user 0m2.936s
sys 0m0.233s
------
That is ridiculously slow for compiling a program that merely constructs a single runtime regex.
I've managed to isolate the problematic commit to: 905788a65a4b7833f52ee0701dc919ee54f0d35b, which is part of Phobos PR #5337 (https://github.com/dlang/phobos/pull/5337). There may be other culprits as well, but this is the major one. Compiling the above code on this specific commit gives:
------
real 0m2.791s
user 0m2.572s
sys 0m0.218s
------
whereas doing so on the ancestor commit gives:
------
real 0m1.004s
user 0m0.892s
sys 0m0.111s
------
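That is still not great, but it is about 2.8 times faster than the offending commit (1.004s vs. 2.791s) and about 3.2 times faster than git master (3.171s).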
Comment #1 by dfj1esp02 — 2019-01-19T13:45:38Z
Would a non-templated wrapper be good enough? How big an API do you want?
Comment #2 by dfj1esp02 — 2019-01-19T14:54:49Z
Proof of concept for Adam's code:
// https://github.com/adamdruppe/arsd/blob/ff68e1cf004861dcf256fce996bec851c7c0e208/cgi.d
struct Uri {
    import std.conv, std.string;

    // scheme//userinfo@host:port/path?query#fragment
    string scheme; /// e.g. "http" in "http://example.com/"
    string userinfo; /// the username (and possibly a password) in the uri
    string host; /// the domain name
    int port; /// port number, if given. Will be zero if a port was not explicitly given
    string path; /// e.g. "/folder/file.html" in "http://example.com/folder/file.html"
    string query; /// the stuff after the ? in a uri
    string fragment; /// the stuff after the # in a uri.

    /// Breaks down a uri string to its components
    this(string uri) {
        reparse(uri);
    }

    private void reparse(string uri) {
        //import std.regex;
        // from RFC 3986
        // the ctRegex triples the compile time and makes ugly errors for no real benefit
        // it was a nice experiment but just not worth it.
        // enum ctr = ctRegex!r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?";
        auto ctr = regex(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?");
        auto m = match(uri, ctr);
        if(m) {
            scheme = m.captures[2];
            auto authority = m.captures[4];
            auto idx = authority.indexOf("@");
            if(idx != -1) {
                userinfo = authority[0 .. idx];
                authority = authority[idx + 1 .. $];
            }
            idx = authority.indexOf(":");
            if(idx == -1) {
                port = 0; // 0 means not specified; we should use the default for the scheme
                host = authority;
            } else {
                host = authority[0 .. idx];
                port = to!int(authority[idx + 1 .. $]);
            }
            path = m.captures[5];
            query = m.captures[7];
            fragment = m.captures[9];
        }
        // uriInvalidated = false;
    }
}

import std = std.regex;

StringRegex regex(string pattern, const char[] flags = null)
{
    return StringRegex(std.regexImpl(pattern, flags));
}

struct StringRegex
{
    alias Type = typeof(std.regexImpl(""));
    Type re;
}

RegexMatch match(string input, StringRegex re)
{
    return RegexMatch(std.match(input, re.re));
}

struct RegexMatch
{
    alias Type = std.RegexMatch!string;
    Type mre;
    this(this){}
    ~this(){}
    bool opCast() const { return !mre.empty; }
    inout(Captures) captures() inout { return inout Captures(mre.captures); }
}

struct Captures
{
    alias Type = std.Captures!string;
    Type cre;
    string opIndex(size_t i) const { return cre[i]; }
}

Since this interface is non-templated, it can be shipped as an interface (.di) file.
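For illustration only, here is a rough sketch of what such an interface file might look like. The file and module name `fastregex` are hypothetical. The sketch also assumes the structs are changed to hold an opaque pointer instead of the Phobos types by value; otherwise the .di file would presumably still have to import std.regex to describe the struct layout, which would defeat the purpose. The function and method signatures are copied from the proof of concept above.

```d
// fastregex.di: hypothetical file and module name, not part of the proof of concept
module fastregex;

// Assumption: each struct stores only an opaque pointer, and the separately
// compiled fastregex.d keeps the real std.regex objects behind it.
struct StringRegex
{
    private void* impl;
}

struct Captures
{
    private void* impl;
    string opIndex(size_t i) const;
}

struct RegexMatch
{
    private void* impl;
    bool opCast() const;
    inout(Captures) captures() inout;
}

// Declarations only; the bodies live in fastregex.d, which is the only
// module that imports std.regex.
StringRegex regex(string pattern, const char[] flags = null);
RegexMatch match(string input, StringRegex re);
```

User code would then import `fastregex` instead of std.regex and link against the precompiled object, so the regex templates are instantiated once, when the wrapper itself is built.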
Comment #3 by robert.schadek — 2024-12-01T16:32:28Z