Bug 8725 – segmentation fault with negative-lookahead in module-level regex
Status
RESOLVED
Resolution
DUPLICATE
Severity
normal
Priority
P2
Component
phobos
Product
D
Version
D2
Platform
x86_64
OS
Mac OS X
Creation time
2012-09-25T22:31:00Z
Last change time
2012-12-01T00:12:43Z
Assigned to
nobody
Creator
val
Comments
Comment #0 by val — 2012-09-25T22:31:39Z
The following program crashes with a segmentation fault:
-------------
#!/usr/bin/env rdmd
import std.stdio;
import std.regex;

auto italic = regex( r"\*
(?!\s+)
(.*?)
(?!\s+)
\*", "gx" );

void main() {
    string input = "this * is* interesting, *very* interesting";
    writeln( replace( input, italic, "<i>$1</i>" ) );
}
--------------
If one removes the first line with (?!\s+), then the program doesn't crash.
I was under the impression that this snippet of code operates under the SafeD subset and therefore shouldn't cause a segmentation fault. A thrown exception on problems or something, that I can understand. But a segfault?
In other sad news, these are the first lines of D I've ever written :( ... so much for experimentation...
Comment #1 by val — 2012-09-25T22:33:03Z
Oh, and the segfault goes away if I put the regex creation directly in the call, like so:
writeln( replace( input, regex( r"\*
(?!\s+)
(.*?)
(?!\s+)
\*", "gx" ), "<i>$1</i>" ) );
Comment #2 by dmitry.olsh — 2012-09-26T06:46:49Z
I suspect this is a long-standing bug in compile-time evaluation: the compiler parses the regex pattern incorrectly at compile time (unlike at run time).
See also: http://d.puremagic.com/issues/show_bug.cgi?id=7810
The problem is that once the D compiler sees an initialized global variable, it has to const-fold the initializer:
int fact10 = factorial(10);
//will compute and hardcode the value of factorial(10)
The same happens with regex:
auto italic = regex( ... );
// *parses* the pattern and *generates* the compiled regex object, with all the data structures for matching it
All of this happens *at compile time* via CTFE; it is described near the bottom of http://dlang.org/function.html
Previously this only caused unexpectedly long compilation times (CTFE is slow) and, in select cases, a failed assert *during compilation*; it never segfaulted.
Probably the internal structure has some subtle corruption that the self-test failed to catch.
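The difference between a module-level initializer and a local one can be sketched as follows. This is only an illustration: factorial is a hypothetical helper written here for the example, not something from Phobos.

```d
import std.stdio;

// Hypothetical helper for illustration; it can run at compile time or at run time.
ulong factorial(uint n) pure nothrow @safe
{
    ulong result = 1;
    foreach (i; 2 .. n + 1)
        result *= i;
    return result;
}

// Module-level initializer: the compiler has to evaluate the call via CTFE.
ulong fact10 = factorial(10);

// An enum initializer forces compile-time evaluation as well.
enum fact5 = factorial(5);
static assert(fact5 == 120);

void main()
{
    // Local initializer: an ordinary call made at run time.
    auto fact6 = factorial(6);
    writeln(fact10, " ", fact6); // prints "3628800 720"
}
```

A module-level regex( ... ) initializer goes through the same compile-time path, which is where the faulty pattern object comes from.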
E.g. this one also works, because the italic regex is created at run time:
import std.stdio;
import std.regex;

void main() {
    auto italic = regex( r"\*
(?!\s+)
(.*?)
(?!\s+)
\*", "gx" );
    string input = "this * is* interesting, *very* interesting";
    writeln( replace( input, italic, "<i>$1</i>" ) );
}
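If the regex has to stay at module level, one possible workaround (a sketch, not something proposed in this report) is to declare the variable without an initializer and assign it in a module constructor, so the pattern is built at run time. The pattern is the one from the report written on a single line, and replaceAll is assumed to be available, which requires a Phobos release newer than the one this report was filed against.

```d
import std.regex;
import std.stdio;

// Declared without an initializer, so nothing is evaluated at compile time.
Regex!char italic;

// Module constructor: runs at start-up, before main.
static this()
{
    italic = regex(r"\*(?!\s+)(.*?)(?!\s+)\*", "g");
}

void main()
{
    string input = "this * is* interesting, *very* interesting";
    writeln(replaceAll(input, italic, "<i>$1</i>"));
}
```

The module constructor runs once per thread, because module-level variables in D are thread-local by default.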
Also a tip: the second lookahead should be a lookbehind! As it is, it only tests that \* is not whitespace, which is always true. Both can also be just \s, because \s+ matches whenever \s matches. And since you don't capture the contents of a lookahead/lookbehind, it'll be faster and simpler to use a single \s.
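A minimal sketch of the pattern with that tip applied. italicize is a hypothetical helper name, the regex is built at run time, and replaceAll is assumed to be available (it comes from a later Phobos release than the one in this report).

```d
import std.regex;
import std.stdio;

// Hypothetical helper for illustration.
string italicize(string input)
{
    // (?!\s)  : the opening * must not be followed by whitespace
    // (?<!\s) : the closing * must not be preceded by whitespace
    auto italic = regex(r"\*(?!\s)(.*?)(?<!\s)\*", "g");
    return replaceAll(input, italic, "<i>$1</i>");
}

void main()
{
    // prints "this * is* interesting, <i>very</i> interesting"
    writeln(italicize("this * is* interesting, *very* interesting"));
}
```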
About SafeD: it shouldn't segfault, but the program listed is @system (as this is the default) :). Otherwise, since regex is @trusted, it's my responsibility to verify that it is memory safe, so blame me (or rather the compiler).
To actually be in SafeD, try putting @safe: at the top of your code, or just tag main and all functions with @safe.
AFAIK writeln wouldn't work in SafeD, as it's still @system (obviously it should be @safe/@trusted). To be honest, SafeD hasn't been addressed properly in the standard library yet.
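A rough sketch of how the attributes interact, using two hypothetical functions (deref and firstOrZero are made up for this example). main is deliberately left unannotated, so the sketch does not depend on whether writeln is callable from @safe code.

```d
import std.stdio;

// Unannotated non-template functions are @system by default.
int deref(int* p)
{
    return *p;
}

// @safe: the compiler checks the body for memory-unsafe operations.
int firstOrZero(const int[] xs) @safe
{
    return xs.length ? xs[0] : 0;
}

// @safe code may call other @safe code...
static assert(__traits(compiles, () @safe { return firstOrZero([1, 2, 3]); }));
// ...but calling a @system function from @safe code is rejected.
static assert(!__traits(compiles, () @safe { return deref(null); }));

void main()
{
    writeln(firstOrZero([7, 8, 9])); // prints "7"
}
```

A @trusted function sits in between: it is callable from @safe code, but the compiler does not check its body, so its author vouches for its memory safety.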
Comment #3 by val — 2012-09-26T09:39:30Z
Thanks for the explanation!
WRT the regex string being faulty, I was aware of that; I was just experimenting when I encountered a segfault.
Thanks for the pointer about adding @safe: at the top; too bad writeln is still @system. That kinda kills the usefulness of SafeD, doesn't it? I mean if I literally can't write a Hello World program in SafeD, then SafeD is quite far from ready. :)
I read TDPL last week and this is my first encounter with writing real D code; all in all, the language is freaking awesome (goodbye C++) and I'm even willing to live with esoteric bugs in the compiler/libs if I can work around them. I understand that D is still a work-in-progress language.
I intend to write a substantial (multi KLOC) D program as a learning experience; will report any bugs I find as I find them.
Anyway, good luck fixing this. :)
Comment #4 by dmitry.olsh — 2012-11-30T12:49:42Z
Works with current git master.
Must have been fixed along with the compiler bug in 7810.
*** This issue has been marked as a duplicate of issue 7810 ***
Comment #5 by github-bugzilla — 2012-12-01T00:12:43Z