Bug 11350 – libphobos2 regex match segfaults when a rare HTTP header is received
Status
RESOLVED
Resolution
WORKSFORME
Severity
normal
Priority
P2
Component
phobos
Product
D
Version
D2
Platform
x86
OS
Linux
Creation time
2013-10-25T03:30:00Z
Last change time
2014-09-17T21:10:17Z
Keywords
pull
Assigned to
nobody
Creator
sha0
Comments
Comment #0 by sha0 — 2013-10-25T03:30:36Z
A simple std.net.curl.get() is performed against a remote host that returns some rare HTTP headers. I don't define the onReceiveHeader callback, so libphobos2 calls the default onReceiveHeader(), which applies a regex to the header and then crashes.
I connect this way:
auto conn = HTTP();
conn.connectTimeout(dur!"seconds"(4));
conn.addRequestHeader("User-agent","Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20100101 Firefox/20.0");
char[] html = get(url,conn);
It seems the bug is at:
/usr/include/dmd/phobos/std/regex.d line 6348
6537 public auto match(R, RegEx)(R input, RegEx re)
6538 if(isSomeString!R && is(RegEx == Regex!(BasicElementOf!R)))
6539 {
6540 return RegexMatch!(Unqual!(typeof(input)),ThompsonMatcher)(re, input);
6541 }
Maybe it is an encoding problem; the input seems to be:
>>> print "%c%c%c%c%c%c%c%c%c" % (0x64,0x61,0x97,0x48,0x34,0x53,0x54,0x65,0x46)
da�H4STeF
(gdb) bt
#0 0xb76c8d13 in rt.deh2.terminate() () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
#1 0xb76c8ee3 in _d_throwc () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
#2 0x080b04cc in _D3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch43__T6__ctorTS3std5regex12__T5RegexTaZ5RegexZ6__ctorMFNcNeS3std5regex12__T5RegexTaZ5RegexAaZS3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch (this=0x95ac0774, input=646197483453546546, prog=...)
at /usr/include/dmd/phobos/std/regex.d:6348
#3 0x080a09a2 in _D3std5regex45__T5matchTAaTS3std5regex12__T5RegexTaZ5RegexZ5matchFNfAaS3std5regex12__T5RegexTaZ5RegexZS3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch (__HID46=0x95ac0b18, re=..., input=646197483453546546) at /usr/include/dmd/phobos/std/regex.d:6540
#4 0xb768e20f in std.net.curl.HTTP.onReceiveHeader() () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
#5 0xb769125a in std.net.curl.Curl.onReceiveHeader() () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
#6 0xb7691665 in std.net.curl.Curl._receiveHeaderCallback() () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
#7 0xb72a5e7a in Curl_client_write () from /usr/lib/i386-linux-gnu/libcurl.so.4
#8 0xb72a4912 in Curl_http_readwrite_headers () from /usr/lib/i386-linux-gnu/libcurl.so.4
#9 0xb72bbf6d in Curl_readwrite () from /usr/lib/i386-linux-gnu/libcurl.so.4
#10 0xb72bde4d in ?? () from /usr/lib/i386-linux-gnu/libcurl.so.4
#11 0xb72be793 in curl_easy_perform () from /usr/lib/i386-linux-gnu/libcurl.so.4
#12 0xb7691093 in std.net.curl.Curl.perform() () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
#13 0xb768d8e1 in std.net.curl.HTTP._perform() () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
#14 0xb768d734 in std.net.curl.HTTP.perform() () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
#15 0x08081aac in _D3std3net4curl18__T10_basicHTTPTaZ10_basicHTTPFAxaAxvS3std3net4curl4HTTPZAa (client=..., sendData=579669917507256320,
url=10576998119117946914) at /usr/include/dmd/phobos/std/net/curl.d:762
#16 0x08081948 in _D3std3net4curl30__T3getTS3std3net4curl4HTTPTaZ3getFAxaS3std3net4curl4HTTPZAa (conn=..., url=10576998119117946914)
at /usr/include/dmd/phobos/std/net/curl.d:364
Comment #1 by dmitry.olsh — 2013-10-25T11:21:26Z
(In reply to comment #0)
>
> It seems the bug is at:
>
> /usr/include/dmd/phobos/std/regex.d line 6348
>
> 6537 public auto match(R, RegEx)(R input, RegEx re)
> 6538 if(isSomeString!R && is(RegEx == Regex!(BasicElementOf!R)))
> 6539 {
> 6540 return RegexMatch!(Unqual!(typeof(input)),ThompsonMatcher)(re, input);
> 6541 }
>
> Maybe it is an encoding problem; the input seems to be:
> >>> print "%c%c%c%c%c%c%c%c%c" % (0x64,0x61,0x97,0x48,0x34,0x53,0x54,0x65,0x46)
> da�H4STeF
>
It would be nice to see what pattern that is and exactly what the argument passed to it looks like.
I tried to reproduce with this:
void main()
{
    import std.regex;
    ubyte[] header = [0x64,0x61,0x97,0x48,0x34,0x53,0x54,0x65,0x46];
    auto m = match(cast(char[]) header, regex("(.*?): (.*)$"));
    assert(m.empty);
}
I get:
std.utf.UTFException@C:\dmd2\windows\bin\..\..\src\phobos\std\utf.d(1113): Invalid UTF-8 sequence (at index 1)
No crashes.
Now it may have to do with shared objects / PIC code for all I know, as I'm testing on Win32.
But without a smaller, or at least complete, reproducible test case there is nothing to work on.
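To check the reported bytes in isolation, without std.regex or curl involved, a minimal sketch along these lines could be used. The helper name isValidUtf8 is hypothetical; std.utf.validate is the Phobos function that throws UTFException on malformed input.

```d
import std.stdio : writeln;
import std.utf : validate, UTFException;

// Hypothetical helper: true if the bytes form well-formed UTF-8.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        // validate() throws UTFException on the first malformed sequence.
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // Bytes from comment #0; 0x97 is a continuation byte with no lead byte.
    ubyte[] header = [0x64, 0x61, 0x97, 0x48, 0x34, 0x53, 0x54, 0x65, 0x46];
    writeln(isValidUtf8(header) ? "valid UTF-8" : "invalid UTF-8");
}
```

If this reports invalid UTF-8 on the affected machine as well, the input itself is the trigger and the remaining question is only why the exception turns into a crash there.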
Comment #2 by dmitry.olsh — 2013-10-25T11:40:08Z
(In reply to comment #0)
> It seems the bug is at:
No, and I think I know what it is.
> Maybe it is an encoding problem; the input seems to be:
> >>> print "%c%c%c%c%c%c%c%c%c" % (0x64,0x61,0x97,0x48,0x34,0x53,0x54,0x65,0x46)
> da�H4STeF
Yes, this is broken UTF-8 and hence...
>
>
>
> (gdb) bt
> #0 0xb76c8d13 in rt.deh2.terminate() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #1 0xb76c8ee3 in _d_throwc () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
it throws an exception ...
> #2 0x080b04cc in
> _D3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch43__T6__ctorTS3std5regex12__T5RegexTaZ5RegexZ6__ctorMFNcNeS3std5regex12__T5RegexTaZ5RegexAaZS3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch
> (this=0x95ac0774, input=646197483453546546, prog=...)
> at /usr/include/dmd/phobos/std/regex.d:6348
... inside of std.regex.match. But the thing is, we are doing it inside a callback of the C library libcurl (follow the call stack up to curl_easy_perform). IT HAS NO IDEA what to do with a D exception, hence the crash.
So the fix would be to insulate it with try/catch inside that onReceiveHeader callback.
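A minimal sketch of that insulation, not the actual std.net.curl code: parseHeader and headerCallback are hypothetical names, and the callback only mirrors the shape libcurl uses for header callbacks. validate is called explicitly here as an assumption, so that the sketch throws on bad input regardless of how a given std.regex version handles it.

```d
import std.regex : match, regex;
import std.utf : validate;

// Hypothetical header handler that may throw (e.g. UTFException).
void parseHeader(const(char)[] line)
{
    validate(line);                              // throws on malformed UTF-8
    auto m = match(line, regex("(.*?): (.*)$")); // pattern from comment #1
    // A real handler would store the captures when !m.empty.
}

// Hypothetical callback in the shape libcurl expects for header data.
// No exception may escape: the C frames above it cannot unwind it.
extern (C) size_t headerCallback(const(char)* ptr, size_t size,
                                 size_t nmemb, void* userdata)
{
    try
    {
        parseHeader(ptr[0 .. size * nmemb]);
        return size * nmemb; // report the data as consumed
    }
    catch (Exception e)
    {
        return 0;            // a short count makes libcurl abort the transfer
    }
}

void main()
{
    char[] good = "Server: nginx\r\n".dup;
    ubyte[] bad = [0x64, 0x61, 0x97, 0x48];
    assert(headerCallback(good.ptr, 1, good.length, null) == good.length);
    assert(headerCallback(cast(char*) bad.ptr, 1, bad.length, null) == 0);
}
```

Returning a short count aborts the transfer with an error instead of crashing; skipping the bad header and returning the full count would be the more lenient alternative.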
> #3 0x080a09a2 in
> _D3std5regex45__T5matchTAaTS3std5regex12__T5RegexTaZ5RegexZ5matchFNfAaS3std5regex12__T5RegexTaZ5RegexZS3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch
> (__HID46=0x95ac0b18, re=..., input=646197483453546546) at
> /usr/include/dmd/phobos/std/regex.d:6540
> #4 0xb768e20f in std.net.curl.HTTP.onReceiveHeader() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #5 0xb769125a in std.net.curl.Curl.onReceiveHeader() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #6 0xb7691665 in std.net.curl.Curl._receiveHeaderCallback() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #7 0xb72a5e7a in Curl_client_write () from
> /usr/lib/i386-linux-gnu/libcurl.so.4
> #8 0xb72a4912 in Curl_http_readwrite_headers () from
> /usr/lib/i386-linux-gnu/libcurl.so.4
> #9 0xb72bbf6d in Curl_readwrite () from /usr/lib/i386-linux-gnu/libcurl.so.4
> #10 0xb72bde4d in ?? () from /usr/lib/i386-linux-gnu/libcurl.so.4
> #11 0xb72be793 in curl_easy_perform () from
> /usr/lib/i386-linux-gnu/libcurl.so.4
> #12 0xb7691093 in std.net.curl.Curl.perform() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #13 0xb768d8e1 in std.net.curl.HTTP._perform() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #14 0xb768d734 in std.net.curl.HTTP.perform() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #15 0x08081aac in
> _D3std3net4curl18__T10_basicHTTPTaZ10_basicHTTPFAxaAxvS3std3net4curl4HTTPZAa
> (client=..., sendData=579669917507256320,
> url=10576998119117946914) at /usr/include/dmd/phobos/std/net/curl.d:762
> #16 0x08081948 in
> _D3std3net4curl30__T3getTS3std3net4curl4HTTPTaZ3getFAxaS3std3net4curl4HTTPZAa
> (conn=..., url=10576998119117946914)
> at /usr/include/dmd/phobos/std/net/curl.d:364