Bug 11350 – libphobos2 regex match segfaults when a rare HTTP header is received

Status
RESOLVED
Resolution
WORKSFORME
Severity
normal
Priority
P2
Component
phobos
Product
D
Version
D2
Platform
x86
OS
Linux
Creation time
2013-10-25T03:30:00Z
Last change time
2014-09-17T21:10:17Z
Keywords
pull
Assigned to
nobody
Creator
sha0

Comments

Comment #0 by sha0 — 2013-10-25T03:30:36Z
A simple std.net.curl.get() is performed against a remote host that sends some rare HTTP headers. I don't define the onReceiveHeader callback, but libphobos2 calls the default onReceiveHeader(), which applies a regex to the header and then crashes.

I connect this way:

    auto conn = HTTP();
    conn.connectTimeout(dur!"seconds"(4));
    conn.addRequestHeader("User-agent","Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20100101 Firefox/20.0");
    char[] html = get(url,conn);

It seems the bug is at:

/usr/include/dmd/phobos/std/regex.d line 6348

    6537 public auto match(R, RegEx)(R input, RegEx re)
    6538     if(isSomeString!R && is(RegEx == Regex!(BasicElementOf!R)))
    6539 {
    6540     return RegexMatch!(Unqual!(typeof(input)),ThompsonMatcher)(re, input);
    6541 }

Maybe it is an encoding problem; it seems the input is:

    >>> print "%c%c%c%c%c%c%c%c%c" % (0x64,0x61,0x97,0x48,0x34,0x53,0x54,0x65,0x46)
    da�H4STeF

(gdb) bt
#0 0xb76c8d13 in rt.deh2.terminate() () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
#1 0xb76c8ee3 in _d_throwc () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
#2 0x080b04cc in _D3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch43__T6__ctorTS3std5regex12__T5RegexTaZ5RegexZ6__ctorMFNcNeS3std5regex12__T5RegexTaZ5RegexAaZS3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch (this=0x95ac0774, input=646197483453546546, prog=...) at /usr/include/dmd/phobos/std/regex.d:6348
#3 0x080a09a2 in _D3std5regex45__T5matchTAaTS3std5regex12__T5RegexTaZ5RegexZ5matchFNfAaS3std5regex12__T5RegexTaZ5RegexZS3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch (__HID46=0x95ac0b18, re=..., input=646197483453546546) at /usr/include/dmd/phobos/std/regex.d:6540
#4 0xb768e20f in std.net.curl.HTTP.onReceiveHeader() () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
#5 0xb769125a in std.net.curl.Curl.onReceiveHeader() () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
#6 0xb7691665 in std.net.curl.Curl._receiveHeaderCallback() () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
#7 0xb72a5e7a in Curl_client_write () from /usr/lib/i386-linux-gnu/libcurl.so.4
#8 0xb72a4912 in Curl_http_readwrite_headers () from /usr/lib/i386-linux-gnu/libcurl.so.4
#9 0xb72bbf6d in Curl_readwrite () from /usr/lib/i386-linux-gnu/libcurl.so.4
#10 0xb72bde4d in ?? () from /usr/lib/i386-linux-gnu/libcurl.so.4
#11 0xb72be793 in curl_easy_perform () from /usr/lib/i386-linux-gnu/libcurl.so.4
#12 0xb7691093 in std.net.curl.Curl.perform() () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
#13 0xb768d8e1 in std.net.curl.HTTP._perform() () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
#14 0xb768d734 in std.net.curl.HTTP.perform() () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
#15 0x08081aac in _D3std3net4curl18__T10_basicHTTPTaZ10_basicHTTPFAxaAxvS3std3net4curl4HTTPZAa (client=..., sendData=579669917507256320, url=10576998119117946914) at /usr/include/dmd/phobos/std/net/curl.d:762
#16 0x08081948 in _D3std3net4curl30__T3getTS3std3net4curl4HTTPTaZ3getFAxaS3std3net4curl4HTTPZAa (conn=..., url=10576998119117946914) at /usr/include/dmd/phobos/std/net/curl.d:364

Comment #1 by dmitry.olsh — 2013-10-25T11:21:26Z
(In reply to comment #0)
> It seems the bug is at:
>
> /usr/include/dmd/phobos/std/regex.d line 6348
>
> 6537 public auto match(R, RegEx)(R input, RegEx re)
> 6538     if(isSomeString!R && is(RegEx == Regex!(BasicElementOf!R)))
> 6539 {
> 6540     return RegexMatch!(Unqual!(typeof(input)),ThompsonMatcher)(re, input);
> 6541 }
>
> Maybe is an encoding problem, it seems the input is:
>
> >>> print "%c%c%c%c%c%c%c%c%c" % (0x64,0x61,0x97,0x48,0x34,0x53,0x54,0x65,0x46)
> da�H4STeF

It would be nice to see what pattern that is and what exactly the argument to it looks like.

I tried to reproduce with this:

    void main()
    {
        import std.regex;
        ubyte[] header = [0x64,0x61,0x97,0x48,0x34,0x53,0x54,0x65,0x46];
        auto m = match(cast(char[]) header, regex("(.*?): (.*)$"));
        assert(m.empty);
    }

I get:

    std.utf.UTFException@C:\dmd2\windows\bin\..\..\src\phobos\std\utf.d(1113): Invalid UTF-8 sequence (at index 1)

No crashes. It may have to do with shared object / PIC code for all I know, as I'm testing on Win32. But without a smaller, or at least complete, reproducible test case there is nothing to work on.

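To make the encoding question concrete, here is a minimal sketch that checks whether the reported bytes are valid UTF-8 without involving std.regex or curl at all. `isValidUtf8` is a hypothetical helper introduced only for this illustration; it relies on `std.utf.validate`, which throws `UTFException` on malformed input.

```d
import std.utf : validate, UTFException;

/// Hypothetical helper: true if `bytes` is well-formed UTF-8.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        // validate() throws UTFException on the first malformed sequence.
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // Bytes from comment #0. 0x97 is a UTF-8 continuation byte
    // that is not preceded by any lead byte.
    ubyte[] header = [0x64, 0x61, 0x97, 0x48, 0x34, 0x53, 0x54, 0x65, 0x46];
    assert(!isValidUtf8(header));

    // A plain ASCII header line is valid UTF-8.
    assert(isValidUtf8(cast(const(ubyte)[]) "Content-Type: text/html"));
}
```

If this holds on the reporter's platform too, the input alone is enough to make decoding throw, independent of the regex pattern.
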
Comment #2 by dmitry.olsh — 2013-10-25T11:40:08Z
(In reply to comment #0)
> It seems the bug is at:

No, and I think I know what it is.

> Maybe is an encoding problem, it seems the input is:
> >>> print "%c%c%c%c%c%c%c%c%c" % (0x64,0x61,0x97,0x48,0x34,0x53,0x54,0x65,0x46)
> da�H4STeF

Yes, this is broken UTF-8 and hence...

> (gdb) bt
> #0 0xb76c8d13 in rt.deh2.terminate() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #1 0xb76c8ee3 in _d_throwc () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63

it throws an exception ...

> #2 0x080b04cc in
> _D3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch43__T6__ctorTS3std5regex12__T5RegexTaZ5RegexZ6__ctorMFNcNeS3std5regex12__T5RegexTaZ5RegexAaZS3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch
> (this=0x95ac0774, input=646197483453546546, prog=...)
> at /usr/include/dmd/phobos/std/regex.d:6348

... inside of std.regex.match. But the thing is, we are doing it inside a callback of the C library libcurl (follow the call stack down to curl_easy_perform). It HAS NO IDEA what to do with a D exception, hence the crash. So the fix would be to insulate it with try/catch inside of that onReceive callback.

> #3 0x080a09a2 in
> _D3std5regex45__T5matchTAaTS3std5regex12__T5RegexTaZ5RegexZ5matchFNfAaS3std5regex12__T5RegexTaZ5RegexZS3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch
> (__HID46=0x95ac0b18, re=..., input=646197483453546546) at
> /usr/include/dmd/phobos/std/regex.d:6540
> #4 0xb768e20f in std.net.curl.HTTP.onReceiveHeader() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #5 0xb769125a in std.net.curl.Curl.onReceiveHeader() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #6 0xb7691665 in std.net.curl.Curl._receiveHeaderCallback() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #7 0xb72a5e7a in Curl_client_write () from
> /usr/lib/i386-linux-gnu/libcurl.so.4
> #8 0xb72a4912 in Curl_http_readwrite_headers () from
> /usr/lib/i386-linux-gnu/libcurl.so.4
> #9 0xb72bbf6d in Curl_readwrite () from /usr/lib/i386-linux-gnu/libcurl.so.4
> #10 0xb72bde4d in ?? () from /usr/lib/i386-linux-gnu/libcurl.so.4
> #11 0xb72be793 in curl_easy_perform () from
> /usr/lib/i386-linux-gnu/libcurl.so.4
> #12 0xb7691093 in std.net.curl.Curl.perform() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #13 0xb768d8e1 in std.net.curl.HTTP._perform() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #14 0xb768d734 in std.net.curl.HTTP.perform() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #15 0x08081aac in
> _D3std3net4curl18__T10_basicHTTPTaZ10_basicHTTPFAxaAxvS3std3net4curl4HTTPZAa
> (client=..., sendData=579669917507256320,
> url=10576998119117946914) at /usr/include/dmd/phobos/std/net/curl.d:762
> #16 0x08081948 in
> _D3std3net4curl30__T3getTS3std3net4curl4HTTPTaZ3getFAxaS3std3net4curl4HTTPZAa
> (conn=..., url=10576998119117946914)
> at /usr/include/dmd/phobos/std/net/curl.d:364

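The try/catch insulation described in this comment can be sketched roughly as follows. This is not the actual Phobos patch: `parseHeader`, `HeaderStats` and `headerCallback` are hypothetical names invented for illustration, and the callback only mimics the shape of a libcurl header callback (pointer, size, count, user data) so that it can run without libcurl. The point is that the `extern (C)` function is `nothrow`, so no D exception can unwind into C frames.

```d
import std.regex : regex, matchFirst;
import std.utf : validate;

/// Hypothetical per-connection state passed through `userdata`.
struct HeaderStats
{
    size_t parsed;   // header lines handled without an exception
    size_t rejected; // header lines that raised an exception
}

/// Hypothetical header handler; throws UTFException on malformed UTF-8.
void parseHeader(const(char)[] line)
{
    validate(line);
    auto m = matchFirst(line, regex("(.*?): (.*)$"));
    // A real handler would store m[1] and m[2] somewhere.
}

/// Callback in the style of a libcurl header callback (an assumption of
/// this sketch). `nothrow` makes the compiler reject any code path that
/// could let an Exception escape into the C caller.
extern (C) size_t headerCallback(const(char)* ptr, size_t size,
                                 size_t nmemb, void* userdata) nothrow
{
    auto stats = cast(HeaderStats*) userdata;
    immutable len = size * nmemb;
    try
    {
        parseHeader(ptr[0 .. len]);
        stats.parsed++;
    }
    catch (Exception)
    {
        // Swallow the failure here; C code cannot unwind D exceptions.
        stats.rejected++;
    }
    return len; // report all bytes as consumed so the transfer continues
}

void main()
{
    HeaderStats stats;
    string good = "Server: nginx";
    ubyte[] bad = [0x64, 0x61, 0x97, 0x48, 0x34, 0x53, 0x54, 0x65, 0x46];

    headerCallback(good.ptr, 1, good.length, &stats);
    headerCallback(cast(const(char)*) bad.ptr, 1, bad.length, &stats);

    assert(stats.parsed == 1);
    assert(stats.rejected == 1);
}
```

Whether a malformed header should be dropped silently, as here, or recorded and rethrown after curl_easy_perform returns is a design choice this sketch does not settle.
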
Comment #3 by dmitry.olsh — 2014-01-07T07:26:45Z
@sha0coder Could you try with this fix: https://github.com/D-Programming-Language/phobos/pull/1842
Comment #4 by github-bugzilla — 2014-01-07T17:02:24Z
Commits pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/3b6cc0cb73be19986ef1b8a30036227c98b37bb9
fix issue 11350
Do not throw on bad UTF inside of a C callback

https://github.com/D-Programming-Language/phobos/commit/84dbc9934d4e0e72dc9ce138a0a0771666b51f26
Merge pull request #1842 from blackwhale/issue-11350
Fix issue 11350 ibphobos2 regex match segfaults when a rare HTTP header is received
Comment #5 by dmitry.olsh — 2014-09-17T21:10:17Z
Original problem was patched in std.net.curl long ago, so closing this.