Bug 15513 – Memory Corruption with thread local objects

Status: RESOLVED
Resolution: FIXED
Severity: blocker
Priority: P1
Component: dmd
Product: D
Version: D2
Platform: x86_64
OS: Linux
Creation time: 2016-01-04T12:58:05Z
Last change time: 2018-01-05T13:27:19Z
Assigned to: No Owner
Creator: Puneet Goel

Attachments

ID | Filename | Summary | Content-Type | Size
1570 | memfault.tgz | Tarred C and D files | application/x-compressed-tar | 16478
1577 | memerr.tgz | Another testcase with just one thread | application/x-compressed-tar | 13572

Comments

Comment #0 by puneet — 2016-01-04T12:58:05Z
Created attachment 1570 (Tarred C and D files). This is a strange memory corruption issue with thread local objects. I am able to recreate the issue only when I link in D code from inside C. There are 2 C files and 1 D file involved. I have also attached a makefile to assist with the compilation process.
Comment #1 by puneet — 2016-01-20T02:54:54Z
I am hit by this issue often when a function declares static variables and is called from multiple threads. Another observation is that the memory corruption happens if the static variable's data is allocated on the heap rather than stored in place. For example, if I declare a static array with a fixed length, there is no issue. But if the static variable is a dynamically sized array that gets allocated on the heap on the first call to the function, it gets corrupted after a while. I am reducing another test case in which the contents of a dynamic array get corrupted even though the code writes to it only once. Since I work on a large project with almost 100,000 lines of code, DustMite takes days to reduce it.
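The pattern described in this comment can be sketched roughly as follows. This is a minimal illustration, not the reporter's code: `scratch` and `worker` are hypothetical names, and the loop of throwaway allocations is only an assumed way of provoking garbage collections. In a standalone D program each thread is expected to print eight zeros; the report concerns the case where the same kind of code runs inside a shared D library loaded from C.

```d
import core.thread : Thread;
import std.stdio : writeln;

// Hypothetical helper: returns a per-thread scratch buffer, allocated lazily.
int[] scratch()
{
    // A function-local `static` is thread-local in D: every thread has its own `buf`.
    static int[] buf;
    if (buf is null)
        buf = new int[8]; // GC heap allocation on the first call in each thread
    return buf;
}

void worker()
{
    auto b = scratch();
    b[] = 0; // the buffer is written exactly once

    // Unrelated allocations (an assumption) that may trigger GC collections.
    foreach (i; 0 .. 100)
    {
        auto junk = new int[10_000];
    }

    // The buffer is reachable only through this thread's TLS slot at this point.
    writeln(scratch());
}

void main()
{
    auto threads = [new Thread(&worker), new Thread(&worker)];
    foreach (t; threads)
        t.start();
    foreach (t; threads)
        t.join();
}
```

A fixed-length static array (for example `static int[8] buf;`) keeps its elements inside the thread-local block itself, so no separate heap allocation is involved, which matches the observation that such arrays were not affected.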
Comment #2 by puneet — 2016-01-20T04:13:57Z
Even when I move the static variable into a class, I still get memory corruption. The following seem necessary to reproduce the issue:
1. A static variable that gets allocated on the heap.
2. The variable has to be used in multiple threads.
Also, as of now I am able to reproduce the issue only when a shared D library is loaded from C. But since execution control is entirely with D after that, I believe the memory corruption issue is more generic and could happen in standalone D applications as well.
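For the class variant, a small sketch may help show why the two conditions go together: a static class member is also thread-local in D, so each thread that touches it ends up with its own heap allocation, referenced only from that thread's TLS. `Cache`, `get`, and `seen` are hypothetical names introduced for illustration.

```d
import core.thread : Thread;

class Cache
{
    // Static class members are thread-local unless marked `shared` or `__gshared`.
    static int[] table;

    static int[] get()
    {
        if (table is null)
            table = new int[8]; // one GC allocation per thread
        return table;
    }
}

// Process-wide slots, used only to compare the buffers seen by each thread.
__gshared int*[2] seen;

void main()
{
    auto a = new Thread({ seen[0] = Cache.get().ptr; });
    auto b = new Thread({ seen[1] = Cache.get().ptr; });
    a.start();
    a.join();
    b.start();
    b.join();

    assert(seen[0] !is null && seen[1] !is null);
    assert(seen[0] !is seen[1]); // each thread allocated its own array
}
```

The `__gshared` slots keep both arrays alive for the comparison; in real code the thread-local `table` would usually be the only reference to the buffer.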
Comment #3 by puneet — 2016-01-20T16:26:01Z
Adding another test case. This time I could replicate the issue within a single thread. Testcase has been attached. This issue has also been discussed in the forum thread http://forum.dlang.org/thread/[email protected]
Comment #4 by puneet — 2016-01-20T16:27:58Z
Created attachment 1577 Another testcase with just one thread
Comment #5 by ag0aep6g — 2016-01-20T18:44:41Z
Reduced this some:

foo.d:
----
import std.stdio: writeln;
import core.thread: Thread;

void main()
{
    writeln("Start frop from D");
    frop();
}

extern(C) void function() startup_routine = &initialize;

extern(C) void initialize()
{
    import core.runtime;
    Runtime.initialize;
    writeln("Start frop from C");
    frop();
    Runtime.terminate();
}

void frop()
{
    Thread bar = new Thread(&foo);
    bar.start();
    bar.join();
}

void foo()
{
    proc();
    new int;
    new int[10_000];
    writeln(dash);
}

int[] dash;

void proc ()
{
    dash.length = 8;
    dash[] = 0;
}
----

main.c:
----
#include <dlfcn.h>

typedef void (*routine_t)(void);

int main(int argc, char*argv[])
{
    void* dll = dlopen("./foo.so", RTLD_LAZY);
    routine_t* routine = (routine_t*)dlsym(dll, "startup_routine");
    (*routine)();
}
----

Compile and run:
----
dmd foo.d && ./foo
# change the path for the -L-R option according to your setup
dmd -fPIC -shared -offoo.so -L-ldl -L-lphobos2 -L-R$HOME/d/dmd2.git/linux/lib64 foo.d
gcc -m64 -fPIC main.c -ldl -o main
./main
----

Output:
----
Start frop from D
[0, 0, 0, 0, 0, 0, 0, 0]
Start frop from C
[-1488654016, 32761, 37363216, 0, 0, 0, 0, 0]
----
Comment #6 by puneet — 2016-03-09T17:51:12Z
Comment #7 by github-bugzilla — 2016-03-20T03:17:28Z
Commits pushed to master at https://github.com/D-Programming-Language/druntime

https://github.com/D-Programming-Language/druntime/commit/09419a101116c0439bc16b5e898fe45fd1b553bd
fix regression tests for fixed Issue 15513
- the existing tls GC test weren't working b/c the scanned stack region still contained references to the allocated values

https://github.com/D-Programming-Language/druntime/commit/5fe4cd967d3df000a7d6e93bb3fc077b859b2e6b
Merge pull request #1510 from MartinNowak/fixup1507
fix regression tests for fixed Issue 15513
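The commit message points at a general difficulty with this kind of test: if a copy of the reference is still visible in the scanned stack region, the object survives a collection whether or not the TLS range is scanned, so the test passes for the wrong reason. The following is only a sketch of a liveness check in which a thread-local variable is intended to be the sole root. It is not the druntime test itself, and `Canary`, `tlsRoot`, and `allocate` are hypothetical names.

```d
import core.memory : GC;

class Canary
{
    // Set by the finalizer so the test can tell whether the object was collected.
    __gshared bool finalized;

    ~this()
    {
        finalized = true;
    }
}

// Module-level variables are thread-local by default in D.
Canary tlsRoot;

// Allocate in a separate function so that main's own frame never
// stores the reference in a local variable.
void allocate()
{
    tlsRoot = new Canary;
}

void main()
{
    allocate();
    GC.collect();

    // The object must still be alive, because the TLS variable refers to it.
    assert(!Canary.finalized);
    assert(tlsRoot !is null);
}
```

Because stacks and registers are scanned conservatively, a stale copy of the pointer in a dead stack slot can still keep the object alive, and the compiler may inline `allocate`. A check like this therefore reduces, but does not remove, the weakness described in the commit message.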
Comment #8 by github-bugzilla — 2016-10-01T11:44:35Z
Comment #9 by github-bugzilla — 2018-01-05T13:27:19Z