Bug 6014 – rt_finalize Segmentation fault, dmd 2.053 on linux & freebsd

Status: RESOLVED
Resolution: FIXED
Severity: regression
Priority: P2
Component: dmd
Product: D
Version: D2
Platform: Other
OS: Linux
Creation time: 2011-05-15T23:13:00Z
Last change time: 2013-10-09T21:41:30Z
Assigned to: nobody
Creator: changlon

Comments

Comment #0 by changlon — 2011-05-15T23:13:11Z
I build my projects on linux64 and freebsd32, and the same runtime error keeps troubling me:

Program received signal SIGSEGV, Segmentation fault.
0x00000000004ba73f in rt_finalize ()

The dmd version is the 2.053 release. I removed all destructors ("~this()") from my code, but the error still exists. I have no idea how to reduce the example; I am only sure the error is thrown when I call Parse.parse.
http://gool.googlecode.com/files/jade_dtor_bug.tar.bz2
Comment #1 by changlon — 2011-05-16T02:21:42Z
I notice that throwing an exception in a dtor can cause problems, but in this case there is no exception, and in my own code no dtor exists either. The package I posted here still contains the dtors; you can remove them and test again (util.pool.dtor, jade.Compiler.dtor).
Comment #2 by changlon — 2011-05-16T18:48:44Z
The same code works fine on Win32. The rt_finalize runtime error has appeared since dmd 2.052. Win32 dmd 2.052 had the same problem, but it was fixed in dmd 2.053.
Comment #3 by changlon — 2011-05-27T03:13:15Z
I have no idea how to reduce this test case or how to trace the bug. I built the project with -g -debug and then ran gdb, but the error is not in the project code; it is in druntime. Can anybody tell me how to build a debug version of the druntime lib?
Comment #4 by schveiguy — 2011-05-27T04:25:16Z
(In reply to comment #3)
> Can anybody tell me how to build a debug version druntime lib ?

In posix.mak, change the flags:

DFLAGS=-gc -Isrc -Iimport -nofloat -d -w
UDFLAGS=$(DFLAGS)

then run

make -f posix.mak

then rebuild phobos, and copy the phobos library into your lib dir. You probably want to build phobos in debug mode as well.

I'm actually surprised you have to edit the makefile; it should be easier...
Comment #5 by changlon — 2011-07-13T21:34:15Z
@Steven Schveighoffer thank you. I updated my dmd to 2.054, built the debug libphobos2 on Linux, and used gdb to catch this error:
---------------------------------------------------------------------
Program received signal SIGSEGV, Segmentation fault.
0x00000000004cd10c in rt.lifetime.rt_finalize (p=0x7ffff729d000, det=false) at src/rt/lifetime.d:1154
1154 ClassInfo c = **pc;
----------------------------------------------------------------------
Comment #6 by schveiguy — 2011-07-14T04:47:25Z
A stack trace would help.
Comment #7 by changlon — 2011-07-14T07:40:42Z
Hi Steven Schveighoffer, I imported core.runtime, but the stack trace is not printed automatically. Can you tell me how to print the stack trace?
Comment #8 by schveiguy — 2011-07-14T09:45:27Z
I meant a stack trace from gdb... use bt I think.
Comment #9 by changlon — 2011-07-14T18:12:04Z
Starting program: /web/www/tmp/jade/jade2test
[Thread debugging using libthread_db enabled]
f = 0x4fc4b0,32, t = 0x713030,32, size = 1
f = 0x4fed20,176, t = 0x7ffff7ed5f00,176, size = 1
f = 0x4ff200,72, t = 0x71f490,72, size = 1
f = 0x4f8c10,64, t = 0x7ffff7ed8fc0,64, size = 1
f = 0x4f8d00,64, t = 0x7ffff7ed8f80,64, size = 1
f = 0x4f0880,12, t = 0x7ffff7ed9ff0,12, size = 1
f = 0x4f3920,56, t = 0x7ffff7ed8f00,56, size = 1
f = 0x4f3920,56, t = 0x7ffff7ed8ec0,56, size = 1
1 times use time = 1ms

Program received signal SIGSEGV, Segmentation fault.
0x00000000004cda08 in rt.lifetime.rt_finalize (p=0x7ffff729d000, det=false) at src/rt/lifetime.d:1154
1154 ClassInfo c = **pc;
(gdb) bt
#0 0x00000000004cda08 in rt.lifetime.rt_finalize (p=0x7ffff729d000, det=false) at src/rt/lifetime.d:1154
#1 0x00000000004cb1de in gc.gcx.Gcx.fullcollect (this=0x713060, stackTop=0x7fffffffe260) at src/gc/gcx.d:2631
#2 0x00000000004caaf3 in gc.gcx.Gcx.fullcollectshell (this=0x713060) at src/gc/gcx.d:2391
#3 0x00000000004c902b in gc.gcx.GC.fullCollectNoStack (this=0x713030) at src/gc/gcx.d:1329
#4 0x00000000004c721d in gc.gc.gc_term () at src/gc/gc.d:133
#5 0x00000000004abc4d in rt.dmain2.main.runAll (this=0x7fffffffe4a0) at src/rt/dmain2.d:515
#6 0x00000000004ab6f5 in rt.dmain2.main.tryExec (this=0x7fffffffe4a0, dg=0x00000000004abbdc00007fffffffe4a0) at src/rt/dmain2.d:471
#7 0x00000000004ab684 in rt.dmain2.main (argc=1, argv=0x7fffffffe588) at src/rt/dmain2.d:518
Comment #10 by schveiguy — 2011-07-15T05:49:10Z
So here is what I can learn from this information:

1. The crash is happening on the final collection cycle when the runtime is shutting down.
2. The memory block (pointer value 0x7ffff729d000) is marked as having a finalizer.
3. The memory block being collected does not have a valid classinfo pointer (which resides at the very beginning of the block), which means either:
   a. it's not really a class, and is incorrectly marked as having a finalizer, or
   b. the pointer has somehow been corrupted.

The issue with a problem like this is that the corruption could happen anywhere. Given that allocating memory in dtors has been disallowed as of 2.054 (a known cause of corruption), I don't think your code could be doing that. So that leaves examining your code for incorrect memory operations. I don't really have time to look through your code, but I'd recommend looking suspiciously at places where casts are used, or where you are using raw pointers.

One other thing is to add (or uncomment) some druntime debug printf statements -- print out the classinfo name and addresses for memory blocks being allocated. That at least should tell you what the *original* type allocated for the failed memory block was. Sometimes this is the only way to debug such corruption issues.
Comment #11 by changlon — 2011-07-17T19:39:01Z
I do not understand the mechanism of druntime, and this problem has troubled me for a long time. According to my simple understanding, the following code should pass, but in fact the assert fails. Did I get it wrong, or does druntime have a bug?
------------------------------------------------------
import core.memory;

void main(){
    auto attr = cast(GC.BlkAttr) 0b1 ;
    auto test = GC.malloc(10, attr);
    GC.setAttr(test, 0);
    auto _attr = GC.getAttr(test);
    assert(attr != _attr);
}
-----------------------------------------------------
I use GC.malloc and GC.realloc to speed up memory allocation, and I change a memory block's attributes before the main function exits.
Comment #12 by changlon — 2011-08-08T19:46:06Z
I think the bug is not caused by a class cast. In the code I have Token and ASTNode; Token is a struct and ASTNode is a class. If I allocate both of them on the heap, it works fine. If I allocate only ASTNode on the heap, the problem is still there. If I allocate only Token on the heap, the test works fine. I stored the pointer to a Token struct in a global pointer and printed it before main exits, and found that it is different; that means Pool.data has been moved, and after main exits I get a segmentation fault. I did nothing but change struct Token to class Token (still allocated from the pool, not on the heap), and the problem no longer exists. So I guess this is not a cast(class) issue; it is a struct issue, and it is related to the druntime GC.

The segmentation fault is very rare: if I change a little thing in example.jade, it goes away. If I allocate struct Token on the heap, the performance is very bad, about 100 times slower than not allocating it on the heap.

After several months of debugging and testing, I have finally resolved this problem. Thanks a lot to Steven Schveighoffer for the help. My problem went away by switching struct Token to class Token, but I believe there is also a hidden druntime GC bug, so I will not close this bug.
Comment #13 by changlon — 2011-08-08T19:57:26Z
Test case with the struct implementation (has the issue): http://gool.googlecode.com/files/jade_dtor_bug.tar.bz2
Test case with the class implementation (no issue): http://gool.googlecode.com/files/jade_dtor_bug_fixed.tar.bz2

I really can't reduce the test case, because it is a runtime issue. jade is a web view template compiler, like http://www.smarty.net/ for PHP; it converts the jade template language to D source for web development. The hard part is that if I change any jade view template source (example.jade), the issue goes away, so I really do not know how to reduce this test case.
Comment #14 by dsimcha — 2011-08-13T13:09:09Z
FWIW this is where/how/why the sporadic segfaults in the std.parallelism unittests on Linux and FreeBSD that the auto tester keeps flagging are occurring.
Comment #15 by sean — 2011-08-19T14:35:28Z
------------------------------------------------------
import core.memory;

void main(){
    auto attr = cast(GC.BlkAttr) 0b1 ;
    auto test = GC.malloc(10, attr);
    GC.setAttr(test, 0);
    auto _attr = GC.getAttr(test);
    assert(attr != _attr);
}
-----------------------------------------------------
The above code is broken. GC.setAttr sets a flag. It basically does "flags |= newflag", so setAttr(x, 0) will leave the flags unchanged. What you want to do in the setAttr line is call GC.clrAttr(test, attr).
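A minimal sketch of the difference between the two calls, assuming a current druntime in which `GC.setAttr` and `GC.clrAttr` return the block's attributes after the change. The helper name `attrsAfterSetAndClear` is hypothetical:

```d
import core.memory : GC;

/// Hypothetical helper: returns the block's attribute bits after
/// GC.setAttr(p, 0) and after GC.clrAttr(p, FINALIZE), in that order.
uint[2] attrsAfterSetAndClear()
{
    // Allocate a small block deliberately marked FINALIZE,
    // as in the test case from comment #11.
    void* p = GC.malloc(10, GC.BlkAttr.FINALIZE);

    // setAttr ORs the given bits into the existing ones: passing 0 changes nothing.
    uint afterSet = GC.setAttr(p, 0);

    // clrAttr is the call that actually removes bits; clearing FINALIZE also
    // keeps the GC from later treating this raw block as a class instance.
    uint afterClr = GC.clrAttr(p, GC.BlkAttr.FINALIZE);

    return [afterSet, afterClr];
}

void main()
{
    auto r = attrsAfterSetAndClear();
    assert(r[0] & GC.BlkAttr.FINALIZE);    // still set after setAttr(p, 0)
    assert(!(r[1] & GC.BlkAttr.FINALIZE)); // gone after clrAttr
}
```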
Comment #16 by clugdbug — 2011-09-06T01:24:59Z
*** Issue 5766 has been marked as a duplicate of this issue. ***
Comment #17 by code — 2011-09-13T17:13:13Z
It could also be a double finalization. The vtable pointer is cleared when calling rt_finalize on a class.

There is also a deterministic bug happening due to an oversight in the finalization design. Finalization is done in memory order and does not take hierarchies into account.
---
class A
{
    ~this() {}
    void cleanup() {}
}

class B
{
    this(A a) { this.a = a; }
    ~this() { a.cleanup(); }
    A a;
}

void main()
{
    auto a = new A();
    auto b = new B(a);
    // allocating a at a lower address than b causes it to be finalized earlier
    assert(cast(void*)b.a < cast(void*)b);
}
---
When b.a is finalized before b, its vtable pointer is set to null, hence the segfault when accessing the classinfo. It seems like we need to somehow sort the memory to be finalized while scanning. Any cheap ideas for doing that are welcome.
Comment #18 by code — 2011-09-13T17:52:15Z
@changlon You won't like the cause of your bug.

All fields in a struct are default initialized: pointers with null, integrals with 0, floats with NaN, and enums with the first enum member.
---
enum BlkAttr : uint
{
    FINALIZE    = 0b0000_0001, /// Finalize the data in this block on collect.
    NO_SCAN     = 0b0000_0010, /// Do not scan through this block on collect.
    NO_MOVE     = 0b0000_0100, /// Do not move this memory block on collect.
    APPENDABLE  = 0b0000_1000, /// This block contains the info to allow appending.
    NO_INTERIOR = 0b0001_0000
}
---
That means the attr field in your memory pool is always set to BlkAttr.FINALIZE, so every GC.malloc you do gets a wrong finalization. This can be avoided by giving the field a default value:
---
GC.BlkAttr attr = cast(GC.BlkAttr)0;
---
Arguably this could be the default member in BlkAttr. I will close this bug and open a new one for the order of class finalization.
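A minimal sketch of the pitfall and the fix. `OldBlkAttr` is a hypothetical stand-in that copies the pre-fix member order quoted above (FINALIZE first, no NONE member), and `Pool`/`FixedPool` are hypothetical types, not the ones from the jade project. The `static assert` assumes a druntime that includes the `BlkAttr.NONE` default mentioned in comment #20:

```d
import core.memory : GC;

/// Hypothetical copy of the pre-fix BlkAttr layout: FINALIZE is the first member.
enum OldBlkAttr : uint
{
    FINALIZE = 0b0000_0001,
    NO_SCAN  = 0b0000_0010,
}

/// Hypothetical pool with no initializer on the field.
struct Pool
{
    OldBlkAttr attr; // default-initialized to OldBlkAttr.init, i.e. FINALIZE
}

/// Same pool with the explicit default suggested above.
struct FixedPool
{
    OldBlkAttr attr = cast(OldBlkAttr) 0; // no attributes
}

// With NONE as the first member, a plain GC.BlkAttr field defaults to "no attributes".
static assert(GC.BlkAttr.init == GC.BlkAttr.NONE);

void main()
{
    Pool p;
    assert(p.attr == OldBlkAttr.FINALIZE); // the surprise: every block would get a finalizer

    FixedPool f;
    assert(f.attr == 0);
}
```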
Comment #19 by schveiguy — 2011-09-14T05:22:43Z
(In reply to comment #17)
> There is also a deterministic bug happening due to an oversight in the
> finalization design. Finalization is done in memory order and does not take
> hierarchies into account.

Just to clarify, as you discovered in your new bug report, this is by design -- a destructor cannot rely on any heap-allocated data being present. A concept in many GC-based languages is to have two "destructors": one that is only ever called synchronously, and one that can be called asynchronously by the GC. The synchronous one always calls the asynchronous one. The asynchronous one is sometimes called a finalizer (and in fact, ~this is a finalizer).
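A minimal sketch of that two-part pattern applied to the A/B example from comment #17. The method names `dispose` and `releaseOwned` are hypothetical; druntime does not call them for you. The explicit call may still use other GC objects, while the part shared with `~this` must not:

```d
/// Stand-in for class A from comment #17.
class A
{
    bool cleanedUp;
    void cleanup() { cleanedUp = true; }
}

class B
{
    A a;
    bool released; // makes releaseOwned() safe to run more than once

    this(A a) { this.a = a; }

    /// Synchronous "destructor": called explicitly by the owner, so `a` is
    /// guaranteed to still be alive and may be used here.
    void dispose()
    {
        a.cleanup();
        a = null;
        releaseOwned(); // the synchronous part always calls the asynchronous part
    }

    /// Asynchronous part (the finalizer): may run inside a GC collection,
    /// so it must not touch `a` or any other GC-allocated object.
    private void releaseOwned()
    {
        if (released) return;
        released = true;
        // Release non-GC resources here (OS handles, C-heap memory, ...).
    }

    ~this() { releaseOwned(); }
}

void main()
{
    auto a = new A;
    auto b = new B(a);
    b.dispose();
    assert(a.cleanedUp && b.released);
}
```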
Comment #20 by sean — 2011-09-14T13:51:11Z
I've added BlkAttr.NONE as a default for this enum. Seems like an easy way to avoid weird errors like this.
Comment #21 by code — 2013-10-09T21:41:30Z