Bug 318 – wait does not release thread resources on Linux

Status
RESOLVED
Resolution
FIXED
Severity
blocker
Priority
P2
Component
phobos
Product
D
Version
D1 (retired)
Platform
All
OS
Linux
Creation time
2006-09-02T13:58:00Z
Last change time
2014-02-15T13:29:08Z
Assigned to
braddr
Creator
mclysenk
Blocks
322

Attachments

IDFilenameSummaryContent-TypeSize
197thread.diffProposed fix for phobos 1.x - v1text/plain3523
198thread.diffpatch v2text/plain13102

Comments

Comment #0 by mclysenk — 2006-09-02T13:58:49Z
While wait is supposed to release a thread's resources, it will fail if the thread has already completed. This makes it impossible to use more than 400 threads reliably. Here is an example which demonstrates the problem: import std.stdio, std.thread; void main() { for(int i=0; i<80000; i++) { writefln("Creating thread %d", i); Thread t = new Thread({writefln(" Created!"); return 0;}); t.start; for(int x=0; x<1000; x++) Thread.yield; t.wait; writefln(" Finished."); } } Within a few hundred iterations, this code will likely produce a "failed to start" error. From my testing, this issue only affects Linux. So far, there are no workarounds.
Comment #1 by sean — 2006-09-02T14:25:17Z
[email protected] wrote: > http://d.puremagic.com/issues/show_bug.cgi?id=318 > > Summary: wait does not release thread resources on Linux > Product: D > Version: 0.165 > Platform: All > OS/Version: Linux > Status: NEW > Severity: blocker > Priority: P2 > Component: Phobos > AssignedTo: [email protected] > ReportedBy: [email protected] > > > While wait is supposed to release a thread's resources, it will fail if the > thread has already completed. This makes it impossible to use more than 400 > threads reliably. Here is an example which demonstrates the problem: > > > import std.stdio, std.thread; > > void main() > { > for(int i=0; i<80000; i++) > { > writefln("Creating thread %d", i); > Thread t = new Thread({writefln(" Created!"); return 0;}); > t.start; > for(int x=0; x<1000; x++) > Thread.yield; > t.wait; > writefln(" Finished."); > } > } > > > Within a few hundred iterations, this code will likely produce a "failed to > start" error. From my testing, this issue only affects Linux. I think line 667 of thread.d should be changed from: if (state == TS.RUNNING) to: if (state != TS.INITIAL) Because it is not only legal to call pthread_join on a thread that has run and finished, but calling pthread_join or pthread_detach is required for the thread resources to be released. However, it is illegal to call pthread_join more than once, and I believe it is also illegal to detach a thread that has already been joined, so 'id' should probably be cleared after join/detach is called, and this value tested along with 'state' before performing thread ops. As an unrelated issue, I just noticed that CloseHandle is not being called on the thread handle for Win32, and pthread_detach is not being called for Posix. I think this should be done in a thread dtor or the equivalent to ensure resources are properly released. Sean
Comment #2 by chris — 2006-11-19T19:05:14Z
> As an unrelated issue, I just noticed that CloseHandle is not being > called on the thread handle for Win32, and pthread_detach is not being > called for Posix. I think this should be done in a thread dtor or the > equivalent to ensure resources are properly released. > Yes, this is very important. This is a huge bug. Sometimes one uses "throwaway" threads that just do one thing and terminate. Currently, it will cause a huge leak and potential errors.
Comment #3 by braddr — 2007-10-19T23:52:25Z
Created attachment 197 Proposed fix for phobos 1.x - v1 I've run this through a bit of testing of this diff, both 1.x and 2.x, using the provided example test case and a few variations of my own. (so far just on linux, but I'll test on windows shortly). I can no longer reproduce the problem. That said, threading problems are notoriously difficult to be sure about. I'd appreciate it if some of you could take a look and hopefully even build your own phobos and do some testing. I need to think a little bit more about the running -> terminated -> finished transition steps a bit to make sure it's safe in all cases. I really would prefer not to have to make state management synchronized. Thanks, Brad
Comment #4 by braddr — 2007-10-21T05:54:04Z
Created attachment 198 patch v2 Further testing showed race conditions between the gc and the thread library so I went ahead with the conservative approach. I'm not happy with this many sync points, but my test cases no longer show any problems.
Comment #5 by bugzilla — 2007-11-03T21:41:04Z
Fixed dmd 1.023 and 2.007