Bug 14863 – CLOCK_BOOTTIME should be optional to support <2.6.39 kernels
Status: RESOLVED
Resolution: FIXED
Severity: regression
Priority: P1
Component: druntime
Product: D
Version: D2
Platform: Other
OS: Linux
Creation time: 2015-08-02T09:54:00Z
Last change time: 2015-08-05T00:52:14Z
Assigned to: schveiguy
Creator: code
Comments
Comment #0 by code — 2015-08-02T09:54:31Z
Currently, every D binary crashes on an older kernel.
We might reconsider lazily initializing the resolutions.
http://forum.dlang.org/post/[email protected]
MonoTime is the replacement for TickDuration and it's initialized from the runtime initialization function (rt_init). This is because the GC and others may need time functionality.
Unfortunately, it looks like MonoTime does not currently support your kernel version. It needs at least Linux 2.6.39, because it uses the CLOCK_BOOTTIME clock, which was added in Linux 2.6.39. Without this clock, the minimum version would be Linux 2.6.32.
Comment #1 by issues.dlang — 2015-08-02T23:33:29Z
Great... People are still using kernels that old? This sort of problem didn't even occur to me. We really aren't set up to have different C headers depending on the kernel version and the like, and we're certainly not set up to support different ClockTypes depending on your kernel version, though in theory, it's supposed to be set up so that if you don't explicitly use a particular ClockType, it shouldn't matter.
The GC and runtime should only care about MonoTime itself, i.e. MonoTimeImpl!(ClockType.normal), and not any of the other instantiations of MonoTimeImpl. I guess that one of the rounds of changes we made relating to static constructor bugs must have resulted in the clock frequency being grabbed for all of the ClockTypes that the system supports (or at least supposedly supports). *sigh*
The simplest solution, of course, would be to simply get rid of ClockType.bootTime, though that would definitely suck, and I don't know how we'd decide when to re-add it (I really don't want to have to start worrying about kernel versions). Other than that, presumably we have to make sure that none of the ClockTypes other than ClockType.normal get used in any way unless someone uses them in their own program.
I guess that I'll have to find time to dig into this ASAP.
Comment #2 by schveiguy — 2015-08-03T12:20:46Z
According to the clock_getres documentation, it should return -1 and set errno to EINVAL for an unsupported clock. So this seems like an issue with the kernel. I've never heard of a system call causing a segfault like this.
However, I noticed that we have an assert(0) if that happens. This shouldn't be the way it's handled. If we get the "Invalid clock" return value, we should only assert if someone actually tries to USE that clock (put in some sort of sentinel value for the resolution).
In addition, we should be checking the kernel version and only querying CLOCK_BOOTTIME if the kernel version is high enough (otherwise skipping it and putting in the sentinel value). I still think lazy initialization is incorrect.
Jonathan, I see it's assigned to you, I can look into this if you want.
(In reply to Jonathan M Davis from comment #1)
> I guess that in one of the rounds of changes that we made
> relating to static constructor bugs must have resulted in the clock
> frequency being grabbed for all of the ClockTypes that the system supports
> (or at least supposedly supports). *sigh*
If you recall, it was because of the issue with static constructors in templates causing any importing module to be flagged as containing static constructors. So we now construct all the resolutions eagerly (which shouldn't cause issues like this).
I don't think lazy initialization is what we need: when someone uses an "unsupported" clock, they should not get a segfault either. Nor would the original solution of eagerly fetching only the clocks that are instantiated help (the segfault would then occur in a static ctor). The only correct answer is to eliminate the segfault.
Comment #3 by schuetzm — 2015-08-03T14:01:05Z
`assert(0);` becomes an `HLT` instruction with -release, which in turn triggers SIGSEGV. That's probably the cause.
Comment #4 by schveiguy — 2015-08-03T14:12:06Z
(In reply to Marc Schütz from comment #3)
> `assert(0);` becomes an `HLT` instruction with -release, which in turn
> triggers SIGSEGV. That's probably the cause.
Oh! Then this is easily fixable.
I thought an assert(0) in release mode still did an assert. We just need to change how we treat that specific failure. Interestingly, this means the message in assert(0, somemessage) is wasted in release mode. We may want to reconsider what assert(0) should do (another topic).
Comment #5 by jansen.gerald — 2015-08-03T14:27:45Z
(In reply to Marc Schütz from comment #3)
> `assert(0);` becomes an `HLT` instruction with -release, which in turn
> triggers SIGSEGV. That's probably the cause.
Please note that the segfault occurs even without -release, i.e. "dmd hello.d".
Comment #6 by schveiguy — 2015-08-03T14:36:04Z
(In reply to Gerald Jansen from comment #5)
> Please note that the segfault occurs even without -release, i.e. "dmd
> hello.d".
Right, but druntime *is* compiled in release mode. If you compiled druntime in non-release mode, you would see the assert error instead.
Are we sure this bug is fixed? (I've built dmd from git head.) My older kernel still segfaults when initializing MonoTime:
Program received signal SIGSEGV, Segmentation fault.
0x00000000004ad881 in _d_initMonoTime ()
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.149.el6_6.7.x86_64
(gdb) bt
#0 0x00000000004ad881 in _d_initMonoTime ()
#1 0x72656b616d6b756c in ?? ()
#2 0x0000000000000008 in ?? ()
#3 0x00000000006d9fc0 in ?? ()
#4 0x0000000000000000 in ?? ()
$ cat /proc/version
Linux version 2.6.32-504.23.4.el6.x86_64
$ dmd --version
DMD64 D Compiler v2.068-devel-286906b
Is there a workaround?
Thank you,
Ali