Bug 10820 – curly brakets prevent inlining with DMD

Status: RESOLVED
Resolution: FIXED
Severity: critical
Priority: P2
Component: dmd
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2013-08-14T09:33:00Z
Last change time: 2013-10-22T13:44:55Z
Keywords: performance, pull
Assigned to: nobody
Creator: monarchdodra

Comments

Comment #0 by monarchdodra — 2013-08-14T09:33:58Z
DMD 2.064 BETA DMD affected. gdc unaffected. ldc2 untested.

I was doing some benchmarks on a very tight loop, and I discovered that when a branch is "curly" enclosed, then it prevents inlining:

Here are 4 equivalent functions:

uint foo1(char c) @safe pure nothrow
{
    if (c < 0x80)
        return 1;
    else
        return 2;
}

uint foo2(char c) @safe pure nothrow
{
    if (c < 0x80)
    {
        return 1;
    }
    else
    {
        return 2;
    }
}

uint foo3(char c) @safe pure nothrow
{
    if (c < 0x80)
        return 1;
    else
        return foo3impl(c);
}
uint foo3impl(char c) @safe pure nothrow
{
    return 2;
}

uint foo4(char c) @safe pure nothrow
{
    if (c < 0x80)
    {
        return 1;
    }
    else
        return foo4impl(c);
}
uint foo4impl(char c) @safe pure nothrow
{
    return 2;
}

And a program then benches them:

//----
import std.stdio, std.datetime;

enum N = 5000;

void main()
{
    char c = 'a';
    StopWatch st1;
    StopWatch st2;
    StopWatch st3;
    StopWatch st4;
    immutable len = 1000;

    foreach(_ ; 0 .. N)
    {
        for (size_t i ; i < len ; )
        {
            i += foo1(c);
            i += foo2(c);
            i += foo3(c);
            i += foo4(c);
        }
    }

    foreach(K ; 0 .. 10)
    {
        st1.start;
        foreach(_ ; 0 .. N)
        {
            size_t i = 0;
            while(i != len)
                i += foo1(c);
        }
        st1.stop;

        st2.start;
        foreach(_ ; 0 .. N)
        {
            size_t i = 0;
            while(i != len)
                i += foo2(c);
        }
        st2.stop;

        st3.start;
        foreach(_ ; 0 .. N)
        {
            size_t i = 0;
            while(i != len)
                i += foo3(c);
        }
        st3.stop;

        st4.start;
        foreach(_ ; 0 .. N)
        {
            size_t i = 0;
            while(i != len)
                i += foo4(c);
        }
        st4.stop;
    }

    writefln("foo1: %sms.", st1.peek.msecs);
    writefln("foo2: %sms.", st2.peek.msecs);
    writefln("foo3: %sms.", st3.peek.msecs);
    writefln("foo4: %sms.", st4.peek.msecs);
}
//----

When compiled with DMD without -inline:
foo1: 2338ms.
foo2: 2337ms.
foo3: 2333ms.
foo4: 2337ms.

when compiled with DMD with -O -inline:
foo1: 282ms.
foo2: 2244ms.
foo3: 282ms.
foo4: 2246ms.

This is very strange, as foo1 and foo2 are *strictly* equivalent, save for some curlies, and so are foo3 and foo4.
As a matter of fact, in my original use case, I got better performance by cascading function calls to remove curlies, rather than having blocks in my ifs. I don't have any proof, but I'd be willing to bet this is a cause of *major* performance issues for DMD.
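
For anyone reproducing this, the claim that the four variants are equivalent can be checked mechanically before benchmarking. The following is a minimal sketch, not part of the original report: the helper name variantsAgree is hypothetical, and the foo functions are copied from the report above.

```d
// Checks that foo1..foo4 from the report return the same value for every char.
uint foo1(char c) @safe pure nothrow
{
    if (c < 0x80) return 1; else return 2;
}

uint foo2(char c) @safe pure nothrow
{
    if (c < 0x80) { return 1; } else { return 2; }
}

uint foo3(char c) @safe pure nothrow
{
    if (c < 0x80) return 1; else return foo3impl(c);
}
uint foo3impl(char c) @safe pure nothrow { return 2; }

uint foo4(char c) @safe pure nothrow
{
    if (c < 0x80) { return 1; } else return foo4impl(c);
}
uint foo4impl(char c) @safe pure nothrow { return 2; }

// Hypothetical helper: compares all variants against foo1 over all 256 char values.
bool variantsAgree() @safe pure nothrow
{
    foreach (uint v; 0 .. 256)
    {
        immutable c = cast(char) v;
        immutable expected = foo1(c);
        if (foo2(c) != expected || foo3(c) != expected || foo4(c) != expected)
            return false;
    }
    return true;
}

// Evaluated at compile time, so the check does not depend on -O, -inline or -release.
static assert(variantsAgree());

void main()
{
    // Same check at run time (note that assert is stripped in -release builds).
    assert(variantsAgree());
}
```

If the static assert passes, any timing difference between the variants comes from code generation rather than from a difference in behaviour.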
Comment #1 by dmitry.olsh — 2013-08-14T10:21:17Z
> I was doing some benchmarks on a very tight loop, and I discovered that when a
> branch is "curly" enclosed, then it prevents inlining:

It's far simpler than that: take a look at inline.c and observe that it only ever inlines an if/else whose branches immediately contain a return statement. Braces turn the return statement into a block, hence destroying this hack.

What needs to be done is to IMPLEMENT inlining of if/else/switch/while statements. One general way to do that is to treat all statements as expressions (e.g. yielding void) inside the compiler.
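
To illustrate the mechanism described here, below is a toy model of such a check. It is only a sketch under stated assumptions: the classes and the functions canInlineNaive, unwrap and canInlineUnwrapped are hypothetical and are not dmd's actual AST or the code in inline.c. It shows why wrapping a return in braces defeats a check that looks only at the immediate branch, and one possible relaxation (looking through single-statement blocks), which is narrower than the general statement inlining suggested above.

```d
// Toy AST, NOT dmd's real one: every type and function here is hypothetical.
abstract class Statement {}

final class ReturnStatement : Statement {}

final class BlockStatement : Statement
{
    Statement[] statements;
    this(Statement[] statements) { this.statements = statements; }
}

final class IfStatement : Statement
{
    Statement thenBranch, elseBranch;
    this(Statement t, Statement e) { thenBranch = t; elseBranch = e; }
}

// Heuristic as described in the comment: each branch must itself be a return.
bool canInlineNaive(IfStatement s)
{
    return (cast(ReturnStatement) s.thenBranch) !is null
        && (cast(ReturnStatement) s.elseBranch) !is null;
}

// One possible relaxation: look through blocks that wrap exactly one statement.
Statement unwrap(Statement s)
{
    auto b = cast(BlockStatement) s;
    while (b !is null && b.statements.length == 1)
    {
        s = b.statements[0];
        b = cast(BlockStatement) s;
    }
    return s;
}

bool canInlineUnwrapped(IfStatement s)
{
    return (cast(ReturnStatement) unwrap(s.thenBranch)) !is null
        && (cast(ReturnStatement) unwrap(s.elseBranch)) !is null;
}

void main()
{
    // Models: if (...) return 1; else return 2;
    auto plain = new IfStatement(new ReturnStatement, new ReturnStatement);
    // Models: if (...) { return 1; } else { return 2; }
    auto braced = new IfStatement(
        new BlockStatement([cast(Statement) new ReturnStatement]),
        new BlockStatement([cast(Statement) new ReturnStatement]));

    assert(canInlineNaive(plain));
    assert(!canInlineNaive(braced));   // the braces hide the return
    assert(canInlineUnwrapped(plain));
    assert(canInlineUnwrapped(braced));
}
```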
Comment #2 by k.hara.pg — 2013-10-11T01:12:11Z
https://github.com/D-Programming-Language/dmd/pull/2654

With patched dmd, the OP code output is:

$ dmd -run test.d
DMD v2.064 DEBUG
foo1: 426ms.
foo2: 454ms.
foo3: 391ms.
foo4: 401ms.

$ dmd -O -inline -run test.d
DMD v2.064 DEBUG
foo1: 67ms.
foo2: 35ms.
foo3: 35ms.
foo4: 33ms.
Comment #3 by monarchdodra — 2013-10-11T01:26:06Z
(In reply to comment #2)
> https://github.com/D-Programming-Language/dmd/pull/2654
>
> With patched dmd, the OP code output is:
>
> $ dmd -run test.d
> DMD v2.064 DEBUG
> foo1: 426ms.
> foo2: 454ms.
> foo3: 391ms.
> foo4: 401ms.
>
> $ dmd -O -inline -run test.d
> DMD v2.064 DEBUG
> foo1: 67ms.
> foo2: 35ms.
> foo3: 35ms.
> foo4: 33ms.

Most awesome! I was looking at the pull you submitted. It seems to only deal with "ReturnStatement". This is already
Comment #4 by monarchdodra — 2013-10-11T04:07:58Z
(In reply to comment #2)
> https://github.com/D-Programming-Language/dmd/pull/2654
>
> With patched dmd, the OP code output is:
>
> $ dmd -run test.d
> DMD v2.064 DEBUG
> foo1: 426ms.
> foo2: 454ms.
> foo3: 391ms.
> foo4: 401ms.
>
> $ dmd -O -inline -run test.d
> DMD v2.064 DEBUG
> foo1: 67ms.
> foo2: 35ms.
> foo3: 35ms.
> foo4: 33ms.

Most awesome! I will stress test your pull with variations of my above code, and see what I can report. Thank you very much.
Comment #5 by github-bugzilla — 2013-10-22T13:44:41Z
Commits pushed to master at https://github.com/D-Programming-Language/dmd

https://github.com/D-Programming-Language/dmd/commit/0c99632eea96de66926c5a2d8aca650992aabf34
fix Issue 10820 - curly brakets prevent inlining with DMD

https://github.com/D-Programming-Language/dmd/commit/0962e2eaefa516e09a302ad85f84bdbca69be1b0
Merge pull request #2654 from 9rnsr/fix10820

Issue 10820 - curly brakets prevent inlining with DMD