split from bug 859:
Leandro Lucarella 2010-06-27 18:49:49 PDT
I'm having some performance problems moving some stuff from a lower-level
C-style to a higher-lever D-style. Here is an example:
---
int find_if(bool delegate(ref int) predicate)
{
for (int i = 0; i < 100; i++)
if (predicate(i))
return i;
return -1;
}
int main()
{
// for (int i = 0; i < 100; i++)
// if (i == 99)
// return i;
// return -1;
return find_if((ref int i) { return i == 99; });
}
---
The program produced by this source executes 4 times more instructions than the
more direct (lower-level) version commented out. I would expect DMD to inline
all functions/delegates and produce the same asm for both, but that's not the
case.
This is a reduced test-case, but I'm working on improving the GC and I'm really
hitting this problem. If I use this higher-level style in the GC, a Dil run for
generating the Tango docs is 3.33 times slower than the C-ish style used by the
current GC.
So I think this is a real problem for D, it's really important to be able to
encourage people to use the higher-level D constructs.
Comment #1 by leandro.lucarella — 2010-07-09T08:38:57Z
I'm not very keen at reading asm, but I'm seeing a couple of 'call's in the generated code:
---
$ dmd -release -O -gc -inline -c test.d
$ objdump -d test.o
test.o: file format elf32-i386
Disassembly of section .text:
00000000 <.text>:
0: c3 ret
1: 60 pusha
2: b8 38 00 00 00 mov $0x38,%eax
7: b9 00 00 00 00 mov $0x0,%ecx
c: 8b 11 mov (%ecx),%edx
e: 89 10 mov %edx,(%eax)
10: 89 01 mov %eax,(%ecx)
12: 61 popa
13: c3 ret
Disassembly of section .text._D4test7find_ifFDFKiZbZi:
00000000 <_D4test7find_ifFDFKiZbZi>:
0: 55 push %ebp
1: 8b ec mov %esp,%ebp
3: 83 ec 0c sub $0xc,%esp
6: 53 push %ebx
7: 8b 55 0c mov 0xc(%ebp),%edx
a: 8b 45 08 mov 0x8(%ebp),%eax
d: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%ebp)
14: 89 d3 mov %edx,%ebx
16: 8d 4d fc lea -0x4(%ebp),%ecx
19: 8b 45 08 mov 0x8(%ebp),%eax
1c: 51 push %ecx
1d: ff d3 call *%ebx
1f: 84 c0 test %al,%al
21: 74 0a je 2d <_D4test7find_ifFDFKiZbZi+0x2d>
23: 8b 45 fc mov -0x4(%ebp),%eax
26: 5b pop %ebx
27: 8b e5 mov %ebp,%esp
29: 5d pop %ebp
2a: c2 08 00 ret $0x8
2d: ff 45 fc incl -0x4(%ebp)
30: 83 7d fc 64 cmpl $0x64,-0x4(%ebp)
34: 7c e0 jl 16 <_D4test7find_ifFDFKiZbZi+0x16>
36: 5b pop %ebx
37: 8b e5 mov %ebp,%esp
39: b8 ff ff ff ff mov $0xffffffff,%eax
3e: 5d pop %ebp
3f: c2 08 00 ret $0x8
42: 90 nop
43: 90 nop
Disassembly of section .text._Dmain:
00000000 <_Dmain>:
0: 55 push %ebp
1: 8b ec mov %esp,%ebp
3: b9 00 00 00 00 mov $0x0,%ecx
8: 51 push %ecx
9: 31 c0 xor %eax,%eax
b: 50 push %eax
c: e8 fc ff ff ff call d <_Dmain+0xd>
11: 5d pop %ebp
12: c3 ret
13: 90 nop
Disassembly of section .text._D4test4mainFZi12__dgliteral1MFKiZb:
00000000 <_D4test4mainFZi12__dgliteral1MFKiZb>:
0: 55 push %ebp
1: 8b ec mov %esp,%ebp
3: 50 push %eax
4: 8b 4d 08 mov 0x8(%ebp),%ecx
7: b8 01 00 00 00 mov $0x1,%eax
c: 83 39 63 cmpl $0x63,(%ecx)
f: 74 02 je 13 <_D4test4mainFZi12__dgliteral1MFKiZb+0x13>
11: 31 c0 xor %eax,%eax
13: 8b e5 mov %ebp,%esp
15: 5d pop %ebp
16: c2 04 00 ret $0x4
---
Am I understanding it all wrong?
Comment #2 by braddr — 2010-07-11T10:13:14Z
Reducing the test case further:
extern(C) int printf(const char*, ...);
bool if_delegate(int i, bool delegate(ref int) predicate)
{
return predicate(i);
}
bool if_nodelegate(int i)
{
return i == 99;
}
void main()
{
foreach(i; 1 .. 100)
if (if_delegate(i, (ref int j) { return j == 99; }))
printf("i = %d\n", i);
foreach(i; 1 .. 100)
if (if_nodelegate(i))
printf("i = %d\n", i);
}
In the previous version of the code, the use of the for loops makes the cost of the functions too high for the inliner to bother inlining find_if into main. Also,in the non-inlined version of the function, there's no way for it to inline the delegate since it's a variable, not a constant thing.
This changed version reduces the problem to: will it inline the delegate? The answer is still no, but allows that specific problem to be explored.
Leandro, does this stray too far away from the underlying code that led you to file this bug?
Comment #3 by braddr — 2010-07-11T11:19:18Z
Ok.. still doesn't inline the delegate in this version:
extern(C) int printf(const char*, ...);
void main()
{
auto d = (ref int j) { return j == 99; };
foreach(i; 1 .. 100)
if (d(i))
printf("i = %d\n", i);
}
There's nothing in the dmd frontend inliner that even tries to deal with this sort of inlining right now.
Comment #4 by leandro.lucarella — 2010-07-11T11:27:57Z
(In reply to comment #2)
> Reducing the test case further:
> extern(C) int printf(const char*, ...);
>
> bool if_delegate(int i, bool delegate(ref int) predicate)
> {
> return predicate(i);
> }
>
> bool if_nodelegate(int i)
> {
> return i == 99;
> }
>
> void main()
> {
> foreach(i; 1 .. 100)
> if (if_delegate(i, (ref int j) { return j == 99; }))
> printf("i = %d\n", i);
>
> foreach(i; 1 .. 100)
> if (if_nodelegate(i))
> printf("i = %d\n", i);
> }
>
> In the previous version of the code, the use of the for loops makes the cost of
> the functions too high for the inliner to bother inlining find_if into main.
> Also,in the non-inlined version of the function, there's no way for it to
> inline the delegate since it's a variable, not a constant thing.
>
> This changed version reduces the problem to: will it inline the delegate? The
> answer is still no, but allows that specific problem to be explored.
>
> Leandro, does this stray too far away from the underlying code that led you to
> file this bug?
Yes, the loop is important, because this happens a lot with opApply(), which almost always includes a loop, and when I use (x having an opApply() method):
foreach (a; x) {
// some code
}
I almost always want // some code to be inlined, since it's not even a function (from the POV of the programmer), but maybe that should be another bug report, I really don't know.
Comment #5 by braddr — 2010-07-11T11:31:38Z
Ok. I'll keep that in mind. One step at a time for now though; need to get _any_ delegate inlining to happen before worrying about popping up the layers up to an opApply function.
Comment #6 by leandro.lucarella — 2010-07-11T13:22:56Z
(In reply to comment #5)
> Ok. I'll keep that in mind. One step at a time for now though; need to get
> _any_ delegate inlining to happen before worrying about popping up the layers
> up to an opApply function.
Agreed, thanks for giving it some time to this :)
Comment #7 by braddr — 2010-07-11T14:39:39Z
Created attachment 690
add support for inlining function literals
This patch adds support for inlining function literals. This won't cover many cases as most hide behind a variable, but it's a building block in the right direction:
The test case for this patch.. very contrived:
extern(C) int printf(const char*, ...);
void main()
{
int i = 99;
if ((delegate(ref int j) { return j == 99; })(i))
printf("i = %d\n", i);
}
same result if the keyword 'delegate' is omitted.
It passes an abbreviated (only used ARGS="-O -inline -release") run of the test suite.
Comment #8 by dsimcha — 2010-08-29T13:55:31Z
Blocks 4264 because I need delegate literal inlining to get decent performance out of opApply-based map, filter, etc.
Comment #9 by dsimcha — 2010-08-29T14:26:50Z
I just tried this patch out and on second thought I'm not sure it's worth integrating as-is because it's so limited. It won't even handle the case where a function that calls a delegate gets inlined and the delegate passed from the call site is a literal. Even the following doesn't get inlined:
bool evaluateDelegate(bool delegate() dg) {
return dg();
}
void main() {
evaluateDelegate({return false;});
}
Here's the disassembly of main():
__Dmain PROC NEAR
; COMDEF __Dmain
sub esp, 24 ; 0000 _ 83. EC, 18
xor eax, eax ; 0003 _ 31. C0
mov ecx, offset _D5test94mainFZv12__dgliteral1MFZb; 0005 _ B9, 00000000(segrel)
push ebx ; 000A _ 53
mov edx, ecx ; 000B _ 8B. D1
mov dword ptr [esp+4H], eax ; 000D _ 89. 44 24, 04
mov eax, dword ptr [esp+4H] ; 0011 _ 8B. 44 24, 04
mov ebx, dword ptr [esp+4H] ; 0015 _ 8B. 5C 24, 04
mov dword ptr [esp+8H], ecx ; 0019 _ 89. 4C 24, 08
call edx ; 001D _ FF. D2
xor eax, eax ; 001F _ 31. C0
pop ebx ; 0021 _ 5B
add esp, 24 ; 0022 _ 83. C4, 18
ret ; 0025 _ C3
__Dmain ENDP
Comment #10 by braddr — 2010-08-29T14:31:55Z
Agreed. Getting delegate inlining in dmd is going to take serious work. The problem is that the inliner is no where near a const propagation pass. It needs to know that the variables in play have a single, unchangeable, value.
At the point of inlining, there's just a variable with an unknown -- to the compiler -- value.
So, while this change does help that one narrow use case, it's of almost no practical value.
Comment #11 by bugzilla — 2015-08-23T02:48:58Z
You're right, Brad. Fixing this involves doing constant propagation in the front end. It's possible to do it, as constant propagation is the easiest of the data flow optimizations to do.
Comment #12 by robert.schadek — 2024-12-13T17:52:35Z