Bug 12442 – inefficient code with scope(exit)

Status
RESOLVED
Resolution
FIXED
Severity
enhancement
Priority
P5
Component
dmd
Product
D
Version
D2
Platform
All
OS
All
Creation time
2014-03-23T01:55:50Z
Last change time
2020-03-21T03:56:42Z
Keywords
performance
Assigned to
No Owner
Creator
Rainer Schuetze

Comments

Comment #0 by r.sagitario — 2014-03-23T01:55:50Z
Usage of scope(exit) comes with a significant performance cost, even if the executed code is "nothrow":

/////////////////////////
uint fun() nothrow;

__gshared int recurse;

uint wrapper_scopeexit()
{
    recurse++;
    scope(exit) recurse--;
    return fun();
}

uint wrapper_linear()
{
    recurse++;
    uint rc = fun();
    recurse--;
    return rc;
}
//////////////////////////

This is the assembly for Win64 with "-O":

_D4test17wrapper_scopeexitFZk:
  0000000000000000: 55                 push        rbp
  0000000000000001: 48 8B EC           mov         rbp,rsp
  0000000000000004: 48 83 EC 18        sub         rsp,18h
  0000000000000008: 53                 push        rbx
  0000000000000009: 56                 push        rsi
  000000000000000A: 57                 push        rdi
  000000000000000B: 41 54              push        r12
  000000000000000D: 41 55              push        r13
  000000000000000F: 41 56              push        r14
  0000000000000011: 41 57              push        r15
  0000000000000013: FF 05 00 00 00 00  inc         dword ptr [_D4test7recursei]
  0000000000000019: 48 83 EC 20        sub         rsp,20h
  000000000000001D: E8 00 00 00 00     call        _D4test3funFNbZk
  0000000000000022: 48 83 C4 20        add         rsp,20h
  0000000000000026: 48 89 45 F8        mov         qword ptr [rbp-8],rax
  000000000000002A: 48 83 EC 08        sub         rsp,8
  000000000000002E: E8 28 00 00 00     call        000000000000005B
  0000000000000033: 48 83 C4 08        add         rsp,8
  0000000000000037: 48 8B 45 F8        mov         rax,qword ptr [rbp-8]
  000000000000003B: 41 5F              pop         r15
  000000000000003D: 41 5E              pop         r14
  000000000000003F: 41 5D              pop         r13
  0000000000000041: 41 5C              pop         r12
  0000000000000043: 5F                 pop         rdi
  0000000000000044: 5E                 pop         rsi
  0000000000000045: 5B                 pop         rbx
  0000000000000046: 48 8D 65 00        lea         rsp,[rbp]
  000000000000004A: 5D                 pop         rbp
  000000000000004B: C3                 ret
  000000000000004C: 48 83 EC 08        sub         rsp,8
  0000000000000050: E8 06 00 00 00     call        000000000000005B
  0000000000000055: 48 83 C4 08        add         rsp,8
  0000000000000059: EB 07              jmp         0000000000000062
  000000000000005B: FF 0D 00 00 00 00  dec         dword ptr [_D4test7recursei]
  0000000000000061: C3                 ret
  0000000000000062: 41 5F              pop         r15
  0000000000000064: 41 5E              pop         r14
  0000000000000066: 41 5D              pop         r13
  0000000000000068: 41 5C              pop         r12
  000000000000006A: 5F                 pop         rdi
  000000000000006B: 5E                 pop         rsi
  000000000000006C: 5B                 pop         rbx
  000000000000006D: 48 8D 65 00        lea         rsp,[rbp]
  0000000000000071: 5D                 pop         rbp
  0000000000000072: C3                 ret

_D4test14wrapper_linearFZk:
  0000000000000000: 55                 push        rbp
  0000000000000001: 48 8B EC           mov         rbp,rsp
  0000000000000004: FF 05 00 00 00 00  inc         dword ptr [_D4test7recursei]
  000000000000000A: 48 83 EC 20        sub         rsp,20h
  000000000000000E: E8 00 00 00 00     call        _D4test3funFNbZk
  0000000000000013: 48 83 C4 20        add         rsp,20h
  0000000000000017: FF 0D 00 00 00 00  dec         dword ptr [_D4test7recursei]
  000000000000001D: 5D                 pop         rbp
  000000000000001E: C3                 ret

For Win32, it is slightly worse because the exception frames are still set up:

_D4test17wrapper_scopeexitFZk comdat
        assume  CS:_D4test17wrapper_scopeexitFZk
L0:     push    EBP
        mov     EBP,ESP
        mov     EDX,FS:__except_list
        push    0FFFFFFFFh
        push    offset _D4test17wrapper_scopeexitFZk[07Ch]
        push    EDX
        mov     FS:__except_list,ESP
        sub     ESP,8
        push    EBX
        push    ESI
        push    EDI
        inc     dword ptr _D4test7recursei
        mov     dword ptr -4[EBP],0
        call    near ptr _D4test3funFNbZk
        mov     dword ptr -4[EBP],0FFFFFFFFh
        push    EAX
        call    near ptr L5A
        pop     EAX
        mov     ECX,-0Ch[EBP]
        mov     FS:__except_list,ECX
        pop     EDI
        pop     ESI
        pop     EBX
        mov     ESP,EBP
        pop     EBP
        ret
        call    near ptr L5A
        jmp     short L68
L5A:    mov     dword ptr -4[EBP],0FFFFFFFFh
        dec     dword ptr _D4test7recursei
        ret
L68:    mov     ECX,-0Ch[EBP]
        mov     FS:__except_list,ECX
        pop     EDI
        pop     ESI
        pop     EBX
        mov     ESP,EBP
        pop     EBP
        ret
        mov     EAX,offset FLAT:_DATA
        jmp     near ptr __d_framehandler
_D4test17wrapper_scopeexitFZk ends

_D4test14wrapper_linearFZk comdat
        assume  CS:_D4test14wrapper_linearFZk
L0:     push    EAX
        inc     dword ptr _D4test7recursei
        call    near ptr _D4test3funFNbZk
        dec     dword ptr _D4test7recursei
        pop     ECX
        ret
_D4test14wrapper_linearFZk ends

In addition, even if the code might throw, the usual path should inline the code for scope(exit) instead of calling the exception handler.
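For readers wondering where the extra "call 000000000000005B" comes from: the D specification describes scope(exit) as behaving like a try/finally wrapped around the rest of the scope, and the out-of-line "dec ... ret" stub in the listing above looks like that finally block. The following is only a sketch of that equivalence. The body of `fun` is a hypothetical stand-in (the report only declares it), and `wrapper_tryfinally` is a name invented here for illustration.

```d
__gshared int recurse;

// Hypothetical stand-in for the report's external `fun`:
// it returns the counter so the caller can observe it mid-call.
uint fun() nothrow
{
    return cast(uint) recurse;
}

// The form from the report.
uint wrapper_scopeexit()
{
    recurse++;
    scope(exit) recurse--;
    return fun();
}

// Hand-written equivalent (assumed lowering, per the language spec):
// the cleanup lives in a finally block that runs on every exit path.
uint wrapper_tryfinally()
{
    recurse++;
    try
    {
        return fun();
    }
    finally
    {
        recurse--;
    }
}

void main()
{
    // Both forms see the counter at 1 during the call and restore it afterwards.
    assert(wrapper_scopeexit() == 1);
    assert(recurse == 0);
    assert(wrapper_tryfinally() == 1);
    assert(recurse == 0);
}
```

Since `fun` is nothrow here, nothing can unwind through either wrapper, which is why the linear version is a valid replacement in this case.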
Comment #1 by r.sagitario — 2014-03-23T02:01:32Z
The same happens with RAII:

struct SCount
{
    this(bool) nothrow { recurse++; }
    ~this() nothrow { recurse--; }
}

uint wrapper_raii() nothrow
{
    SCount sc = SCount(true);
    return fun();
}

produces almost the same code as scope(exit).
Comment #2 by andrej.mitrovich — 2014-03-23T09:24:36Z
Similar performance worry about scope(success); http://forum.dlang.org/thread/[email protected]?page=1
Comment #3 by code — 2014-04-05T04:13:57Z
The inefficiency comes from the extra call for the finally block, but we need a good idea for how to make it cheaper. Maybe it's somehow possible to inline the finally block for the normal control flow.
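As a source-level illustration of what "inlining the finally block for the normal control flow" could mean, the sketch below duplicates the cleanup by hand: once on the unwinding path and once straight after the call. This is not how the compiler implements anything; it only shows the shape of the transformation. The throwing `fun(bool)` and the name `wrapper_inlined` are hypothetical and do not come from the report.

```d
__gshared int recurse;

// Hypothetical callee that may throw; not from the bug report.
uint fun(bool fail)
{
    if (fail)
        throw new Exception("fun failed");
    return cast(uint) recurse;
}

// Cleanup duplicated by hand: the normal path runs it inline,
// and only the exceptional path goes through a handler.
uint wrapper_inlined(bool fail)
{
    recurse++;
    uint rc;
    try
    {
        rc = fun(fail);
    }
    catch (Throwable t)
    {
        recurse--;   // cleanup while unwinding
        throw t;     // rethrow so callers still see the error
    }
    recurse--;       // cleanup on the normal path, no extra call
    return rc;
}

void main()
{
    // Normal path: counter is 1 during the call and 0 afterwards.
    assert(wrapper_inlined(false) == 1);
    assert(recurse == 0);

    // Exceptional path: the exception propagates and the counter is restored.
    bool caught = false;
    try
    {
        wrapper_inlined(true);
    }
    catch (Exception)
    {
        caught = true;
    }
    assert(caught);
    assert(recurse == 0);
}
```

The trade-off is code size: the cleanup is emitted twice, which is cheap for a single decrement but may not be for a large scope(exit) body.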
Comment #4 by bugzilla — 2014-05-04T00:01:32Z
Comment #5 by andrej.mitrovich — 2014-05-05T22:48:26Z
(In reply to Andrej Mitrovic from comment #2)
> Similar performance worry about scope(success);
> http://forum.dlang.org/thread/mailman.493.1358378360.22503.digitalmars-
> [email protected]?page=1

Can this be optimized separately from scope(exit)? I still have to go and re-read that thread again but in the meantime maybe someone else can chime in.
Comment #6 by b2.temp — 2018-04-05T04:18:53Z
Nowadays the same code is generated with -O, likely since quite recently (https://forum.dlang.org/thread/[email protected]).