Bug 10932 – Useless temporaries and other absurd in inlined code

Status: NEW
Severity: normal
Priority: P3
Component: dmd
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2013-08-30T13:55:02Z
Last change time: 2024-12-13T18:11:05Z
Keywords: performance
Assigned to: No Owner
Creator: Dmitry Olshansky
Moved to GitHub: dmd#18662

Comments

Comment #0 by dmitry.olsh — 2013-08-30T13:55:02Z
This is drilling down on the issue of why the multi-stage lookup tables of the new std.uni have decent speed in LDC but are painfully slow in DMD.

Observe that the following snippet (inlined opIndex of a Trie) does 2 remarkably stupid things:

a) See that read/write of arg_0 on stack, instead of a direct "mov edx, ebx"
b) 2 push eax at the beginning of the function ... and "add esp, 8" at the end - WAT? Note that eax is never written to until the very end (there is simply no need to save it).

    public _D3std3uni146__T4TrieTS3std3uni19__T9BitPackedTbVi1Z9BitPackedTwVk1114112TS3std3uni21__T9sliceBi39F8D5E3EE00191D27B7780CD5A2FFED
    _D3std3uni146__T4TrieTS3std3uni19__T9BitPackedTbVi1Z9BitPackedTwVk1114112TS3std3uni21__T9sliceBi39F8D5E3EE00191D27B7780CD5A2FFED proc near
            ; CODE XREF: _D9trie_test4mainFAAyaZv17__foreachbody6846MFNfKwZi+Ap

    var_8 = dword ptr -8
    arg_0 = dword ptr 4

            push eax
            mov ecx, [esp+4+arg_0]
            shr ecx, 8
            push eax
            and ecx, 1FFFh
            mov edx, [eax+14h]
            push ebx
            mov bx, [edx+ecx*2]
            push esi
            mov esi, [esp+10h+arg_0]
            and esi, 0FFh
            push edi
            mov edi, [eax+4]
            lea ecx, [edx+edi*4]
            and ebx, 0FFFFh
            shl ebx, 8
            add ebx, esi
            mov [esp+14h+var_8], ebx
            mov edx, [esp+14h+var_8]
            shr ebx, 5
            and edx, 1Fh
            bt [ecx+ebx*4], edx
            sbb eax, eax
            neg eax
            pop edi
            pop esi
            pop ebx
            add esp, 8
            retn 4
    _D3std3uni146__T4TrieTS3std3uni19__T9BitPackedTbVi1Z9BitPackedTwVk1114112TS3std3uni21__T9sliceBi39F8D5E3EE00191D27B7780CD5A2FFED endp

And the D code where the above object code can be seen:

    //Command line: dmd -O -release -inline -noboundscheck
    import std.uni, std.stdio;

    //match bits that std.regex had before
    alias codepointSetTrie!(13, 8) makeTrie;

    void main(string[] argv)
    {
        auto tr = makeTrie(unicode.Alphabetic);
        int count;
        foreach(arg; argv)
            foreach(dchar ch; arg)
                if(tr[ch])
                    count++;
        writeln(count);
    }
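For readers who do not want to decode the listing, here is a rough D sketch of the lookup the disassembly appears to perform: the top 13 bits of the code point select a 16-bit page number, which is shifted left by 8 and combined with the low 8 bits to form a bit index into a packed bit array. The names TwoStageTrie, pages, bits and makeAsciiLetterTrie are hypothetical and the layout is inferred from the instructions above; this is not the actual std.uni implementation.

```d
// Hypothetical two-stage bit trie mirroring the (13, 8) split seen in the listing.
struct TwoStageTrie
{
    ushort[] pages; // first stage: one block number per 256-code-point page
    uint[] bits;    // second stage: packed bits, 256 bits (8 uints) per block

    bool opIndex(dchar ch) const pure nothrow @nogc @safe
    {
        // shr ecx, 8 / and ecx, 1FFFh / mov bx, [edx+ecx*2]
        immutable page = pages[(ch >> 8) & 0x1FFF];
        // shl ebx, 8 / add ebx, esi  (esi = ch & 0FFh)
        immutable bitIndex = (cast(uint) page << 8) + (ch & 0xFF);
        // shr ebx, 5 / and edx, 1Fh / bt [ecx+ebx*4], edx
        return ((bits[bitIndex >> 5] >> (bitIndex & 31)) & 1) != 0;
    }
}

// Builds a tiny example trie that matches only ASCII letters (illustration only).
TwoStageTrie makeAsciiLetterTrie()
{
    TwoStageTrie t;
    t.pages = new ushort[](1 << 13); // every page defaults to block 0 (all zero bits)
    t.bits = new uint[](2 * 8);      // block 0 is empty, block 1 covers 0x00..0xFF
    t.pages[0] = 1;
    foreach (uint c; 0 .. 256)
    {
        immutable isLetter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
        if (isLetter)
        {
            immutable bitIndex = (1u << 8) + c;
            t.bits[bitIndex >> 5] |= 1u << (bitIndex & 31);
        }
    }
    return t;
}

unittest
{
    auto t = makeAsciiLetterTrie();
    assert(t['a'] && t['Z']);
    assert(!t['1']);
    assert(!t['\u4E2D']); // falls into the shared empty block
}
```

Written this way, the bit index is a single value used twice (once shifted, once masked), so nothing in the logic itself seems to call for a round trip through a stack slot. The unittest block can be run with dmd -unittest -main.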
Comment #1 by dmitry.olsh — 2013-08-30T13:58:54Z
> See that read/write of arg_0 on stack ...

Should be var_8, of course.
Comment #2 by bugzilla — 2015-09-23T07:19:30Z
The PUSH EAX is there to align the stack to 16 bytes.
Comment #3 by robert.schadek — 2024-12-13T18:11:05Z
THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/dmd/issues/18662 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB