If one looks at GCC's output for PIC code, it uses a small function (a thunk) to obtain the PC, like so:
__x86.get_pc_thunk.bx LABEL NEAR
mov ebx, dword ptr [esp] ; 0430 _ 8B. 1C 24
ret ; 0433 _ C3
The thunk is then called to load the PC into EBX:
call __x86.get_pc_thunk.bx ; 0444 _ E8, FFFFFFE7
What dmd currently emits looks like this:
push ebx ; 0564 _ 53
call ?_014 ; 0565 _ E8, 00000000
?_014 LABEL NEAR
pop ebx ; 056A _ 5B
This is suboptimal for two reasons. First, we waste 2 bytes per call site on the push and pop of EBX. Second, each of these calls unbalances the CPU's return stack buffer, because the call has no matching return.
I suggest switching to the same technique that GCC uses.
We could even use exactly the same naming, so that the thunks GCC produces for C code and the ones dmd produces for D code can be merged at link time.
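The size argument above can be checked with a bit of arithmetic. The sketch below takes the accounting in this report at face value (the push and pop are both counted as overhead) and reads the byte sizes off the encodings in the two listings. The function names and the `thunk_copies` parameter are made up for illustration; `thunk_copies` stands for how many copies of the thunk survive linking, assumed to be 1 if merging works.

```python
# Byte sizes read off the encodings shown in the listings.
PUSH_EBX = 1        # 53
CALL_REL32 = 5      # E8 + 32-bit displacement
POP_EBX = 1         # 5B
THUNK_SIZE = 3 + 1  # mov ebx, [esp] (8B 1C 24) + ret (C3)


def inline_cost(sites: int) -> int:
    """Total bytes for the current push / call-next-instruction / pop sequence."""
    return sites * (PUSH_EBX + CALL_REL32 + POP_EBX)


def thunk_cost(sites: int, thunk_copies: int = 1) -> int:
    """Total bytes for GCC-style thunk calls plus the thunk bodies themselves.

    thunk_copies is a hypothetical knob: 1 assumes the linker merges all thunks.
    """
    return sites * CALL_REL32 + thunk_copies * THUNK_SIZE


def bytes_saved(sites: int, thunk_copies: int = 1) -> int:
    """Positive result means the thunk approach is smaller."""
    return inline_cost(sites) - thunk_cost(sites, thunk_copies)


if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        print(n, bytes_saved(n))
```

Under these assumptions the thunk breaks even at two call sites and saves 2 bytes for every additional one. This covers only the size argument; the return stack buffer effect is not modelled.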
Comment #1 by robert.schadek — 2024-12-13T18:16:38Z