Bug 14937 – Slow code compared to ldc/gdc on calculation with real variables

Status: NEW
Severity: enhancement
Priority: P4
Component: dmd
Product: D
Version: D2
Platform: x86_64
OS: Linux
Creation time: 2015-08-19T19:26:26Z
Last change time: 2024-12-13T18:44:17Z
Keywords: performance
Assigned to: No Owner
Creator: secondaryAccount
Moved to GitHub: dmd#19030

Attachments

ID | Filename | Summary | Content-Type | Size
1542 | file_14937.txt | benchmarked code | text/plain | 1086
1543 | input.tar.bz2 | compressed benchmark input file | application/x-bzip | 991664
1544 | test.d | full benchmark code for reduced input file | text/plain | 2253

Comments

Comment #0 by secondaryAccount — 2015-08-19T19:26:26Z
Created attachment 1542: benchmarked code

http://forum.dlang.org/thread/[email protected]?page=9#post-mr2ef5:241e2a:241:40digitalmars.com

The code is in the attachment. Timings with some test input, described below:

dmd -O -inline -release -noboundscheck: 3700-3800 ms
gdc -O3 -march=native -frelease -fno-bounds-check: ~1000 ms
ldc2 -O3 -release -disable-boundscheck: ~800 ms

Versions: dmd 2.068, gdc based on 4.9.2, ldc 0.15.2beta1
OS: Linux x86_64

Some notes:
- The benchmark calls cosineSimilarity 1 million times with different input and sums the return values (the large text input file and the IO functions are omitted here; I can add them if helpful).
- Timing is done with std.datetime around the loop; no IO is included.
- pragma(inline, true) shows that dmd is unable to inline scalarProduct and normSquared. Disabling inlining for ldc causes no noticeable slowdown.
- Elements of SparseVector are sorted by index.
- SparseVector.length is usually between 50 and 100, and the maximum index is 47,000.
- v1 and v2 do not point to the same data.
- The gap between dmd and ldc/gdc is much smaller when "real" is replaced with double.
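For readers without access to attachment 1542, the kernel described in the notes above might look roughly like the following. This is only a sketch under stated assumptions, not the attached code: the function names cosineSimilarity, scalarProduct and normSquared and the type name SparseVector come from the report, while the Elem struct, its field names, the Float alias and the merge-style loop are guesses at a plausible implementation.

```d
import std.math : sqrt, fabs;

// Assumption: one floating-point type used throughout.
// Change to double to try the "real vs. double" comparison from the report.
alias Float = real;

// Hypothetical element layout; the attachment's field names may differ.
struct Elem
{
    size_t index; // position in the dense vector
    Float value;  // non-zero entry at that position
}

// A sparse vector is a slice of elements sorted by ascending index.
alias SparseVector = Elem[];

// Dot product via a two-pointer merge over the sorted index lists.
Float scalarProduct(const SparseVector v1, const SparseVector v2)
{
    Float sum = 0;
    size_t i = 0, j = 0;
    while (i < v1.length && j < v2.length)
    {
        if (v1[i].index == v2[j].index)
        {
            sum += v1[i].value * v2[j].value;
            ++i;
            ++j;
        }
        else if (v1[i].index < v2[j].index)
            ++i;
        else
            ++j;
    }
    return sum;
}

// Sum of squared entries.
Float normSquared(const SparseVector v)
{
    Float sum = 0;
    foreach (ref e; v)
        sum += e.value * e.value;
    return sum;
}

// Cosine of the angle between the two vectors.
Float cosineSimilarity(const SparseVector v1, const SparseVector v2)
{
    return scalarProduct(v1, v2) / sqrt(normSquared(v1) * normSquared(v2));
}

unittest
{
    auto a = [Elem(0, 1.0), Elem(2, 2.0)];
    auto b = [Elem(2, 3.0), Elem(5, 4.0)];
    assert(scalarProduct(a, b) == 6);  // only index 2 is shared: 2 * 3
    assert(normSquared(a) == 5);       // 1*1 + 2*2
    assert(fabs(cosineSimilarity(a, a) - 1) < 1e-12);
}
```

Saved as a single file, the checks can be run with dmd -unittest -main -run on that file. Keeping the type behind one alias makes it easy to rebuild the same code with double and compare timings across compilers.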
Comment #1 by secondaryAccount — 2015-08-19T20:37:07Z
Created attachment 1543: compressed benchmark input file

Reduced the benchmark input file to pass the size limit.
Comment #2 by secondaryAccount — 2015-08-19T20:45:30Z
Created attachment 1544: full benchmark code for reduced input file

The attached input file contains 1400 example vectors. This benchmark program calls cosineSimilarity 700 x 700 times (controlled by slices / foreach loops in main), which is sufficient to reproduce the slowdown. The timings in the bug report are based on 1000 x 1000 calls of cosineSimilarity and 2000 example vectors. The path to the input file is hard-coded (same directory) on line 11.
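The loop structure described above (two slices of 700 vectors each, nested foreach loops, a timer around the loops only) could be sketched as follows. This is not the attached test.d: sumOverPairs and toyKernel are hypothetical names, the data is synthetic instead of being read from the input file, and std.datetime.stopwatch is the current Phobos module, whereas the 2.068-era report would have used the StopWatch from std.datetime.

```d
import std.datetime.stopwatch : StopWatch, AutoStart;
import std.stdio : writefln;

// Sum kernel(a, b) over every pair drawn from the two slices.
real sumOverPairs(alias kernel, V)(V[] rows, V[] cols)
{
    real total = 0;
    foreach (ref a; rows)
        foreach (ref b; cols)
            total += kernel(a, b);
    return total;
}

// Placeholder kernel so the sketch compiles on its own;
// the report's benchmark calls cosineSimilarity here instead.
real toyKernel(const real[] a, const real[] b)
{
    real s = 0;
    immutable n = a.length < b.length ? a.length : b.length;
    foreach (i; 0 .. n)
        s += a[i] * b[i];
    return s;
}

void main()
{
    // Stand-in data; the report reads 1400 vectors from the attached file.
    auto vectors = new real[][](1400, 64);
    foreach (i, ref v; vectors)
        foreach (j, ref x; v)
            x = (i + j) % 7;

    // Time only the pairwise loop, with no IO inside the measured region.
    auto sw = StopWatch(AutoStart.yes);
    immutable total = sumOverPairs!toyKernel(vectors[0 .. 700], vectors[700 .. 1400]);
    sw.stop();

    writefln("sum = %s, elapsed = %s ms", total, sw.peek.total!"msecs");
}
```

Printing the accumulated sum after the timer stops keeps the compiler from discarding the loop as dead code while leaving the output call outside the measurement.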
Comment #3 by robert.schadek — 2024-12-13T18:44:17Z
THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/dmd/issues/19030 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB