Bug 5282 – Optimize array comparison which use memcmp to something better and remove unnecessary indirections.

Status
NEW
Severity
enhancement
Priority
P4
Component
dmd
Product
D
Version
D2
Platform
Other
OS
All
Creation time
2010-11-27T22:30:10Z
Last change time
2024-12-13T17:54:15Z
Keywords
performance
Assigned to
No Owner
Creator
Jonathan M Davis
Moved to GitHub: dmd#18318 →

Comments

Comment #0 by issues.dlang — 2010-11-27T22:30:10Z
This thread makes it fairly clear that in some cases, array comparisons in D are unnecessarily slow: http://www.mail-archive.com/[email protected]/msg43150.html What likely should be done is to use memcmp in cases where you don't actually need to call opEquals() on the individual elements of the arrays - such as when comparing arrays of primitives or arrays of structs where none of the struct's member variables have opEquals() (be they primitives or structs without an opEquals() and no member variables with opEquals()). strings would naturally be among the types which would benefit from the optimization. Other optimizations for speeding up array comparisons may be appropriate, but I do think that it's clear that array comparisons can be better optimized in many common cases. Obviously, this isn't a high priority, but I do think that this is an optimization which should be added at some point if we really want D to be as fast as it should be.
Comment #1 by bearophile_hugs — 2010-11-28T03:20:18Z
(In reply to comment #0) > What likely should be done is to use memcmp in cases where you don't actually > need to call opEquals() on the individual elements of the arrays A simple solution for the string case: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=123044
Comment #2 by bearophile_hugs — 2010-11-28T09:05:27Z
The solution I suggest for this problem is: when DMD knows at compile-time the length of one of the two strings to equate, and such length is small (like < 6), then it may replace the string eq code like this: (c == "TAC") With code like: (c.length == 3 && c[0] == 'T' && c[1] == 'A' && c[2] == 'C')
Comment #3 by smjg — 2010-11-28T18:16:25Z
The conditions for memcmp working as a way of comparing structs are: - no opEquals of a compatible parameter type - no holes due to alignment - no members with reference semantics (dynamic arrays, AAs, classes) - all of this applies recursively to any struct or union members
Comment #4 by sandford — 2010-11-28T21:59:20Z
Here are two optimized routines for bit-wise comparing two arrays for equality, for both the dynamic/dynamic and dynamic/static case. These are significantly faster than memcmp (~2.5x) and D's built-in == operator (~3.3x) for the dynamic/dynamic case in a micro-benchmark. (The static case is naturally even faster) The difference between memcmp and '==' appears to be due to inlining: calling memcmp via a member function of a class (instead of from a free function) had approximately the same performance as D's '=='. This seems to indicate that a) use of memcmp in druntime for equality tests should be replaced with the below routine and b) DMD should be calling free-function versions of these routines instead of using multiple indirections. (By the way, does anyone know where array comparisons live in D? I tried modifying TypeInfo_Ag, but that didn't seem to work) /// Returns: true if the contents of two arrays are bit-wise equal. bool bitwiseEquality(T) (const T[] a, const T[] b) pure nothrow @trusted { if(a.length != b.length ) return false; if(a.ptr is b.ptr) return true; size_t byte_length = a.length*T.sizeof; size_t alignment = byte_length % ulong.sizeof; if(( (*(cast(ulong*)a.ptr) ^ *(cast(ulong*)b.ptr)) << 8 * (ulong.sizeof-alignment) )) return false; alignment += (!alignment) * ulong.sizeof; auto pa = cast(ulong*)(cast(ubyte*)a.ptr + alignment); auto pb = cast(ulong*)(cast(ubyte*)b.ptr + alignment); auto pa_end = cast(ulong*)(cast(ubyte*)a.ptr + byte_length); while (pa < pa_end) if(*pa++ != *pb++ ) return false; return true; } /// Returns: true if the contents of two arrays are bit-wise equal. bool bitwiseEquality(T, size_t N)(const T[] a, ref const T[N] b) pure nothrow @trusted { static if(T.sizeof*N <= uint.sizeof) { return a.length == N && !( (*(cast(uint*)a.ptr) ^ *(cast(uint*)b.ptr)) & (uint.max >> 8*(uint.sizeof - T.sizeof*N) )); } else static if(T.sizeof*N <= ulong.sizeof) { return a.length == N && !( (*(cast(ulong*)a.ptr) ^ *(cast(ulong*)b.ptr)) & (ulong.max>> 8*(ulong.sizeof - T.sizeof*N) )); } else { // Fall back to a loop if(a.length != N || (*(cast(ulong*)a.ptr) != *(cast(ulong*)b.ptr)) ) return false; enum alignment = T.sizeof*N % ulong.sizeof > 0 ? T.sizeof*N % ulong.sizeof : ulong.sizeof; auto pa = cast(ulong*)(cast(ubyte*)a.ptr + alignment); auto pb = cast(ulong*)(cast(ubyte*)b.ptr + alignment); auto pa_end = cast(ulong*)(cast(ubyte*)a.ptr + T.sizeof*N); while (pa < pa_end) { if(*pa++ != *pb++ ) return false; } return true; } }
Comment #5 by sandford — 2010-11-28T22:17:11Z
(In reply to comment #4) P.S. The performance gain of bitwiseEquality does drop for very large files (as things become more I/O bound), but it still maintains a ~1.5x advantage over memcmp.
Comment #6 by smjg — 2010-11-29T03:29:34Z
(In reply to comment #3) > The conditions for memcmp working as a way of comparing structs are: > > - no opEquals of a compatible parameter type > - no holes due to alignment > - no members with reference semantics (dynamic arrays, AAs, classes) > - all of this applies recursively to any struct or union members Looks like I was wrong.... http://www.digitalmars.com/d/1.0/expression.html#CmpExpression "Equality for struct objects means the bit patterns of the objects match exactly (the existence of alignment holes in the objects is accounted for, usually by setting them all to 0 upon initialization)."
Comment #7 by dlang-bugzilla — 2013-04-01T14:50:30Z
I've written a patch to optimize the simple case of comparison of arrays of basic types (integral and void[]), which was merged yesterday: http://d.puremagic.com/issues/show_bug.cgi?id=9477 https://github.com/D-Programming-Language/dmd/pull/1766 It uses the memcmp intrinsic. DMD currently does not do anything special if the array length is known at compile-time (i.e. at least one of the arrays is static), but I imagine smarter backends might take advantage of that and unroll it, as mentioned in comment #2. It would be possible to expand it to structs, if someone could implement the checks as described in comment #3.
Comment #8 by dmitry.olsh — 2018-05-22T09:09:57Z
Should be better now with recent templated array ops. Need to double-check this later.
Comment #9 by robert.schadek — 2024-12-13T17:54:15Z
THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/dmd/issues/18318 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB