Bug 10403 – memchr optimization for std.algorithm.find
Status
RESOLVED
Resolution
FIXED
Severity
enhancement
Priority
P2
Component
phobos
Product
D
Version
D2
Platform
All
OS
All
Creation time
2013-06-18T05:52:00Z
Last change time
2015-06-09T05:15:21Z
Assigned to
nobody
Creator
tommitissari
Comments
Comment #0 by tommitissari — 2013-06-18T05:52:38Z
Add an optimization for:
R find(alias pred = "a == b", R, E)(R haystack, E needle);
...to use the c function memchr when pred is the default "a == b" and...
CASE 1:
All of the following are true:
1) R is an array of elements of type char, byte, ubyte or an enum type whose base type is one of those.
2) E is integral
3) For the element type of R:
alias Elem = ElementType!R;
...the following is true:
(Elem.min <= needle && needle <= Elem.max)
This is important because memchr only cares about bitwise equality. If the value of needle is beyound the limits of possible values for type Elem, then we know needle cannot be found within haystack and the function can just return early without searching.
CASE 2:
All of the following are true:
1) R is an array of elements of type E
2) E.sizeof == 1
3) For type E equality means bitwise equality
Comment #1 by monarchdodra — 2013-06-19T12:57:30Z
(In reply to comment #0)
> Add an optimization for:
>
> R find(alias pred = "a == b", R, E)(R haystack, E needle);
>
> ...to use the c function memchr when pred is the default "a == b" and...
>
> CASE 1:
> All of the following are true:
>
> 1) R is an array of elements of type char, byte, ubyte or an enum type whose
> base type is one of those.
> 2) E is integral
> 3) For the element type of R:
> alias Elem = ElementType!R;
> ...the following is true:
> (Elem.min <= needle && needle <= Elem.max)
> This is important because memchr only cares about bitwise equality. If the
> value of needle is beyound the limits of possible values for type Elem, then we
> know needle cannot be found within haystack and the function can just return
> early without searching.
>
> CASE 2:
> All of the following are true:
>
> 1) R is an array of elements of type E
> 2) E.sizeof == 1
> 3) For type E equality means bitwise equality
I'm just about finished implementing the above, along with quite a few other tricks, and I'm getting some 'great' performance improvements.
I implemented "CASE 2".
However, I don't think it is worth implementing "CASE 1": I think the case of searching for an element that is simply outside of the possible values of the range's element type should be a very rare case. I mean, who would write:
find(myRangeOfBytes, 5000);
???
I come to the conclusion that the cost of checking the condition all the time just to reduce the time of a special case is not worth it.
It's kind of like of the "opAssign check for self assignment" issue. If you can write the code in such a case that the common case goes faster, but self is more costly, then that is better.
So yeah, I implemented "CASE 2".
Comment #2 by tommitissari — 2013-06-19T14:48:56Z
(In reply to comment #1)
> (In reply to comment #0)
> > Add an optimization for:
> >
> > R find(alias pred = "a == b", R, E)(R haystack, E needle);
> >
> > ...to use the c function memchr when pred is the default "a == b" and...
> >
> > CASE 1:
> > All of the following are true:
> >
> > 1) R is an array of elements of type char, byte, ubyte or an enum type whose
> > base type is one of those.
> > 2) E is integral
> > 3) For the element type of R:
> > alias Elem = ElementType!R;
> > ...the following is true:
> > (Elem.min <= needle && needle <= Elem.max)
> > This is important because memchr only cares about bitwise equality. If the
> > value of needle is beyound the limits of possible values for type Elem, then we
> > know needle cannot be found within haystack and the function can just return
> > early without searching.
> >
> > CASE 2:
> > All of the following are true:
> >
> > 1) R is an array of elements of type E
> > 2) E.sizeof == 1
> > 3) For type E equality means bitwise equality
>
> I'm just about finished implementing the above, along with quite a few other
> tricks, and I'm getting some 'great' performance improvements.
>
> I implemented "CASE 2".
>
> However, I don't think it is worth implementing "CASE 1": I think the case of
> searching for an element that is simply outside of the possible values of the
> range's element type should be a very rare case. I mean, who would write:
> find(myRangeOfBytes, 5000);
> ???
> I come to the conclusion that the cost of checking the condition all the time
> just to reduce the time of a special case is not worth it.
>
> It's kind of like of the "opAssign check for self assignment" issue. If you can
> write the code in such a case that the common case goes faster, but self is
> more costly, then that is better.
>
> So yeah, I implemented "CASE 2".
Checking for (Elem.min <= needle && needle <= Elem.max) is not an optimization, it's there for code correctness sake. It's the memchr that is the optimization. And checking for (Elem.min <= needle && needle <= Elem.max) is needed only when it's possible that it might be true, i.e. only when the signedness of Elem is different from the signedness of the type of needle.
Comment #3 by tommitissari — 2013-06-19T14:53:11Z
(In reply to comment #2)
> And checking for (Elem.min <= needle && needle <= Elem.max) is needed only when
> it's possible that it might be true, [..]
I mean... checking for (Elem.min <= needle && needle <= Elem.max) is needed only when it's possible that it might be false
Comment #4 by monarchdodra — 2013-06-19T15:30:33Z
(In reply to comment #3)
> (In reply to comment #2)
> > And checking for (Elem.min <= needle && needle <= Elem.max) is needed only when
> > it's possible that it might be true, [..]
>
> I mean... checking for (Elem.min <= needle && needle <= Elem.max) is needed
> only when it's possible that it might be false
Oh...
OK. Makes sense, hadn't thought of that :/ Will deal with it.
Comment #5 by monarchdodra — 2013-06-20T00:11:35Z
Actually, I thought about it some more(In reply to comment #4)
> (In reply to comment #3)
> > (In reply to comment #2)
> > > And checking for (Elem.min <= needle && needle <= Elem.max) is needed only when
> > > it's possible that it might be true, [..]
> >
> > I mean... checking for (Elem.min <= needle && needle <= Elem.max) is needed
> > only when it's possible that it might be false
>
> Oh...
>
> OK. Makes sense, hadn't thought of that :/ Will deal with it.
Hum... dealing with this optimization is making the code much more complicated then I care for right now...
So I'm just leaving it out. Sorry. I did put quite a few other in though...
Comment #6 by github-bugzilla — 2013-10-31T13:32:19Z