← Back to index | Original Bugzilla link

Bug 8946 – « any » function does not what it should do

Status: RESOLVED
Resolution: INVALID
Severity: major
Priority: P2
Component: phobos
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2012-11-02T11:07:00Z
Last change time: 2015-06-09T05:13:44Z
Assigned to: nobody
Creator: dimitri.sabadie

Comments

Comment #0 by dimitri.sabadie — 2012-11-02T11:07:46Z

The any function calls find, which is really weird implemented because it returns a subrange, doing (re)allocation(s) on the fly. I think it’s way too much for just applying a predicate on a range. That’s why I have rewritten any as it would be (a simple loop and apply the predicate), and I’d propose a patch here for the std one.

Comment #1 by dmitry.olsh — 2012-11-03T07:18:01Z

Ever heard of slicing an array in D? Check this out: http://dlang.org/d-array-article.html Dig down in std.algotrithm.find and you'll find out that it's lazy iterates a forward range. Thus like the most of std.algorithm functions it DOESN'T allocate. In case of array it just returns a slice of original array. So this report is invalid.

Comment #2 by dimitri.sabadie — 2012-11-03T14:09:28Z

(In reply to comment #1) > Ever heard of slicing an array in D? > > Check this out: http://dlang.org/d-array-article.html > > Dig down in std.algotrithm.find and you'll find out that it's lazy iterates a > forward range. Thus like the most of std.algorithm functions it DOESN'T > allocate. In case of array it just returns a slice of original array. > > So this report is invalid. Ok, my argument wasn’t worth it I agree. But, why any would use such a function where a simple loop and a test will do it? It’s the same argument for not to use countUntil and check if the index is invalid, it consumes more memory that we want. any does things we don’t want to IMHO.

Comment #3 by dmitry.olsh — 2012-11-03T14:18:25Z

> Ok, my argument wasn’t worth it I agree. But, why any would use such a function where a simple loop and a test will do it? Because find is a simple loop? To not duplicate code? If you have liner search then it's obvious (for me) to reuse it for exactly the same task. any basically gives you a bit less then find/countUntill (bool vs pointer-index) by doing the same amount of work. > it consumes more memory that we want. any does things we don’t want to IMHO. Simply not true. I'm not sure where this 'memory' argument comes from at all. In fact it saves some code space by not duplicating the same code over again. At best this could be an enhancement request (or better yet - a pull request) if there is a soild proof that the resulting code is measurably _faster_.

Comment #4 by dimitri.sabadie — 2012-11-03T14:35:35Z

(In reply to comment #3) > > Ok, my argument wasn’t worth it I agree. But, why any would use such a function > where a simple loop and a test will do it? > > Because find is a simple loop? To not duplicate code? If you have liner search > then it's obvious (for me) to reuse it for exactly the same task. any basically > gives you a bit less then find/countUntill (bool vs pointer-index) by doing the > same amount of work. > > > it consumes more memory that we want. any does things we don’t want to IMHO. > > Simply not true. I'm not sure where this 'memory' argument comes from at all. > In fact it saves some code space by not duplicating the same code over again. > At best this could be an enhancement request (or better yet - a pull request) > if there is a soild proof that the resulting code is measurably _faster_. There’s a popFront() function, so why would it be the correct way to search? Reusing a « linear search » code is not a point here because we want to _find_ something. countUntil gives us the position of an element, find should give whether the element is or not in a range. It returns a range: why ? It’s clearly implementation-related, and we don’t care about that detail. I’ll propose a pull request because the simple fact there’s « find » and « any » is not good.

Comment #5 by timon.gehr — 2012-11-03T14:47:42Z

(In reply to comment #4) > ... > There’s a popFront() function, so why would it be the correct way to search? > Reusing a « linear search » code is not a point here because we want to _find_ > something. countUntil gives us the position of an element, find should give > whether the element is or not in a range. Well, no. > It returns a range: why ? > ... Because that is its specification. What is the issue?

Comment #6 by dimitri.sabadie — 2012-11-03T16:45:23Z

(In reply to comment #5) > (In reply to comment #4) > > ... > > There’s a popFront() function, so why would it be the correct way to search? > > Reusing a « linear search » code is not a point here because we want to _find_ > > something. countUntil gives us the position of an element, find should give > > whether the element is or not in a range. > > Well, no. > > > It returns a range: why ? > > ... > > Because that is its specification. What is the issue? The issue is: a function that is designed to « search » for something is excepted to return a boolean, not top popFront() shit and returns the result range…

Comment #7 by andrej.mitrovich — 2012-11-03T16:58:47Z

(In reply to comment #6) > (In reply to comment #5) > > (In reply to comment #4) > > > ... > > > There’s a popFront() function, so why would it be the correct way to search? > > > Reusing a « linear search » code is not a point here because we want to _find_ > > > something. countUntil gives us the position of an element, find should give > > > whether the element is or not in a range. > > > > Well, no. > > > > > It returns a range: why ? > > > ... > > > > Because that is its specification. What is the issue? > > The issue is: a function that is designed to « search » for something is > excepted to return a boolean, not top popFront() shit and returns the result > range… Then use canFind.

Comment #8 by timon.gehr — 2012-11-03T17:38:29Z

(In reply to comment #6) > ... > > The issue is: a function that is designed to « search » for something is > excepted to return a boolean, not top popFront() [...] and returns the result > range… Maybe the expectation is flawed? eg. see www.google.com. The service called 'search' delivers more information than just a boolean.

Comment #9 by issues.dlang — 2012-11-03T18:12:26Z

> Then use canFind. canFind does exactly the same thing. The _only_ difference between calling find and checking whether the result was empty rather than duplicating find's implementation inside of canFind or any is the fact that a range is returned, and since that range is bound to be a relatively small type (almost always either an array or a struct with just a handful of member variables), the cost of that return is generally minimal, and there's a halfway decent chance that a move was involved rather than a copy, making it that much more efficient. If there's a measurable difference in efficiency between calling find (and checking for empty) and reimplementing find inside of canFind or any, then maybe we'll look at making it so that canFind and any don't call find. But the overhead of that return is so small (especially in comparison to the algorithm itself), that I'd be surprised if that ever happened.

Comment #10 by dimitri.sabadie — 2012-11-04T04:11:05Z

(In reply to comment #9) > If there's a measurable difference in efficiency between calling find (and > checking for empty) and reimplementing find inside of canFind or any, then > maybe we'll look at making it so that canFind and any don't call find. But the > overhead of that return is so small (especially in comparison to the algorithm > itself), that I'd be surprised if that ever happened. The difference is that no one expect D developpers — who know the phobos — want to manipulate a range as return of a search function. An index would be okay since it returns a logic value for a search concept, but seriously, why would it return the range with all poped up values to find the object ? It’s clearly weird, and D is the single langage to do such a thing. Also, wanting to refactor all the world is clearly not a good idea. Sometimes, rewritting the wheel to adapt it for a specific concept is faraway better than reusing a weird code that makes the function weird. Moreover, any would be something like 3-4 lines of code… It’s really terrible. So you prefer to write it with just one line and propose a weird interface. Clever.

Comment #11 by issues.dlang — 2012-11-04T04:26:54Z

C++'s find takes a pair of iterators indicating the beginning and ending of a range and returns an iterator to the element found - i.e. the new beginning of the range. D's ranges can be viewed as being the same as that pair of iterators. But unlike iterators, they always have a begin and end. So, rathering than just return the new begin (which it can't do, because that would be one iterator instead of the pair), D's find returns the new begin and the end together as a new range. What D is doing is the logical progression of what happens when you move from iterators to ranges. You're free to not like it, but it works _very_ well, and it's very much in line with how C++ does things, only we're doing it with ranges rather than iterators, and the result is actually much more powerful than what C++ has, because ranges are far more powerful. As for returning an index, that doesn't work for D any more than it works for C++. For anything which isn't random access, you'd have to iterate over the entire range again to get to that element. Iterators are _far_ more efficient than using indices. If you were to ask a linked list for its nth element, you'd have to iterate over the entire list up to that element to get to it, meaning that something like for(size_t i = 0; i < list.size(); ++i) auto e = list.getIndex(i); would actually end up iterating over the entire list multiple times in order to get to each element. Contrast that with iterators, which iterate only once for(list::iterator iter; iter < list.end(); ++iter) auto e = *iter; Ranges are just a further evolution of iterators, and the result is surprisingly flexible and powerful, especially when you start chaining function calls. One of the major rationales behind ranges is discussed here: http://www.drdobbs.com/architecture-and-design/component-programming-in-d/240008321