Comment #0 by andrej.mitrovich — 2012-03-15T09:55:27Z
Sometimes I want to find the index of a substring, but not necessarily the index of the first occurrence. For example, in the string "foo x foo y" I want to get the index of the second foo. Here's a simple implementation based on the existing countUntil:
import std.algorithm : startsWith;
import std.array;
import std.traits;

sizediff_t countUntil(alias pred = "a == b", R, N)(R haystack, N needle, size_t index)
    if (is(typeof(startsWith!pred(haystack, needle))))
{
    size_t count;
    static if (isNarrowString!R)
    {
        // Narrow strings are handled a bit differently
        auto length = haystack.length;
        for (; !haystack.empty; haystack.popFront())
        {
            if (startsWith!pred(haystack, needle))
            {
                if (count == index)
                    return length - haystack.length;
                else
                    count++;
            }
        }
    }
    else
    {
        typeof(return) result;
        for (; !haystack.empty; ++result, haystack.popFront())
        {
            if (startsWith!pred(haystack, needle))
            {
                if (count == index)
                    return result;
                else
                    count++;
            }
        }
    }
    return -1;
}

void main()
{
    string s = "foo x foo y";
    auto idx1 = s.countUntil("foo", 0);
    auto idx2 = s.countUntil("foo", 1);
    auto idx3 = s.countUntil("foo", 2);
    assert(idx1 == 0);
    assert(idx2 == 6);
    assert(idx3 == -1);
}
Equivalent functionality could be added to std.string.indexOf to allow case-insensitive searches.
Comment #1 by andrej.mitrovich — 2012-12-10T14:51:28Z
This niche feature belongs in user-space libraries, not Phobos.
Comment #2 by bearophile_hugs — 2012-12-10T15:28:19Z
This belongs in Phobos. See the Python string functions, which take optional start and end arguments:
>>> help(str.find)
Help on method_descriptor:

find(...)
    S.find(sub [,start [,end]]) -> int

    Return the lowest index in S where substring sub is found,
    such that sub is contained within s[start:end]. Optional
    arguments start and end are interpreted as in slice notation.

    Return -1 on failure.

>>> help(str.index)
Help on method_descriptor:

index(...)
    S.index(sub [,start [,end]]) -> int

    Like S.find() but raise ValueError when the substring is not found.
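
To tie this back to the original request: with an optional start argument, an nth-occurrence search takes only a few lines. A minimal sketch in Python, where find_nth is a hypothetical helper name (it is not a built-in) and n is zero-based, as in comment #0:

```python
def find_nth(haystack: str, needle: str, n: int) -> int:
    """Return the index of the n-th (zero-based) occurrence of needle, or -1.

    Hypothetical helper, not part of the Python standard library.
    The search resumes one position after each match, so overlapping
    matches are counted, like the implementation in comment #0.
    """
    pos = -1
    for _ in range(n + 1):
        # str.find(sub, start) only looks at haystack[start:].
        pos = haystack.find(needle, pos + 1)
        if pos == -1:
            return -1
    return pos


s = "foo x foo y"
print(find_nth(s, "foo", 0))  # 0
print(find_nth(s, "foo", 1))  # 6
print(find_nth(s, "foo", 2))  # -1
```

Whether a start parameter alone is enough, or whether a dedicated "nth match" overload is still worth having, is a design question for Phobos; the sketch only shows that the two are closely related.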
Comment #3 by monarchdodra — 2012-12-11T05:30:54Z
(In reply to comment #0)
> Sometimes I want to find the index of a substring, but not necessarily the
> index of the first occurrence. E.g. in the string "foo x foo y" I want to get
> the index of the second foo.
This would conflict with this other request:
http://d.puremagic.com/issues/show_bug.cgi?id=5507
which would allow counting until any of several needles, e.g.:
//----
string s = "foo x bar y";
size_t index = s.countUntil("bar", 'x');
assert(index == 4);
//----
So what you are asking for may have to be implemented as "countUntilN" or something:
//-----
string s = "foo x bar y";
size_t index = s.countUntilN(1, "bar", 'x');
assert(index == 6); //First found 'x', then found "bar".
//-----
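
To make the intended semantics concrete, here is a rough sketch in Python rather than D. The name count_until_n and its argument order simply mirror the hypothetical countUntilN above; nothing like it exists in Phobos or in Python's standard library. It walks the haystack one position at a time and counts every position where any needle starts:

```python
def count_until_n(haystack: str, n: int, *needles: str) -> int:
    """Return the position of the n-th (zero-based) match of any needle, or -1.

    Hypothetical function illustrating the proposed countUntilN semantics.
    """
    seen = 0
    for i in range(len(haystack)):
        # str.startswith(prefix, start) tests for a match beginning at i.
        if any(haystack.startswith(needle, i) for needle in needles):
            if seen == n:
                return i
            seen += 1
    return -1


s = "foo x bar y"
print(count_until_n(s, 0, "bar", "x"))  # 4: the 'x'
print(count_until_n(s, 1, "bar", "x"))  # 6: "bar", found after the 'x'
```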
> Here's a simple implementation based on the
> existing countUntil:
I just want to point out that this implementation is based on an old and buggy version of countUntil. There are two problems:
1. No support for reference type ranges.
2. Incorrect support of unicode.
Issue 1 is not that big a problem in practice, but it is unacceptable for a standard library.
Issue 2 depends on what you want. Do you want the *index* in the string, or the logical *range_position*:
"日本語".countUntil("本"); //1: range position (popFront once)
"日本語".countUntil("本"); //3: slice index, "日本語"[3 .. $]
If you plan on using countUntil(N), then I'd recommend basing your implementation on https://github.com/D-Programming-Language/phobos/pull/951
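
The two possible answers for "日本語" can be reproduced outside D. A small Python sketch, assuming UTF-8 storage as in a D string; the two function names are made up for illustration:

```python
def code_point_position(text: str, needle: str) -> int:
    """Logical position: how many code points precede the match (-1 if none)."""
    return text.find(needle)


def utf8_slice_index(text: str, needle: str) -> int:
    """Storage index: how many UTF-8 code units (bytes) precede the match."""
    return text.encode("utf-8").find(needle.encode("utf-8"))


print(code_point_position("日本語", "本"))  # 1
print(utf8_slice_index("日本語", "本"))     # 3
```

Each of these characters takes three bytes in UTF-8, which is where the 3 comes from. For pure ASCII input the two numbers coincide, which is why the difference is easy to miss in tests.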
Finally, keep in mind that the standard function "count" skips over overlaps:
"ababab".count("abab"); //produces 1. Only 1 match.
However, your countUntil implementation would find that second match:
"ababab".countUntilN(1, "abab"); //Produces 2
This would be kind of weird:
"Given a range which holds 1 instance of a needle, countUntilN is capable of finding the second occurrence of said needle :/"
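
The mismatch comes down to how far the search advances after a match. A Python sketch of both policies follows; find_nth and its overlapping flag are hypothetical names, and Python's own str.count happens to skip overlaps just as D's count does:

```python
def find_nth(haystack: str, needle: str, n: int, overlapping: bool) -> int:
    """Return the index of the n-th (zero-based) match of needle, or -1.

    Hypothetical helper. With overlapping=True the search resumes one
    position after each match; otherwise it resumes after the whole match.
    """
    step = 1 if overlapping else max(len(needle), 1)
    pos = -step  # so that the first search starts at index 0
    for _ in range(n + 1):
        pos = haystack.find(needle, pos + step)
        if pos == -1:
            return -1
    return pos


print("ababab".count("abab"))                              # 1
print(find_nth("ababab", "abab", 1, overlapping=True))     # 2
print(find_nth("ababab", "abab", 1, overlapping=False))    # -1
```

Only the non-overlapping policy agrees with count: when count reports a single match, asking for the second one returns -1.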