Bug 10472 – lastIndexOf(string, string) does not find single character string at beginning of string
Status
RESOLVED
Resolution
FIXED
Severity
critical
Priority
P2
Component
phobos
Product
D
Version
D2
Platform
x86
OS
Linux
Creation time
2013-06-25T02:00:00Z
Last change time
2014-02-17T07:34:00Z
Assigned to
nobody
Creator
rburners
Comments
Comment #0 by rburners — 2013-06-25T02:00:16Z
While working on lastIndexOf with a slice index I noticed that,
assert(lastIndexOf(to!S("öabcdefcdef"), to!T("ö")) == 0);
fails on both x64 as welll x86.
It also fails for the single dchar version,
assert(lastIndexOf(to!S("öabcdefcdef"), to!dchar("ö")) == 0);
I hope I will find the time to fix this this week.
Comment #1 by monarchdodra — 2013-06-25T05:00:57Z
The problem is this condition:
----
if (cast(dchar)(cast(Char)c) == c)
----
This is basically saying "if the code_point_ representation fits in a singe code_unit_, then we look at the code_units_". This is wrong, since for UTF8 characters with code_point_s in the 0x80 0xFF range will "fit" in a single code_unit_, but actually have a dual code_unit_ representation. In particular:
ö is represented by \00F6, which fits in a single code unit, yet, when encoded into UTF8 take up two: "0xC3 0xB6"
The correct question is:
if (codeLength!Char(c) == 1)
Or, if you want to tweak a little, since you don't need the *actual* codeLength:
----
static if (Char.sizeof == 1) immutable fits = c <= 0x7F;
else static if (Char.sizeof == 2) immutable fits = c <= 0xFFFF;
else immutable fits = true;
if (fits)
{
...
----
------------------
BTW, implementation wise, I do believe a simple foreach_reverse is more efficient, because it pops *as* it decodes. The for loop needs to stride backwards (again) after a call to "back" ("back" strides backwards already)).
foreach_reverse (i, dchar c2 ; s)
{
if ( c2 == c)
return i;
}
and
immutable c1 = std.uni.toLower(c);
foreach_reverse (i, dchar c2 ; s)
{
if ( std.uni.toLower(c2) == c1)
return i;
}
In any case, it is much simpler.
Comment #2 by rburners — 2013-06-25T05:13:59Z
awesome, thanks for taking a look. This gives me somewhere to start, or just copy paste ;-) I properly will find some time tonight as it looks right now.
Comment #3 by monarchdodra — 2013-06-26T01:18:53Z
(In reply to comment #1)
> ----
> static if (Char.sizeof == 1) immutable fits = c <= 0x7F;
> else static if (Char.sizeof == 2) immutable fits = c <= 0xFFFF;
> else immutable fits = true;
>
> if (fits)
> {
> ...
> ----
Edit: The correct condition would actually be:
> ----
static if (Char.sizeof == 1) immutable fits = c <= 0x7F;
else static if (Char.sizeof == 2) immutable fits = c <= 0xD7FF || (0xE000 <= c && c <= 0xFFFF
else immutable fits = true;
if (fits)
{
...
----
This would be the "most correct" condition. As stated in the pull, both:
`if (std.ascii.isAscii(c))`
`if (codeLength!Char(c) == 1)`
would also be correct.