Comment #0 by peter.alexander.au — 2013-09-12T10:52:33Z
import std.uni : toLower;
char[] s = new char[10_000_000];
s[] = 'A';
auto s2 = s.toLower;
This takes 4.3 seconds on my machine.
import std.algorithm : map;
import std.conv : to;
import std.uni : toLower;
char[] s = new char[10_000_000];
s[] = 'A';
auto s2 = s.map!toLower.to!string;
This only takes 1.1 seconds.
Looking at the code for std.uni.toLower, it appears the string is constructed using repeated ~=. It should use an Appender of some sort.
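For illustration, here is a minimal sketch contrasting the two construction strategies. The helpers `lowerByConcat` and `lowerByAppender` are hypothetical, not the actual Phobos code. Both map each code point with `std.uni.toLower(dchar)`, so they share the one-to-one limitation raised later in this thread, and no timing claims are made here.

```d
import std.array : appender;
import std.stdio : writeln;
import std.uni : toLower;

// Hypothetical helper: builds the result with repeated ~=, the pattern
// described above. Every ~= goes through the runtime's generic
// array-append path.
char[] lowerByConcat(const(char)[] s)
{
    char[] result;
    foreach (dchar c; s)        // decode UTF-8 into code points
        result ~= toLower(c);   // appending a dchar re-encodes it as UTF-8
    return result;
}

// Hypothetical helper: same per-code-point mapping, but accumulated in an
// Appender, which keeps its own capacity bookkeeping between appends.
char[] lowerByAppender(const(char)[] s)
{
    auto app = appender!(char[])();
    foreach (dchar c; s)
        app.put(toLower(c));    // Appender encodes the dchar to UTF-8
    return app.data;
}

void main()
{
    writeln(lowerByAppender("HELLO, WORLD")); // prints "hello, world"
}
```

Both helpers produce the same output; the only difference is how the result buffer is grown.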
Comment #1 by dmitry.olsh — 2013-09-12T11:59:08Z
(In reply to comment #0)
> char[] s = new char[10_000_000];
> s[] = 'A';
> auto s2 = s.toLower;
>
> This takes 4.3 seconds on my machine.
>
>
> char[] s = new char[10_000_000];
> s[] = 'A';
> auto s2 = s.map!toLower.to!string;
>
> This only takes 1.1 seconds.
>
There are two things to consider here. First, the second version is not correct in general: one code point can map to several (e.g. the German sharp S).
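A minimal sketch of why a per-code-point `map` cannot be correct in general: `map` produces exactly one output element per input code point, while full case mapping can expand one code point into several. The helpers `upperSimple` and `upperFull` are hypothetical, and `upperFull` handles only the single SpecialCasing entry ß (U+00DF) to "SS", falling back to the simple mapping otherwise. The same issue exists for lower-casing (under full mapping, U+0130 İ lowercases to two code points); ß is used because it is the example cited above.

```d
import std.algorithm : map;
import std.array : appender;
import std.conv : to;
import std.stdio : writeln;
import std.uni : toUpper;

// Per-code-point mapping, as in s.map!toLower.to!string: each input
// code point yields exactly one output code point.
string upperSimple(const(char)[] s)
{
    return s.map!(c => toUpper(c)).to!string;
}

// Hypothetical "full" mapping: one input code point may expand to several.
// Only U+00DF (ß -> "SS", from Unicode's SpecialCasing.txt) is handled here;
// everything else falls back to the simple 1:1 mapping.
string upperFull(const(char)[] s)
{
    auto app = appender!string();
    foreach (dchar c; s)
    {
        if (c == '\u00DF')
            app.put("SS");          // one code point becomes two
        else
            app.put(toUpper(c));
    }
    return app.data;
}

void main()
{
    writeln(upperFull("straße")); // prints "STRASSE"
}
```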
> Looking at the code for std.uni.toLower, it appears the string is constructed
> using repeated ~=. It should use an Appender of some sort.
This could indeed be fixed. I suspect that putting an optimistic reserve(original.length) there would work even better. See also issue 10864:
http://d.puremagic.com/issues/show_bug.cgi?id=10864
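A minimal sketch of the optimistic reservation suggested above, assuming the output is usually about as long (in UTF-8 code units) as the input. The helper `lowerReserved` is hypothetical. If the guess is too small, because lower-casing changed a character's encoded length, the Appender simply grows past the reserved capacity, so correctness does not depend on the guess.

```d
import std.array : appender;
import std.stdio : writeln;
import std.uni : toLower;

// Hypothetical sketch: reserve the input's length up front so that, in the
// common case where lower-casing keeps the UTF-8 length unchanged,
// the Appender does not need to grow its buffer.
string lowerReserved(const(char)[] s)
{
    auto app = appender!string();
    app.reserve(s.length);      // optimistic guess: output length == input length
    foreach (dchar c; s)
        app.put(toLower(c));    // still correct if the guess turns out too small
    return app.data;
}

void main()
{
    auto s = new char[10_000_000];
    s[] = 'A';
    auto s2 = lowerReserved(s);
    writeln(s2.length);         // prints 10000000
}
```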
Comment #2 by peter.alexander.au — 2013-09-12T12:45:45Z
(In reply to comment #1)
> There are two things to consider here. First, the second version is not correct in general:
> one code point can map to several (e.g. the German sharp S).
Good point, although std.uni.toUpper doesn't handle it either :-)
assert("ß".toUpper == "ß"); // passes
Comment #3 by dmitry.olsh — 2013-09-12T12:50:37Z
(In reply to comment #2)
> Good point, although std.uni.toUpper doesn't handle it either :-)
>
> assert("ß".toUpper == "ß"); // passes
toLower will do. Sharp S is capital ;)
Comment #4 by peter.alexander.au — 2013-09-12T12:52:31Z
(In reply to comment #3)
> toLower will do. Sharp S is capital ;)
assert("ß".toLower == "ß");
assert("ß".toUpper == "ß");
Both pass.
Comment #5 by dmitry.olsh — 2013-09-12T14:01:05Z
(In reply to comment #4)
> assert("ß".toLower == "ß");
> assert("ß".toUpper == "ß");
>
> Both pass.
Something wicked has happened.
I see that I messed up toUpper in the table generator while introducing toTitleCase (which isn't even exposed yet!). toLower is fine; toUpper is apparently broken in half of the cases.
I have no idea how I missed that... I've got to expand the test coverage around toLower/toUpper.
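As a rough illustration of the kind of coverage mentioned above, here is a sketch of round-trip checks over a few hand-picked letter pairs. The helper `roundTrips` and the letter list are assumptions for illustration, not Phobos' actual test suite. They only exercise the single-code-point overloads `toLower(dchar)` and `toUpper(dchar)`, not the string versions or full case mapping.

```d
import std.stdio : writeln;
import std.uni : toLower, toUpper;

// Hand-picked uppercase/lowercase letters with 1:1 simple case mappings
// (an illustrative assumption, not an exhaustive list):
// A/a, Ä/ä (U+00C4/U+00E4), Ω/ω (U+03A9/U+03C9), Ж/ж (U+0416/U+0436).
immutable dstring uppers = "A\u00C4\u03A9\u0416"d;
immutable dstring lowers = "a\u00E4\u03C9\u0436"d;

// True if both directions of the simple case mapping agree for this pair.
bool roundTrips(dchar upper, dchar lower)
{
    return toLower(upper) == lower && toUpper(lower) == upper;
}

unittest
{
    foreach (i, u; uppers)
        assert(roundTrips(u, lowers[i]));
}

void main()
{
    foreach (i, u; uppers)
        writeln(u, " / ", lowers[i], " round-trips: ", roundTrips(u, lowers[i]));
}
```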
Comment #6 by dmitry.olsh — 2013-09-12T14:07:17Z
(In reply to comment #5)
P.S. And there are two kinds of sharp s: \u1E9E (capital ẞ) and \u00DF (small ß).
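The two sharp s code points (U+1E9E LATIN CAPITAL LETTER SHARP S and U+00DF LATIN SMALL LETTER SHARP S) also differ in UTF-8 length. This is one reason an up-front reserve(original.length) can only be a guess: lower-casing ẞ (3 bytes) yields ß (2 bytes). The small sketch below checks this with `std.utf.codeLength`. It relies only on the simple lowercase mapping of U+1E9E.

```d
import std.stdio : writeln;
import std.uni : toLower;
import std.utf : codeLength;

void main()
{
    dchar capitalSharpS = '\u1E9E'; // LATIN CAPITAL LETTER SHARP S
    dchar smallSharpS   = '\u00DF'; // LATIN SMALL LETTER SHARP S

    // Number of UTF-8 code units needed to encode each code point
    writeln(codeLength!char(capitalSharpS)); // prints 3
    writeln(codeLength!char(smallSharpS));   // prints 2

    // Simple lowercase mapping of the capital form
    writeln(toLower(capitalSharpS) == smallSharpS); // prints true
}
```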
Comment #7 by peter.alexander.au — 2014-02-22T12:25:47Z