Bug 5257 – std.algorithm.count works incorrectly with UTF8 and UTF16 strings

Status
RESOLVED
Resolution
FIXED
Severity
normal
Priority
P2
Component
dmd
Product
D
Version
D2
Platform
Other
OS
Mac OS X
Creation time
2010-11-22T10:54:00Z
Last change time
2010-11-25T21:48:06Z
Assigned to
repeatedly
Creator
andrei

Attachments

ID | Filename | Summary | Content-Type | Size
831 | std_algorithm_count_support_utf_8_and_16.patch | Patch for this issue. | text/plain | 1536

Comments

Comment #0 by andrei — 2010-11-22T10:54:01Z
import std.stdio;
import std.algorithm;

void main()
{
    writeln(count!("true")("日本語")); // Three characters.
}

The code prints 9 but should print 3.
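The 9 in the report matches the number of UTF-8 code units in the string, while 3 is the number of code points. A minimal sketch of that difference, assuming a current Phobos where narrow strings are decoded to dchar when walked as ranges; the helper names codeUnits and codePoints are hypothetical and are not part of std.algorithm:

```d
import std.range.primitives : walkLength;
import std.stdio : writeln;

// Hypothetical helper: number of UTF-8 code units (bytes) in the string.
size_t codeUnits(string s)
{
    return s.length;
}

// Hypothetical helper: number of code points, obtained by walking the
// string as a range, which decodes each element to dchar.
size_t codePoints(string s)
{
    return s.walkLength;
}

void main()
{
    writeln(codeUnits("日本語"));  // 9: each of these characters takes three bytes
    writeln(codePoints("日本語")); // 3
}
```

A count that iterates over code units rather than decoded elements would give the first number, which is consistent with the behavior reported here.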
Comment #1 by andrei — 2010-11-22T10:54:48Z
Submitted on behalf of Rainer Deyke.
Comment #2 by jakobovrum — 2010-11-22T12:28:48Z
This is almost entirely off-topic, but I don't think such a tiny change deserves its own issue... sorry if I should have filed one :(
When this gets fixed, count() will be useful as a generic way to count the number of code points in a UTF-encoded string. But I don't think the interface is very pretty for this simple use case. As a completely non-breaking change, how about changing:
    size_t count(alias pred, Range)(Range r) if (isInputRange!(Range))
to:
    size_t count(alias pred = "true", Range)(Range r) if (isInputRange!(Range))
So one could simply do count("日本語")?
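The proposed signature can be sketched as follows. This is only an illustration of a defaulted alias predicate, not the Phobos implementation: countIf is a hypothetical name chosen to avoid clashing with std.algorithm.count, and the loop uses the range primitives so that narrow strings are visited per code point.

```d
import std.functional : unaryFun;
import std.range.primitives : empty, front, popFront, isInputRange;
import std.stdio : writeln;

// Hypothetical stand-in for the proposed count(): the predicate defaults
// to "true", so calling it without a predicate counts every element.
size_t countIf(alias pred = "true", Range)(Range r)
    if (isInputRange!Range)
{
    size_t n;
    // front/popFront decode narrow strings to dchar, one code point per step.
    for (; !r.empty; r.popFront())
    {
        if (unaryFun!pred(r.front))
            ++n;
    }
    return n;
}

void main()
{
    writeln(countIf("日本語"));            // 3
    writeln(countIf!"a == 'l'"("hello")); // 2
}
```

Because the default only applies when no predicate is given, existing calls that pass one explicitly keep their meaning, which is why the change is described as non-breaking.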
Comment #3 by repeatedly — 2010-11-24T07:18:51Z
Created attachment 831
Patch for this issue.
I wrote a simple patch. It decodes each char type to dchar and passes the result to the predicate.
Comment #4 by andrei — 2010-11-25T14:51:45Z
Thanks, Masahiro. I fixed it by simpler means that don't need special casing.
Comment #5 by repeatedly — 2010-11-25T21:48:06Z
(In reply to comment #4)
> Thanks, Masahiro. I fixed with simpler means that don't need special casing.
Good! Are you going to deprecate std.utf.count? std.algorithm.count (whose default pred is now "true") and std.utf.count seem to duplicate each other.