
Bug 5257 – std.algorithm.count works incorrectly with UTF8 and UTF16 strings

Status: RESOLVED
Resolution: FIXED
Severity: normal
Priority: P2
Component: dmd
Product: D
Version: D2
Platform: Other
OS: Mac OS X
Creation time: 2010-11-22T10:54:00Z
Last change time: 2010-11-25T21:48:06Z
Assigned to: repeatedly
Creator: andrei

Attachments

ID	Filename	Summary	Content-Type	Size
831	std_algorithm_count_support_utf_8_and_16.patch	Patch for this issue.	text/plain	1536

Comments

Comment #0 by andrei — 2010-11-22T10:54:01Z

import std.stdio;
import std.algorithm;

void main()
{
    writeln(count!("true")("日本語")); // Three characters.
}

The code prints 9 but should print 3.
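The 9 versus 3 discrepancy matches the difference between UTF-8 code units and code points. The following is only a minimal sketch of that distinction, assuming a current Phobos in which `walkLength` decodes narrow strings to `dchar`; it is not the code path the bug report exercises.

```d
import std.range : walkLength;
import std.stdio : writeln;

void main()
{
    string utf8 = "日本語";   // each of these characters needs 3 UTF-8 code units
    wstring utf16 = "𝄞"w;    // U+1D11E needs a surrogate pair in UTF-16

    writeln(utf8.length);      // code units: 9
    writeln(utf8.walkLength);  // code points: 3
    writeln(utf16.length);     // code units: 2
    writeln(utf16.walkLength); // code points: 1
}
```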

Comment #1 by andrei — 2010-11-22T10:54:48Z

Submitted on behalf of Rainer Deyke.

Comment #2 by jakobovrum — 2010-11-22T12:28:48Z

This is almost entirely off-topic, but I don't think such a tiny change deserves its own issue... sorry if I should have filed one :(

When this gets fixed, count() will be useful as a generic way to count the number of code points in a UTF-encoded string. But I don't think the interface is very pretty for this simple use case. As a completely non-breaking change, how about changing:

size_t count(alias pred, Range)(Range r) if (isInputRange!(Range))

to:

size_t count(alias pred = "true", Range)(Range r) if (isInputRange!(Range))

So one could simply do count("日本語")?
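The proposed signature can be tried out with a small stand-in. This is a sketch under stated assumptions: `countIf` is a hypothetical name, not the Phobos function, and it relies on the range primitives `front` and `popFront` decoding narrow strings to `dchar`.

```d
import std.functional : unaryFun;
import std.range.primitives : empty, front, isInputRange, popFront;
import std.stdio : writeln;

// Hypothetical helper mirroring the proposed signature; not the Phobos implementation.
size_t countIf(alias pred = "true", Range)(Range r)
if (isInputRange!Range)
{
    size_t n;
    // Stepping with front/popFront visits one decoded code point per iteration
    // for narrow strings, rather than one code unit.
    for (; !r.empty; r.popFront())
    {
        if (unaryFun!pred(r.front))
            ++n;
    }
    return n;
}

void main()
{
    writeln(countIf("日本語"));                    // default predicate counts every code point
    writeln(countIf!(c => c == '本')("日本語"));  // an explicit predicate still works
}
```

Because the defaulted `alias` parameter comes first and `Range` is deduced from the argument, existing calls that pass a predicate explicitly keep compiling unchanged.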

Comment #3 by repeatedly — 2010-11-24T07:18:51Z

Created attachment 831
Patch for this issue.

I wrote a simple patch. It decodes each char type to dchar and passes the result to the predicate.
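The decode-then-test idea described in this comment might look roughly like the sketch below. The attachment itself is not reproduced here, so `countDecoded` is a hypothetical helper and may differ from the actual patch; it only assumes `std.utf.decode`, which reads one code point and advances the index past it.

```d
import std.stdio : writeln;
import std.traits : isSomeString;
import std.utf : decode;

// Hypothetical helper illustrating explicit decoding; the attached patch may differ.
size_t countDecoded(alias pred, S)(S s)
if (isSomeString!S)
{
    size_t n;
    size_t i;
    while (i < s.length)
    {
        // decode returns the code point starting at index i and moves i past it.
        immutable dchar c = decode(s, i);
        if (pred(c))
            ++n;
    }
    return n;
}

void main()
{
    writeln(countDecoded!(c => true)("日本語"));   // UTF-8 input
    writeln(countDecoded!(c => true)("日本語"w));  // UTF-16 input
}
```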

Comment #4 by andrei — 2010-11-25T14:51:45Z

Thanks, Masahiro. I fixed it with simpler means that don't need special casing.

Comment #5 by repeatedly — 2010-11-25T21:48:06Z

(In reply to comment #4)
> Thanks, Masahiro. I fixed with simpler means that don't need special casing.

Good! Are you going to deprecate std.utf.count? std.algorithm.count (whose default pred is now "true") and std.utf.count seem to be duplicates.
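The overlap mentioned here can be checked side by side. This is a small sketch assuming a current Phobos where both functions are available; the renamed imports `utfCount` and `algoCount` are local aliases chosen only to avoid the name clash.

```d
import std.algorithm.searching : algoCount = count;
import std.stdio : writeln;
import std.utf : utfCount = count;

void main()
{
    string s = "日本語";

    writeln(utfCount(s));          // code points, counted by std.utf.count
    writeln(algoCount!"true"(s));  // elements matching an always-true predicate
}
```

For valid UTF input the two calls are expected to agree, which is the duplication the comment points out.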