
Bug 7689 – splitter() on invalid UTF-8 sequences

Status: RESOLVED
Resolution: FIXED
Severity: normal
Priority: P2
Component: phobos
Product: D
Version: D2
Platform: x86
OS: Windows
Creation time: 2012-03-11T14:07:00Z
Last change time: 2013-11-18T02:36:29Z
Assigned to: monarchdodra
Creator: bearophile_hugs

Comments

Comment #0 by bearophile_hugs — 2012-03-11T14:07:42Z

Is this difference/inconsistency between split() and splitter() desired and good?

import std.string, std.array, std.algorithm, std.range;
void main() {
    char[] s = cast(char[])[131, 64, 32, 251, 22];
    assert(std.string.split(s).length == 2); // no error
    assert(walkLength(std.array.splitter(s)) == 2); // Invalid UTF-8 sequence
    assert(walkLength(std.algorithm.splitter(s)) == 2); // Invalid UTF-8 sequence
}

Output, DMD 2.059head:

std.utf.UTFException@std\utf.d(645): Invalid UTF-8 sequence (at index 1)
----------------
...\dmd2\src\phobos\std\array.d(469): dchar std.array.front!(char[]).front(char[])
...\dmd2\src\phobos\std\algorithm.d(2110): D3std9algorithm47__T8splitterS28...
...\dmd2\src\phobos\std\range.d(971): D3std5range97__...
----------------

Comment #1 by monarchdodra — 2012-10-22T23:06:22Z

(In reply to comment #0)
> Is this difference/inconsistency between split() and splitter() desired and
> good?
>
> import std.string, std.array, std.algorithm, std.range;
> void main() {
>     char[] s = cast(char[])[131, 64, 32, 251, 22];
>     assert(std.string.split(s).length == 2); // no error
>     assert(walkLength(std.array.splitter(s)) == 2); // Invalid UTF-8 sequence
>     assert(walkLength(std.algorithm.splitter(s)) == 2); // Invalid UTF-8 sequence
> }
>
> Output, DMD 2.059head:
>
> std.utf.UTFException@std\utf.d(645): Invalid UTF-8 sequence (at index 1)
> ----------------
> ...\dmd2\src\phobos\std\array.d(469): dchar std.array.front!(char[]).front(char[])
> ...\dmd2\src\phobos\std\algorithm.d(2110): D3std9algorithm47__T8splitterS28...
> ...\dmd2\src\phobos\std\range.d(971): D3std5range97__...
> ----------------

This is a bug in string.split (which is actually a public import of array.split). Currently array.split only recognizes ASCII whitespace and is oblivious to the longer, multi-byte UTF whitespace characters (although it does otherwise work on Unicode input).
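
For illustration only, here is a minimal sketch of why a byte-oriented split can accept input that a decoding range rejects. It assumes a reasonably recent Phobos, and `countSpaceFields` is a hypothetical helper, not a Phobos function or the code from this report. It splits on a single ASCII space only, which is narrower than what split() does with whitespace in general.

```d
import std.algorithm.iteration : splitter;
import std.range : walkLength;
import std.string : representation;
import std.utf : validate, UTFException;

// Hypothetical helper (not part of Phobos): counts the fields separated by
// a single ASCII space, working on raw bytes so that nothing is decoded.
size_t countSpaceFields(const(char)[] s)
{
    // representation() views the chars as ubytes without copying.
    return s.representation.splitter(cast(ubyte) ' ').walkLength;
}

void main()
{
    // Same bytes as in the report; they do not form valid UTF-8.
    char[] s = cast(char[]) [131, 64, 32, 251, 22];

    // Decoding-based validation rejects the input.
    bool rejected = false;
    try
    {
        validate(s);
    }
    catch (UTFException e)
    {
        rejected = true;
    }
    assert(rejected);

    // The byte-level split never decodes, so it still sees two fields.
    assert(countSpaceFields(s) == 2);
}
```

Working on the `ubyte` representation sidesteps decoding entirely, which is one way to get consistent behavior on malformed input. Whether split() and splitter() should both throw or both tolerate such input is the design question this report raises.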

Comment #2 by bearophile_hugs — 2013-11-18T02:36:29Z

Seems fixed: https://github.com/D-Programming-Language/phobos/pull/1502