Bug 5977 – String splitting with empty separator

Status
RESOLVED
Resolution
FIXED
Severity
normal
Priority
P2
Component
phobos
Product
D
Version
D2
Platform
x86
OS
Windows
Creation time
2011-05-10T16:30:00Z
Last change time
2013-11-18T02:46:27Z
Keywords
patch
Assigned to
monarchdodra
Creator
bearophile_hugs

Comments

Comment #0 by bearophile_hugs — 2011-05-10T16:30:55Z
This D2 program seems to go into an infinite loop (dmd 2.053beta):

import std.string;
void main() {
    split("a test", "");
}

------------------------

My suggestion is to add code like this in std.array.split():

if (delim.length == 0)
    return split(s);

This means that an empty splitting string is like splitting on generic whitespace. This is useful in code like:

auto foo(string txt, string delim="") {
    return txt.split(delim);
}

This means that calling foo with no arguments splits txt on whitespace, otherwise it splits on the given string. This allows using the two forms of split in foo() without if conditions. This is done in Python too, where None is used instead of an empty string.

The modified split is something like this (there is an isSomeString!S2 because strings are special: they aren't generic arrays, and splitting on whitespace is meaningful for strings only):

Unqual!(S1)[] split(S1, S2)(S1 s, S2 delim)
if (isForwardRange!(Unqual!S1) && isForwardRange!S2) {
    Unqual!S1 us = s;
    if (isSomeString!S2 && delim.length == 0) {
        return split(s);
    } else {
        auto app = appender!(Unqual!(S1)[])();
        foreach (word; std.algorithm.splitter(us, delim)) {
            app.put(word);
        }
        return app.data;
    }
}

Besides this change, I presume std.algorithm.splitter() too needs to test for an empty delim.
Comment #1 by bearophile_hugs — 2011-09-25T08:16:21Z
Alternative: throw an ArgumentError("delim argument is empty") exception if delim is empty.
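A minimal sketch of this alternative, assuming a hypothetical wrapper named splitNonEmpty that is not part of Phobos. It swaps in a plain Exception raised through std.exception.enforce rather than the ArgumentError named above, and it only illustrates the idea of rejecting the empty delimiter up front; it is not the change that was eventually merged.

```d
import std.array : split;
import std.exception : enforce;

// Hypothetical wrapper (illustration only, not a Phobos function):
// refuse an empty delimiter instead of forwarding it to std.array.split.
string[] splitNonEmpty(string s, string delim)
{
    // enforce throws a plain Exception when the condition is false.
    enforce(delim.length != 0, "delim argument is empty");
    return s.split(delim);
}

void main()
{
    import std.exception : assertThrown;

    // A non-empty delimiter behaves like the ordinary split.
    assert(splitNonEmpty("a test", " ") == ["a", "test"]);

    // An empty delimiter is rejected before any splitting happens.
    assertThrown(splitNonEmpty("a test", ""));
}
```

With this shape the caller sees the failure at the call site, at the cost of having to handle the exception whenever the delimiter comes from user input.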
Comment #2 by monarchdodra — 2012-10-22T02:42:42Z
*** Issue 8551 has been marked as a duplicate of this issue. ***
Comment #3 by monarchdodra — 2012-10-22T02:52:16Z
(In reply to comment #0)
> This D2 program seems to go into an infinite loop (dmd 2.053beta):
>
> import std.string;
> void main() {
>     split("a test", "");
> }
>
> ------------------------
>
> My suggestion is to add code like this in std.array.split():
>
> if (delim.length == 0)
>     return split(s);
>
> This means that an empty splitting string is like splitting on generic
> whitespace. This is useful in code like:
>
> auto foo(string txt, string delim="") {
>     return txt.split(delim);
> }

I think it is a bad idea on two counts:

1. If the user wanted that behavior, he'd have written it as such. If the user actually passed a separator that is an empty range, he probably didn't mean for it to split by spaces.
2. I think it would also bring a deviation of behavior between strings and non-strings. Supposing r is empty:
   * "hello world".split(""); //Ok, split white
   * [1, 2].split(r); //Derp.

(In reply to comment #1)
> Alternative: throw an ArgumentError("delim argument is empty") exception if
> delim is empty.

I *really* think that is a *much* saner approach. Splitting with an empty separator is just not logical. Trying to force a default behavior in that scenario is wishful thinking (IMO). I think it should throw an error. I'll implement this.
Comment #4 by hsteoh — 2013-01-03T20:28:42Z
FWIW, in Perl, splitting on an empty string simply returns an array of characters. I think that better reflects the symmetry with join("", array).
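The symmetry mentioned here can be sketched in D as a round trip. This assumes the post-fix behavior reported in comment #5 (an empty separator yields single-character pieces rather than an infinite loop); roundTrip is a hypothetical helper introduced only for illustration.

```d
import std.array : join, split;

// Hypothetical helper (illustration only): split on an empty separator,
// then join the pieces back together with an empty separator.
string roundTrip(string s)
{
    return s.split("").join("");
}

void main()
{
    // Joining with an empty separator simply concatenates the pieces.
    assert(["a", " ", "t", "e", "s", "t"].join("") == "a test");

    // If split("") produces the individual pieces, join("") undoes it.
    assert(roundTrip("a test") == "a test");
}
```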
Comment #5 by bearophile_hugs — 2013-11-18T02:46:27Z
After this pull:
https://github.com/D-Programming-Language/phobos/pull/1502

This program:

void main() {
    import std.string, std.stdio;
    auto r = split("a test", "");
    pragma(msg, typeof(r));
    r.writeln;
}

Gives:

string[]
["a", " ", "t", "e", "s", "t"]

And this program:

void main() {
    import std.algorithm, std.stdio;
    auto r = splitter("a test", "");
    r.writeln;
}

Gives the same output:

["a", " ", "t", "e", "s", "t"]

It's different from what Python does:

>>> "a test".split("")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: empty separator

But it's much better than an infinite loop, it can often be useful, and I think it's acceptable, so I'm closing the issue.