Bug 13829 – std.uni.byCodePoint for strings has length
Status: RESOLVED
Resolution: FIXED
Severity: normal
Priority: P1
Component: phobos
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2014-12-07T13:42:03Z
Last change time: 2017-10-16T09:57:56Z
Assigned to: Dmitry Olshansky
Creator: Marc Schütz
Comments
Comment #0 by schuetzm — 2014-12-07T13:42:03Z
import std.uni;
static assert(__traits(compiles, "é".byCodePoint.length));
pragma(msg, typeof("é".byCodePoint)); // => string
The problem is that `byCodePoint(w?string.init)` returns its argument (string/wstring) unchanged, which of course defines `length`, instead of returning a wrapper that doesn't.
The reason is once again auto-decoding. In std/uni.d(6644):
Range byCodePoint(Range)(Range range)
    if(isInputRange!Range && is(Unqual!(ElementType!Range) == dchar))
{
    return range;
}
`Unqual!(ElementType!string)` is of course `dchar`.
Brought up in this discussion:
http://forum.dlang.org/thread/[email protected]#post-ovzcetxbrdblpmyizdjr:40forum.dlang.org
Comment #1 by schuetzm — 2014-12-07T13:48:55Z
In case it wasn't clear:
For strings and wstrings, determining the actual number of code points is an O(n) operation and should therefore not be available via length at all. The current implementation returns the number of code units, not of code points.
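A minimal sketch of the distinction described above, assuming a Phobos version in which narrow strings are still auto-decoded. The helper names `codeUnitCount` and `codePointCount` are hypothetical and exist only for this illustration.

```d
import std.range : walkLength;

// Hypothetical helpers, named here only for illustration.
size_t codeUnitCount(string s) { return s.length; }      // O(1): counts UTF-8 code units
size_t codePointCount(string s) { return s.walkLength; } // O(n): decodes the whole string

void main()
{
    string s = "\u00E9"; // 'é': two UTF-8 code units, one code point
    assert(codeUnitCount(s) == 2);
    assert(codePointCount(s) == 1);
}
```

Because `"é".byCodePoint` is just the original string, its `length` is the first number, not the second.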
Comment #2 by hsteoh — 2014-12-10T15:38:05Z
The documentation of byCodePoint states that it's the identity function when given a range of code points, and currently, strings are ranges of code points (due to autodecoding), so it simply returns the string as-is.
Should this be changed so that it returns a wrapper around the string that suppresses .length instead?
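One possible shape of such a wrapper is sketched below. This is not the Phobos implementation; `CodePoints` and `codePointsOf` are hypothetical names, and the sketch handles only `string` and relies on auto-decoding of the wrapped string for `front` and `popFront`.

```d
import std.range : walkLength;
import std.range.primitives : ElementType, front, isInputRange, popFront;

// Hypothetical wrapper: iterates by code point and deliberately has no `length`.
struct CodePoints
{
    private string source;

    @property bool empty() { return source.length == 0; }
    @property dchar front() { return source.front; } // decodes the first code point
    void popFront() { source.popFront(); }           // skips one whole code point
    @property CodePoints save() { return this; }
}

// Hypothetical entry point, standing in for a length-hiding byCodePoint.
CodePoints codePointsOf(string s) { return CodePoints(s); }

static assert(isInputRange!CodePoints);
static assert(is(ElementType!CodePoints == dchar));
static assert(!__traits(compiles, CodePoints.init.length));

void main()
{
    auto r = codePointsOf("h\u00E9");
    assert(r.front == 'h');
    assert(r.walkLength == 2); // counting code points requires an O(n) walk
}
```

With a wrapper like this, code that asks for `.length` fails to compile instead of silently receiving a code unit count.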
Comment #3 by peter.alexander.au — 2014-12-14T19:19:28Z
In a perfect world, I think it should return a different range, but it's now a breaking change, and even breaks its documented behaviour. So I'm voting that this shouldn't be fixed.
Note: hasLength will still return false.
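The note about `hasLength` can be checked at compile time. The sketch below assumes the current `std.range.primitives.hasLength`, which special-cases narrow strings; it shows the trait and the actual `.length` property disagreeing.

```d
import std.range.primitives : hasLength;

// The trait reports no length for narrow (auto-decoded) strings...
static assert(!hasLength!string);
static assert(!hasLength!wstring);
// ...but does for dstring, where one code unit is one code point.
static assert(hasLength!dstring);
// The property itself still exists and compiles.
static assert(__traits(compiles, "abc".length));

void main() {}
```

Generic code that checks `hasLength` is therefore unaffected, while code that calls `.length` directly, as in comment #0, still compiles.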
Comment #4 by schuetzm — 2014-12-17T14:18:45Z
(In reply to Peter Alexander from comment #3)
> In a perfect world, I think it should return a different range, but it's now
> a breaking change, and even breaks its documented behaviour. So I'm voting
> that this shouldn't be fixed.
I strongly disagree with this. The status quo is clearly wrong.
Comment #5 by dmitry.olsh — 2017-09-11T09:01:49Z
(In reply to Peter Alexander from comment #3)
> In a perfect world, I think it should return a different range, but it's now
> a breaking change, and even breaks its documented behaviour. So I'm voting
> that this shouldn't be fixed.
>
> Note: hasLength will still return false.
Let's not replicate the broken 'string has no length except it does' stuff even more.
If the user says byCodePoint, he definitely expects a proper range.
I'll change the documentation to reflect this.