Bug 13829 – std.uni.byCodePoint for strings has length

Status: RESOLVED
Resolution: FIXED
Severity: normal
Priority: P1
Component: phobos
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2014-12-07T13:42:03Z
Last change time: 2017-10-16T09:57:56Z
Assigned to: Dmitry Olshansky
Creator: Marc Schütz

Comments

Comment #0 by schuetzm — 2014-12-07T13:42:03Z
import std.uni;
static assert(__traits(compiles, "é".byCodePoint.length));
pragma(msg, typeof("é".byCodePoint)); // => string

The problem is that `byCodePoint(w?string.init)` returns its argument (string/wstring) which of course defines `length`, instead of a wrapper that doesn't. The reason is once again auto-decoding.

In std/uni.d(6644):

Range byCodePoint(Range)(Range range)
    if(isInputRange!Range && is(Unqual!(ElementType!Range) == dchar))
{
    return range;
}

`Unqual!(ElementType!string)` is of course `dchar`.

Brought up in this discussion:
http://forum.dlang.org/thread/[email protected]#post-ovzcetxbrdblpmyizdjr:40forum.dlang.org
Comment #1 by schuetzm — 2014-12-07T13:48:55Z
In case it wasn't clear: For strings and wstrings, determining the actual number of code points is an O(n) operation and should therefore not be available via length at all. The current implementation returns the number of code units, not of code points.
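Editorial illustration of the distinction drawn in this comment, as a minimal sketch rather than anything taken from Phobos itself: `length` on a narrow string counts code units, while counting code points requires walking the whole string. The helper name `codePointCount` is hypothetical; `walkLength` is the existing `std.range` function.

```d
import std.range : walkLength;

// Hypothetical helper: counts code points by decoding the whole string.
// This is O(n) in the number of code units.
size_t codePointCount(string s)
{
    return s.walkLength;
}

void main()
{
    string s = "\u00E9";   // "é": one code point, two UTF-8 code units
    wstring w = "\u00E9"w; // the same character is one UTF-16 code unit

    assert(s.length == 2);          // code units, O(1)
    assert(w.length == 1);          // code units, O(1)
    assert(codePointCount(s) == 1); // code points, O(n)
}
```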
Comment #2 by hsteoh — 2014-12-10T15:38:05Z
The documentation of byCodePoint states that it's the identity function when given a range of code points, and currently, strings are ranges of code points (due to autodecoding), so it simply returns the string as-is. Should this be changed so that it returns a wrapper around the string that suppresses .length instead?
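Editorial illustration of what such a wrapper could look like. This is a sketch under stated assumptions, not the code that was eventually merged: `CodePointView` and `codePoints` are hypothetical names, and the wrapper simply forwards to the auto-decoding range primitives for `string` while exposing no `length` member.

```d
import std.range : walkLength;
import std.range.primitives : empty, front, popFront, isForwardRange, ElementType;

// Hypothetical wrapper: iterates a string by code point and has no `length`.
struct CodePointView
{
    private string data;

    @property bool empty() { return data.empty; }
    @property dchar front() { return data.front; } // auto-decodes one code point
    void popFront() { data.popFront(); }           // skips one whole code point
    @property CodePointView save() { return this; }
}

// Hypothetical entry point standing in for a fixed byCodePoint.
CodePointView codePoints(string s)
{
    return CodePointView(s);
}

static assert(isForwardRange!CodePointView);
static assert(is(ElementType!CodePointView == dchar));
static assert(!__traits(compiles, CodePointView.init.length));

void main()
{
    // "hé" is three UTF-8 code units but two code points.
    assert(codePoints("h\u00E9").walkLength == 2);
}
```

Because the wrapper has no `length`, generic code has to fall back to `walkLength`, which makes the O(n) cost explicit at the call site.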
Comment #3 by peter.alexander.au — 2014-12-14T19:19:28Z
In a perfect world, I think it should return a different range, but it's now a breaking change, and even breaks its documented behaviour. So I'm voting that this shouldn't be fixed. Note: hasLength will still return false.
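Editorial illustration of the `hasLength` note above, as a minimal sketch: the trait reports `false` for narrow strings even though the `.length` property itself still compiles, so only generic code that checks the trait is protected. The helper `knownLengthOrZero` is a hypothetical name used for illustration.

```d
import std.range.primitives : hasLength;

// The trait deliberately excludes narrow strings...
static assert(!hasLength!string);
static assert(!hasLength!wstring);
static assert(hasLength!dstring);

// ...but the property is still there for anyone who uses it directly.
static assert(__traits(compiles, "abc".length));

// Hypothetical generic helper that only trusts `length` when the trait allows it.
size_t knownLengthOrZero(R)(R r)
{
    static if (hasLength!R)
        return r.length;
    else
        return 0;
}

void main()
{
    assert(knownLengthOrZero("abc") == 0);  // string: trait says no length
    assert(knownLengthOrZero("abc"d) == 3); // dstring: one code unit per code point
}
```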
Comment #4 by schuetzm — 2014-12-17T14:18:45Z
(In reply to Peter Alexander from comment #3)
> In a perfect world, I think it should return a different range, but it's now
> a breaking change, and even breaks its documented behaviour. So I'm voting
> that this shouldn't be fixed.

I strongly disagree with this. The status quo is clearly wrong.
Comment #5 by dmitry.olsh — 2017-09-11T09:01:49Z
(In reply to Peter Alexander from comment #3)
> In a perfect world, I think it should return a different range, but it's now
> a breaking change, and even breaks its documented behaviour. So I'm voting
> that this shouldn't be fixed.
>
> Note: hasLength will still return false.

Let's not replicate the broken 'string has no length except it does' stuff even more. If the user says byCodePoint he definitely expects a proper range. I'll change the documentation to reflect this.
Comment #6 by dmitry.olsh — 2017-09-12T14:20:30Z
Comment #7 by github-bugzilla — 2017-09-25T16:47:11Z
Commits pushed to master at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/d46bd62bcaa080ea1bfa19fc8d80359226f304a6
Fix issue 13829 - byCodePoint has length

https://github.com/dlang/phobos/commit/4cc17371b0994fe5aa494b800105dcae30ada674
Merge pull request #5733 from DmitryOlshansky/fix-issue-13829

Fix issue 13829 - byCodePoint has length
merged-on-behalf-of: Dmitry Olshansky <[email protected]>
Comment #8 by github-bugzilla — 2017-10-16T09:57:56Z
Commits pushed to stable at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/d46bd62bcaa080ea1bfa19fc8d80359226f304a6
Fix issue 13829 - byCodePoint has length

https://github.com/dlang/phobos/commit/4cc17371b0994fe5aa494b800105dcae30ada674
Merge pull request #5733 from DmitryOlshansky/fix-issue-13829