Bug 13829 – std.uni.byCodePoint for strings has length

Status: RESOLVED
Resolution: FIXED
Severity: normal
Priority: P1
Component: phobos
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2014-12-07T13:42:03Z
Last change time: 2017-10-16T09:57:56Z
Assigned to: Dmitry Olshansky
Creator: Marc Schütz

Comments

Comment #0 by schuetzm — 2014-12-07T13:42:03Z

import std.uni;
static assert(__traits(compiles, "é".byCodePoint.length));
pragma(msg, typeof("é".byCodePoint)); // => string

The problem is that `byCodePoint(w?string.init)` returns its argument (string/wstring) which of course defines `length`, instead of a wrapper that doesn't. The reason is once again auto-decoding. In std/uni.d(6644):

Range byCodePoint(Range)(Range range)
    if(isInputRange!Range && is(Unqual!(ElementType!Range) == dchar))
{
    return range;
}

`Unqual!(ElementType!string)` is of course `dchar`.

Brought up in this discussion:
http://forum.dlang.org/thread/[email protected]#post-ovzcetxbrdblpmyizdjr:40forum.dlang.org

Comment #1 by schuetzm — 2014-12-07T13:48:55Z

In case it wasn't clear: For strings and wstrings, determining the actual number of code points is an O(n) operation and should therefore not be available via length at all. The current implementation returns the number of code units, not of code points.
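
To make the code unit versus code point distinction concrete, here is a minimal sketch. It uses the escape `\u00E9` for the precomposed "é" so that the result does not depend on how the source file is encoded, and it relies on `walkLength` from std.range to count code points by decoding. It illustrates the point above and is not code taken from the report.

```d
import std.range : walkLength;

void main()
{
    string  s = "\u00E9";   // U+00E9 takes two UTF-8 code units
    wstring w = "\u00E9"w;  // one UTF-16 code unit
    dstring d = "\u00E9"d;  // one UTF-32 code unit

    // `length` on a string is the array length, i.e. the code unit count.
    assert(s.length == 2);
    assert(w.length == 1);
    assert(d.length == 1);

    // Counting code points means decoding the whole string, which is O(n).
    assert(s.walkLength == 1);
    assert(w.walkLength == 1);
}
```

So for the string in the original report, `length` reports 2 even though the range yields a single code point.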

Comment #2 by hsteoh — 2014-12-10T15:38:05Z

The documentation of byCodePoint states that it's the identity function when given a range of code points, and currently, strings are ranges of code points (due to autodecoding), so it simply returns the string as-is. Should this be changed so that it returns a wrapper around the string that suppresses .length instead?

Comment #3 by peter.alexander.au — 2014-12-14T19:19:28Z

In a perfect world, I think it should return a different range, but it's now a breaking change, and even breaks its documented behaviour. So I'm voting that this shouldn't be fixed. Note: hasLength will still return false.
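
The mismatch mentioned in the note can be checked at compile time. The sketch below assumes a Phobos version with auto-decoding, where the `hasLength` trait deliberately reports false for narrow strings even though the `length` property itself still compiles.

```d
import std.range.primitives : hasLength;

// The range trait hides `length` for narrow (auto-decoded) strings...
static assert(!hasLength!string);
static assert(!hasLength!wstring);
// ...but not for dstring, where one code unit is one code point.
static assert(hasLength!dstring);

// The array property is still there, so direct use of `.length` compiles.
static assert(__traits(compiles, "abc".length));

void main() {}
```

Generic code that checks `hasLength` is therefore unaffected, while code that calls `.length` directly on the result of `byCodePoint` gets the code unit count.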

Comment #4 by schuetzm — 2014-12-17T14:18:45Z

Comment #5 by dmitry.olsh — 2017-09-11T09:01:49Z

(In reply to Peter Alexander from comment #3)
> In a perfect world, I think it should return a different range, but it's now
> a breaking change, and even breaks its documented behaviour. So I'm voting
> that this shouldn't be fixed.
>
> Note: hasLength will still return false.

Let us not replicate the broken 'string has no length except it does' stuff even more. If the user says byCodePoint, he definitely expects a proper range. I'll change the documentation to reflect this.
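
For illustration, a "proper range" over a narrow string could look like the sketch below. The names `CodePoints` and `codePointsOf` are hypothetical and chosen for this example only; this is not the code from the eventual Phobos change, just one way to wrap a string so that it iterates by code point and exposes no `length`.

```d
import std.algorithm.comparison : equal;
import std.range : walkLength;
import std.range.primitives;
import std.traits : isNarrowString;

// Hypothetical wrapper: a forward range of dchar without `length`.
struct CodePoints(S) if (isNarrowString!S)
{
    private S source;

    // These forward to the auto-decoding primitives for strings.
    @property bool empty() { return source.empty; }
    @property dchar front() { return source.front; }
    void popFront() { source.popFront(); }
    @property CodePoints save() { return this; }
}

// Hypothetical helper, standing in for a fixed byCodePoint overload.
auto codePointsOf(S)(S s) if (isNarrowString!S)
{
    return CodePoints!S(s);
}

void main()
{
    auto r = codePointsOf("\u00E9a");
    static assert(isForwardRange!(typeof(r)));
    static assert(!__traits(compiles, r.length)); // no `length` leaks through
    assert(r.walkLength == 2);                    // two code points
    assert(r.equal("\u00E9a"d));
}
```

Because the wrapper is a distinct type, the static assert from comment #0 would fail for it, which is the behaviour being asked for.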

Comment #6 by dmitry.olsh — 2017-09-12T14:20:30Z

https://github.com/dlang/phobos/pull/5733

Comment #7 by github-bugzilla — 2017-09-25T16:47:11Z

Commits pushed to master at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/d46bd62bcaa080ea1bfa19fc8d80359226f304a6
Fix issue 13829 - byCodePoint has length

https://github.com/dlang/phobos/commit/4cc17371b0994fe5aa494b800105dcae30ada674
Merge pull request #5733 from DmitryOlshansky/fix-issue-13829

Fix issue 13829 - byCodePoint has length
merged-on-behalf-of: Dmitry Olshansky <[email protected]>

Comment #8 by github-bugzilla — 2017-10-16T09:57:56Z

Commits pushed to stable at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/d46bd62bcaa080ea1bfa19fc8d80359226f304a6
Fix issue 13829 - byCodePoint has length

https://github.com/dlang/phobos/commit/4cc17371b0994fe5aa494b800105dcae30ada674
Merge pull request #5733 from DmitryOlshansky/fix-issue-13829