Bug 19518 – std.range.front() returns a dchar when applied to char[]
Status
RESOLVED
Resolution
INVALID
Severity
normal
Priority
P1
Component
phobos
Product
D
Version
D2
Platform
All
OS
All
Creation time
2018-12-26T23:34:55Z
Last change time
2020-03-21T03:56:40Z
Assigned to
No Owner
Creator
Vijay Nayar
Comments
Comment #0 by madric — 2018-12-26T23:34:55Z
Consider the following program:
```
import std.range;
void main()
{
char[] data = ['a', 'b', 'c'];
char a = data.front();
}
```
While std.range.front() works fine with most array types, there seems to be a problem with using char[] types. The above program actually produces a compiler error:
```
onlineapp.d(5): Error: cannot implicitly convert expression `front(data)` of type `dchar` to `char`
```
The workaround is to not use std.range.front(), but rather use basic array indexing, e.g. `data[0]`.
Comment #1 by b2.temp — 2018-12-27T06:10:19Z
This is not a bug. The D standard library auto-decodes input ranges of char and wchar, so their ElementType is dchar (their ElementEncodingType stays char or wchar). The reasoning behind this: imagine an array such as
['é','µ','ç'] (which is somewhat equivalent to the string "éµç".dup btw)
You'd expect 3 elements, not 6. So if you want to get rid of decoding, cast your array to ubyte[] (or use std.utf.byCodeUnit).
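To make the distinction concrete, here is a minimal sketch of what auto-decoding does to a `char[]`. It assumes a Phobos version in which auto-decoding is still enabled and a UTF-8 encoded source file; the variable names are illustrative only.

```d
import std.range : ElementEncodingType, ElementType, front, walkLength;
import std.utf : byCodeUnit;

void main()
{
    char[] data = "éµç".dup;

    // Range primitives see decoded code points (dchar) ...
    static assert(is(ElementType!(char[]) == dchar));
    // ... while the array itself still stores UTF-8 code units (char).
    static assert(is(ElementEncodingType!(char[]) == char));

    assert(data.length == 6);     // six UTF-8 code units in memory
    assert(data.walkLength == 3); // three code points when walked as a range
    assert(data.front == 'é');    // front decodes the first code point

    // byCodeUnit opts out of decoding, so front is a plain char again.
    char first = data.byCodeUnit.front;
    assert(first == data[0]);
}
```

The length/walkLength mismatch is the "3 elements, not 6" point from the comment above.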
Comment #2 by madric — 2018-12-27T09:08:05Z
That makes sense for character processing. Perhaps my understanding of what .front() and .popFront() do is incorrect then. I had assumed that they were general purpose range methods that could also be used on arrays to treat them like ranges as well.
In this particular case, I was implementing a DenseHashSet algorithm, optimized for low memory overhead, when during my unittests, I discovered that they were failing when I made a set of characters. The reason was that my template code was using .front() to manage an internal array.
That may be the dilemma. What does the user have in mind when they use 'char'? Is it strictly for unicode text processing, or is it a piece of data with a well defined size? Is it incumbent upon those who use templates to not use 'char' for data in templates (and type-cast bytes), or is it incumbent upon template writers to always consider this special case?
Or is this just the wrong usage of .front(), and array indexing, like data[0], should be preferred?
Comment #3 by b2.temp — 2019-02-14T06:24:24Z
it was for phobos anyway.
Comment #4 by dfj1esp02 — 2019-02-14T08:58:30Z
One possible solution is to publish a fork of std.range that treats text as array of code units and use it instead of phobos std.range.
Comment #5 by greeenify — 2019-02-14T09:06:40Z
Or use .byCodeUnit, .byChar, .representation, or the upcoming rcstring ;-)
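For reference, a small sketch of the three options that exist in Phobos today (`byCodeUnit` and `byChar` from std.utf, `representation` from std.string), applied to the array from the original report. rcstring is left out because it was only a proposal at the time.

```d
import std.range : front;
import std.string : representation;
import std.utf : byChar, byCodeUnit;

void main()
{
    char[] data = ['a', 'b', 'c'];

    // byCodeUnit: iterate the stored code units without decoding.
    char a = data.byCodeUnit.front;

    // byChar: a range of UTF-8 code units.
    char b = data.byChar.front;

    // representation: view the same memory as ubyte[], which the range
    // primitives treat as an ordinary array.
    ubyte[] raw = data.representation;
    ubyte c = raw.front;

    assert(a == 'a' && b == 'a' && c == 'a');
}
```

All three give back an element of the same width as the stored data, so the assignment that failed in comment #0 compiles.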
Comment #6 by madric — 2019-02-14T09:54:39Z
I think the tricky case is not so much when one begins and ends thinking of character processing, but when one is writing a generic algorithm using templates that makes use of std.range.front.
A template that takes a range type and an element and works with them will function fine in most cases for most types when they make use of ".front()" in their algorithms.
But as it stands right now, if anyone attempts to use said template with a `char` type, the template will no longer compile, because '.front()' returns a different element type than the range.
This means that either '.front()' shouldn't be used in generic algorithms that need to pull an element out of the range, in favor of something like '[0]', or it means that algorithm writers need to make `char` a special case in any algorithm they write.
I don't actually have a good answer for what approach is best.
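The two approaches can be compared in a short sketch. `firstViaFront` and `firstViaIndex` are hypothetical helpers written for this illustration, not code from the DenseHashSet mentioned earlier; the sketch assumes a Phobos version with auto-decoding enabled.

```d
import std.range : front;

// Hypothetical helper: assumes front yields a T, which is not true for char[].
T firstViaFront(T)(T[] items)
{
    return items.front;
}

// Hypothetical helper: indexing always yields the stored element type.
T firstViaIndex(T)(T[] items)
{
    return items[0];
}

// Instantiating with int works; instantiating with char does not,
// because front returns dchar for char[].
static assert( __traits(compiles, firstViaFront!int([1, 2, 3])));
static assert(!__traits(compiles, firstViaFront!char(['a', 'b'])));

void main()
{
    assert(firstViaIndex([1, 2, 3]) == 1);
    assert(firstViaIndex("abc".dup) == 'a');
}
```

For a container that manages its own internal array, indexing sidesteps the special case entirely, since no range semantics are needed there.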
Comment #7 by greeenify — 2019-02-14T10:11:08Z
Well, we all agree that it's not super nice, but it also has advantages.
Take e.g. 'ü'. If .front would only return a char, you would get the invalid utf symbol. Try printing "ü"[0]
Yes, it has downsides though, but with ElementType!R or auto most generic algorithms don't care about the actual return type of .front, and if they do, they need special casing for strings anyhow.
Auto-decoding by default is considered one of the top two design errors of D, but it's super hard to fix now. The solutions so far are:
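As an illustration of that point, here is a minimal sketch of a generic function that names its element type through ElementType!R instead of assuming it matches the array's element type. `first` is a hypothetical helper, and it assumes a non-empty range and a UTF-8 encoded source file.

```d
import std.range : ElementType, front, isInputRange;

// Hypothetical helper: returns the first element of a non-empty range,
// typed as whatever the range primitives say the element type is.
auto first(R)(R r)
if (isInputRange!R)
{
    ElementType!R value = r.front;
    return value;
}

void main()
{
    int[] numbers = [1, 2, 3];
    assert(first(numbers) == 1);

    // For char[] the element type is dchar, so the whole code point
    // comes back rather than the first byte of its UTF-8 encoding.
    char[] text = "üb".dup;
    auto c = first(text);
    static assert(is(typeof(c) == dchar));
    assert(c == 'ü');
}
```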
- fork std.range
- use byCodeUnit or similar
- use rcstring (or similar)
If you come up with a better idea, please share it in the NG, but we can't change std.range.front because it would break a lot of code. Thanks!