Bug 8084 – std.stdio.ByLine is not true input range

Status
RESOLVED
Resolution
WONTFIX
Severity
normal
Priority
P2
Component
phobos
Product
D
Version
D2
Platform
All
OS
All
Creation time
2012-05-11T10:55:00Z
Last change time
2012-05-12T19:50:30Z
Assigned to
nobody
Creator
k.hara.pg

Comments

Comment #0 by k.hara.pg — 2012-05-11T10:55:07Z
version(A) and version(B) should output same result, but A is completely broken. import std.array; import std.stdio; void main() { string fname = __FILE__; version(A) { auto lines = File(fname).byLine().array(); foreach (ln; lines) writeln(ln); } version(B) { foreach (ln; File(fname).byLine()) writeln(ln); } }
Comment #1 by schveiguy — 2012-05-11T12:10:30Z
In order for byLine to be efficient, it must reuse the buffer it's reading. In order for the above code to be equivlent, it must dup every line, which is hugely inefficient if you aren't going to use them beyond the scope of the foreach statement.
Comment #2 by timon.gehr — 2012-05-12T07:11:54Z
This is true, but byLine is not a true InputRange, therefore it is questionable whether or not it should expose an InputRange interface. int opApply(scope int delegate(scope string ln)) would be more descriptive.
Comment #3 by braddr — 2012-05-12T09:49:25Z
What part of the definition of an InputRange does it violate?
Comment #4 by k.hara.pg — 2012-05-12T09:55:49Z
(In reply to comment #1) > In order for byLine to be efficient, it must reuse the buffer it's reading. In > order for the above code to be equivlent, it must dup every line, which is > hugely inefficient if you aren't going to use them beyond the scope of the > foreach statement. OK. ByLine.front returns the slice of internal line buffer, so it specifies 'temporary' data. We should get dup in each iteration.
Comment #5 by issues.dlang — 2012-05-12T11:38:24Z
> OK. ByLine.front returns the slice of internal line buffer, so it specifies 'temporary' data. So? It's returning a reference type, so it could change on you. It _is_ potentially problematic in some cases, but I don't see how that makes it so that it isn't an input range > We should get dup in each iteration. It's specifically _avoiding_ that for efficiency purposes. Now, maybe having ByLine's front return the same array with changes values rather than a new array makes it so that it doesn't work with a lot of range-based functions, effectively making it useless as a range, even if it _does_ follow the API. If that's the case, maybe we should make it use opApply for the loop case (having it use the current non-duping behavior) and have front do a dup (and if front as pure, that result could even be implicitly converted to an immutable result, avoiding having to dup it twice to get an immutable copy).
Comment #6 by schveiguy — 2012-05-12T19:38:27Z
ByLine is a true input range, it has empty, popFront and front. If it was not a true input range, foreach would not work on it. I also think isInputRange would pass. The major issue I see with ByLine is that its default type of front is char[]. It really should be const(char)[]. And in fact, I don't think char[] should be possible. Perhaps, this bug can be redone to address that. (In reply to comment #4) > OK. ByLine.front returns the slice of internal line buffer, so it specifies > 'temporary' data. We should get dup in each iteration. If you mean ByLine should dup for each iteration, that is extremely wasteful. What if you only need the data during the loop and not afterward?
Comment #7 by k.hara.pg — 2012-05-12T19:50:30Z
(In reply to comment #5) (In reply to comment #6) > (In reply to comment #4) > > OK. ByLine.front returns the slice of internal line buffer, so it specifies > > 'temporary' data. We should get dup in each iteration. I mean we should dup the front of ByLine in *user code* if it is used after whole iteration, not ByLine should return dup-ed front. And I can agree that ByLine is *true* input range because it has empty, front, and popFront. Therefore, the summary was wrong. Now I think it is correct that this issue was marked as 'resolved wontfix'.