In using http://dlang.org/phobos/std_stdio.html#.File.byChunk, it turns out to be rather clumsy to get things character-by-character, one has to write a loop. This is not how ranges are supposed to work.
It exposes a more general problem - given a Range of a Range of Elements, how does one iterate over Elements?
The solution is a new algorithm - iterate.
And that's all it does - one could write .byChunk.iterate and voila! one is getting ubytes by ubyte. iterate takes a template argument of the number of Elements it should produce for each front():
.byChunk.iterate!4 // get ubyte[4]
iterate should produce results by value, not by ref. This is because byChunk produces references to ephemeral data.
iterate asserts if the number of elements does not evenly divide into the .byChunk size (or should it throw?). Obviously, by 1 should not assert or throw.
Pulling data out 4 bytes at a time is useful, for example, to read data as a sequence of ints.
iterate should be lazy.
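For reference, the explicit loop the report calls clumsy looks roughly like the sketch below. The function name sumBytesLoop and the 4096-byte chunk size are assumptions made for illustration; only File.byChunk comes from the report.

```d
import std.stdio : File, writeln;

// Hypothetical helper: sums every byte of a file by walking each chunk by hand.
ulong sumBytesLoop(string path)
{
    ulong total;
    // Outer loop: byChunk hands out one reused buffer per iteration.
    foreach (ubyte[] chunk; File(path).byChunk(4096))
    {
        // Inner loop: the elements we actually wanted.
        foreach (ubyte b; chunk)
            total += b;
    }
    return total;
}

void main(string[] args)
{
    if (args.length > 1)
        writeln(sumBytesLoop(args[1]));
}
```

The nested loop is the part a range-based solution would be expected to hide.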
Comment #1 by eco — 2014-10-09T00:22:57Z
> It exposes a more general problem - given a Range of a Range of Elements, how does one iterate over Elements?
Well, rng.joiner.chunks(4) comes to mind normally but that, of course, breaks down with things like byChunk and byLine where the buffer is overwritten as you process the range.
Dmitry had an idea for how to solve it so you could do things much more naturally while still being extremely efficient.
The thread about it:
http://forum.dlang.org/post/[email protected]
His idea kind of evolves as the thread goes on so read more than the first post. As far as I know he's still planning on hammering out the details and making another proposal.
If memory serves me, with his proposal end users would be able to just do rng.chunks(4) to accomplish this (with chunks implementation taking advantage of the new buffer range primitives).
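A minimal sketch of the rng.joiner.chunks(4) composition mentioned above, run on in-memory data where nothing is overwritten. The helper name regroup is an assumption for illustration. It copies each group with array, which is what makes the result safe to keep around.

```d
import std.algorithm : joiner, map;
import std.array : array;
import std.range : chunks;

// Hypothetical helper: flattens a range of ranges, then regroups it
// into runs of n elements (the last run may be shorter).
ubyte[][] regroup(ubyte[][] rng, size_t n)
{
    // Fine here because rng owns stable memory; with byChunk or byLine
    // the underlying buffer is reused between steps.
    return rng.joiner.chunks(n).map!(c => c.array).array;
}

void main()
{
    ubyte[][] rng = [[1, 2, 3], [4, 5], [6]];
    ubyte[][] expected = [[1, 2, 3, 4], [5, 6]];
    assert(regroup(rng, 4) == expected);
}
```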
Comment #2 by monarchdodra — 2014-10-09T08:04:10Z
(In reply to Walter Bright from comment #0)
> In using http://dlang.org/phobos/std_stdio.html#.File.byChunk, it turns out
> to be rather clumsy to get things character-by-character, one has to write a
> loop. This is not how ranges are supposed to work.
Yeah, somebody in learn recently asked how to read a file character by character or byte by byte. I also realized we provide no D interface for that, let alone range interface.
> It exposes a more general problem - given a Range of a Range of Elements,
> how does one iterate over Elements?
>
> The solution is a new algorithm - iterate.
>
> And that's all it does - one could write .byChunk.iterate and voila! one is
> getting ubytes by ubyte.
std.algorithm.joiner does exactly that.
auto joiner(RoR)(RoR r)
> iterate takes a template argument of the number of
> Elements it should produce for each front():
>
> .byChunk.iterate!4 // get ubyte[4]
>
> iterate should produce results by value, not by ref. This is because byChunk
> produces references to ephemeral data.
byChunks also does this, with the slight differences that:
1) It is a run-time length
2) It returns a sub-range, rather than a static array
That said, a "byStaticChunk" sounds like a good idea? Either way I think it is better to compose ranges with individual jobs, rather than trying to have a single range do 2 different jobs.
I'd just worry a bit if anybody asked for byStaticChunk!1024, performance-wise, what with us returning a static array and all...
> iterate asserts if the number of elements does not evenly divide into the
> .byChunk size (or should it throw?). Obviously, by 1 should not assert or
> throw.
I think that's a bad idea: If you are reading your file (input range) 4 bytes at a time, how do you know ahead of time that its length will be a multiple of 4?
And even if you know ahead of time your file is 251 bytes big, then what? Maybe instead returning a Tuple!(T[N], uint size) could be a better idea?
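One way the Tuple idea could look, as a rough sketch only. Nothing here exists in Phobos: byStaticChunk, StaticChunks and the field names data and count are all hypothetical. Each front is a static array returned by value, plus the number of valid elements, so a short final chunk neither asserts nor throws.

```d
import std.algorithm : joiner;
import std.range : ElementType, empty, front, isInputRange, popFront;
import std.typecons : Tuple;

// Hypothetical range, not part of Phobos: groups an input range into
// fixed-size static arrays copied by value, with a count of valid slots.
struct StaticChunks(size_t N, R)
    if (isInputRange!R && N > 0)
{
    alias E = ElementType!R;
    alias Chunk = Tuple!(E[N], "data", size_t, "count");

    private R source;
    private Chunk current;
    private bool done;

    this(R r)
    {
        source = r;
        fill();
    }

    @property bool empty() { return done; }
    @property Chunk front() { return current; } // returned by value
    void popFront() { fill(); }

    // Copies up to N elements out of the source into the current chunk.
    private void fill()
    {
        current = Chunk.init;
        while (current.count < N && !source.empty)
        {
            current.data[current.count++] = source.front;
            source.popFront();
        }
        done = current.count == 0;
    }
}

// Hypothetical convenience function for UFCS chaining.
auto byStaticChunk(size_t N, R)(R r)
{
    return StaticChunks!(N, R)(r);
}

void main()
{
    ubyte[][] chunks = [[1, 2, 3], [4, 5], [6, 7]];
    auto r = chunks.joiner.byStaticChunk!4;

    ubyte[4] first = [1, 2, 3, 4];
    assert(r.front.data == first && r.front.count == 4);

    r.popFront();
    ubyte[4] second = [5, 6, 7, 0]; // unused slot keeps ubyte.init
    assert(r.front.data == second && r.front.count == 3);

    r.popFront();
    assert(r.empty);
}
```

Because elements are copied out one at a time, this composes with joiner rather than doing both jobs in one range. The cost of returning a large static array by value (the byStaticChunk!1024 worry) is not addressed here.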
Comment #3 by bugzilla — 2014-10-09T09:09:32Z
Given that joiner should do this job (thanks for letting me know) how about updating the std.stdio doc to show how to use this? It only shows loops with byChunk. It's not at all obvious that .joiner is the answer.
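A doc example along these lines might look like the following sketch. The function name sumBytesJoined and the 4096-byte chunk size are assumptions; byChunk and joiner are the existing Phobos functions discussed above.

```d
import std.algorithm : joiner;
import std.stdio : File, writeln;

// Hypothetical helper: sums every byte of a file. joiner lazily flattens
// the range of ubyte[] chunks into a range of ubyte, so there is no
// visible inner loop.
ulong sumBytesJoined(string path)
{
    ulong total;
    foreach (ubyte b; File(path).byChunk(4096).joiner)
        total += b;
    return total;
}

void main(string[] args)
{
    if (args.length > 1)
        writeln(sumBytesJoined(args[1]));
}
```

Each chunk is fully consumed before the next one is requested, so the reused byChunk buffer is not a problem when the bytes are processed one at a time like this.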
Comment #4 by github-bugzilla — 2015-03-22T20:42:37Z