Comment #0 by bioinfornatics — 2019-11-08T14:59:49Z
I tried to use asyncBuf to speed up file processing, as described in the documentation: https://dlang.org/phobos/std_parallelism.html#.TaskPool.asyncBuf.2
I use one script to generate a file of a given size:
- https://paste.fedoraproject.org/paste/0zCnwLcPLpalAE7q0BDnyQ
usage: file_generator -o test_11k -w 11k
file_generator -o test_401k -w 401k
and another one that counts lines using an async buffer:
- https://paste.fedoraproject.org/paste/HW8Ti4rqLBVyvDQOD~GyMw
usage: counter_async_buffer -n 1 -t 1 -i test_11k
counter_async_buffer -n 1 -t 1 -i test_401k
The problem appears when I process the test_401k file: the line count is wrong (I checked it with wc -l).
After a closer look, it seems to come from the buffer that asyncBuf reuses:
at the last iteration, the result is not shorter than the requested buffer size, contrary to what I understood from the documentation.
Indeed, the file ends with:
1
190121746114132251381321230342516302196252336238211523943272873744285119323293314107
316322221221132661353262123081115418570291330356278322215013742329426
714213310231111593822146521912312869120169289362332157427352432313112226373403123825
6812101511112462691294263101232
90182312212511430133514352114282271133753782360462351124233948222161956731321
11822481725231121323330910521376234322119392811262411335432102108273
463633112212153255811679207
while the code gives:
1
190121746114132251381321230342516302196252336238211523943272873744285119323293314107
316322221221132661353262123081115418570291330356278322215013742329426
714213310231111593822146521912312869120169289362332157427352432313112226373403123825
6812101511112462691294263101232
90182312212511430133514352114282271133753782360462351124233948222161956731321
11822481725231121323330910521376234322119392811262411335432102108273
4636331122121532558116792071512
1553203330299234738282167126033321272154232912111312416461868182323242
111932223509162312223104310231321116573254736811479599513441112318312221230321154
11363210249282132717260536372386522748224512596323581311
121932131303243861212327470532121636029110222323531121763
2499315312114672213683218207122451351984311032612832096363463812
....
so there is extra content.
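The mechanism can be reproduced without asyncBuf at all. The sketch below is an illustration, not the reporter's pasted program; `countNewlines` and `trustReturnedSlice` are hypothetical names. It reads a file in fixed-size chunks into one reused buffer, which is roughly what the recycled buffers behind asyncBuf amount to. `File.rawRead` fills a prefix of the buffer and returns that prefix; scanning the whole buffer instead also picks up bytes left over from the previous chunk after the short final read.

```d
import std.algorithm.searching : count;
import std.stdio : File;

/// Hypothetical helper: counts '\n' bytes in `path`, reading it in
/// `chunkSize`-byte chunks into ONE reused buffer.
/// trustReturnedSlice == true : scan only what rawRead actually filled.
/// trustReturnedSlice == false: scan the whole buffer, like a callback
///                              that discards rawRead's return value.
size_t countNewlines(string path, size_t chunkSize, bool trustReturnedSlice)
{
    assert(chunkSize > 0, "rawRead needs a non-empty buffer");
    auto file = File(path, "rb");
    auto buf = new ubyte[chunkSize];
    size_t lines = 0;
    while (!file.eof)
    {
        // rawRead fills a prefix of buf and returns that prefix; after the
        // last, short read the rest of buf still holds the previous chunk.
        ubyte[] filled = file.rawRead(buf);
        ubyte[] scanned = trustReturnedSlice ? filled : buf;
        lines += scanned.count(cast(ubyte) '\n');
    }
    return lines;
}
```

For a 7-byte file containing "a\nb\nccc" and a 4-byte chunk, the second read fills only "ccc", so scanning the whole buffer also counts the stale '\n' left from the first chunk: 3 instead of the 2 that wc -l reports.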
Comment #1 by dlang-bugzilla — 2019-11-08T15:25:04Z
> auto asyncReader = taskPool.asyncBuf((ref ubyte[] buf) => file.rawRead(buf),
I think this should be:
> auto asyncReader = taskPool.asyncBuf((ref ubyte[] buf) { buf = file.rawRead(buf); },
See https://dlang.org/library/std/stdio/file.raw_read.html
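A sketch of a line counter using the corrected callback, assuming a newline-counting approach similar to the reporter's (the pasted program is not reproduced here, so `countLines` and `chunkSize` are hypothetical). The callback assigns the slice returned by `rawRead` back to `buf`, so the final short chunk is trimmed before it is handed out. The `File` is allocated with `new` because asyncBuf stores the callbacks, which forces a heap closure, and D will not build such a closure over a local that has a destructor, as `File` does.

```d
import std.algorithm.searching : count;
import std.exception : enforce;
import std.parallelism : taskPool;
import std.stdio : File;

/// Hypothetical helper: counts '\n' bytes in `path` while a worker
/// thread reads ahead in `chunkSize`-byte chunks.
size_t countLines(string path, size_t chunkSize = 64 * 1024)
{
    enforce(chunkSize > 0, "rawRead needs a non-empty buffer");

    // Heap-allocated: the callbacks below escape into asyncBuf, so they
    // need a heap closure, which cannot capture a local with a destructor.
    auto file = new File(path, "rb");
    scope (exit) file.close();

    // The fix from this thread: keep the slice rawRead returns, so a
    // short final read shrinks the chunk instead of exposing stale bytes.
    void next(ref ubyte[] buf)
    {
        buf = file.rawRead(buf);
    }

    bool atEnd()
    {
        return file.eof;
    }

    size_t lines = 0;
    // initialBufSize = chunkSize; with the default of 0 every buffer
    // would be empty and rawRead would throw.
    foreach (chunk; taskPool.asyncBuf(&next, &atEnd, chunkSize))
        lines += chunk.count(cast(ubyte) '\n');
    return lines;
}
```

Passing `initialBufSize` explicitly matters here: the callable overload of asyncBuf sizes its recycled buffers from it, and `rawRead` rejects an empty buffer.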
Comment #2 by bioinfornatics — 2019-11-08T15:47:34Z
Thanks Vladimir.
Indeed, with this fix the line count is correct.
It is still much slower than wc, but it works.
Thanks
Comment #3 by dlang-bugzilla — 2019-11-08T15:48:28Z
Marking as invalid as there is no bug in any D components.
Comment #4 by bioinfornatics — 2019-11-08T16:08:36Z
OK Vladimir, maybe the documentation could be made clearer?
Comment #5 by dlang-bugzilla — 2019-11-08T16:41:51Z
The documentation looks good to me as it is, but if you can think of some way to improve it that would have avoided this mistake, please do submit a pull request.