Bug 20373 – Line counter with async Buffer

Status
RESOLVED
Resolution
INVALID
Severity
enhancement
Priority
P1
Component
phobos
Product
D
Version
D2
Platform
x86
OS
Windows
Creation time
2019-11-08T14:59:49Z
Last change time
2019-11-08T16:41:51Z
Assigned to
No Owner
Creator
bioinfornatics

Comments

Comment #0 by bioinfornatics — 2019-11-08T14:59:49Z
I tried to use asyncBuf to speed up file processing, as described in the documentation: https://dlang.org/phobos/std_parallelism.html#.TaskPool.asyncBuf.2

I use one script to generate a file of a given size:
- https://paste.fedoraproject.org/paste/0zCnwLcPLpalAE7q0BDnyQ
usage:
file_generator -o test_11k -w 11k
file_generator -o test_401k -w 401k

And another one which counts lines using an async buffer:
- https://paste.fedoraproject.org/paste/HW8Ti4rqLBVyvDQOD~GyMw
usage:
counter_async_buffer -n 1 -t 1 -i test_11k
counter_async_buffer -n 1 -t 1 -i test_401k

The problem appears when I process the test_401k file: the number of lines counted is wrong (I checked with wc -l). On closer inspection, it seems to come from the buffer reused by asyncBuf: on the last iteration, the result is not shorter than the requested buffer size, contrary to what the documentation explains.

Indeed, the file ends with:
1 190121746114132251381321230342516302196252336238211523943272873744285119323293314107 316322221221132661353262123081115418570291330356278322215013742329426 714213310231111593822146521912312869120169289362332157427352432313112226373403123825 6812101511112462691294263101232 90182312212511430133514352114282271133753782360462351124233948222161956731321 11822481725231121323330910521376234322119392811262411335432102108273 463633112212153255811679207

while the code gives:
1 190121746114132251381321230342516302196252336238211523943272873744285119323293314107 316322221221132661353262123081115418570291330356278322215013742329426 714213310231111593822146521912312869120169289362332157427352432313112226373403123825 6812101511112462691294263101232 90182312212511430133514352114282271133753782360462351124233948222161956731321 11822481725231121323330910521376234322119392811262411335432102108273 4636331122121532558116792071512 1553203330299234738282167126033321272154232912111312416461868182323242 111932223509162312223104310231321116573254736811479599513441112318312221230321154 11363210249282132717260536372386522748224512596323581311 121932131303243861212327470532121636029110222323531121763 2499315312114672213683218207122451351984311032612832096363463812 ....

so there is extra content.
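
The symptom above can be reproduced without asyncBuf. The following is only a minimal sketch, not the code from the pastes: `bytesSeen`, its parameters, and the chunk size are hypothetical names chosen for illustration. It reads a file in fixed-size chunks with one reused buffer and shows that, when the slice returned by rawRead is ignored, the final short read is treated as a full buffer and the stale tail of the previous chunk is processed again.

```d
import std.stdio : File;

/// Total number of bytes "seen" when reading `path` in fixed-size chunks
/// with a single reused buffer (hypothetical helper, for illustration only).
/// With `useReturnedSlice == false` the whole buffer is counted after every
/// read, which mimics discarding the result of rawRead.
size_t bytesSeen(string path, size_t chunkSize, bool useReturnedSlice)
{
    auto file = File(path, "rb");
    auto buf = new ubyte[chunkSize]; // must be non-empty for rawRead
    size_t total;
    while (!file.eof)
    {
        // rawRead returns the slice of buf that holds freshly read data;
        // on the last read it is usually shorter than buf.
        auto filled = file.rawRead(buf);
        total += useReturnedSlice ? filled.length : buf.length;
    }
    return total;
}
```

For a 10-byte file read in 4-byte chunks, honouring the returned slice sees 10 bytes, while ignoring it sees 12: the last two bytes of the previous chunk are still in the buffer.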
Comment #1 by dlang-bugzilla — 2019-11-08T15:25:04Z
> auto asyncReader = taskPool.asyncBuf((ref ubyte[] buf) => file.rawRead(buf),
I think this should be:
> auto asyncReader = taskPool.asyncBuf((ref ubyte[] buf) { buf = file.rawRead(buf); },
See https://dlang.org/library/std/stdio/file.raw_read.html
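
For context, here is a self-contained sketch of the suggested fix. It is not the reporter's script: `countLines` and `chunkSize` are hypothetical names, and the sketch assumes that lines are terminated by '\n'. rawRead returns the slice of the buffer that was actually filled, and asyncBuf passes each buffer to the callback by ref, so assigning the slice back shrinks a short final chunk instead of leaving stale bytes in it.

```d
import std.algorithm.searching : count;
import std.parallelism : taskPool;
import std.stdio : File;

/// Count '\n' bytes in `path` while chunks are read in a background thread.
/// `chunkSize` is an illustrative parameter; it must be non-zero because
/// rawRead rejects an empty buffer.
size_t countLines(string path, size_t chunkSize = 4096)
{
    auto file = File(path, "rb");

    // Assign the slice returned by rawRead back to the ref parameter so
    // that the consumer only sees the bytes read by this call.
    void next(ref ubyte[] buf)
    {
        buf = file.rawRead(buf);
    }

    size_t lines;
    // Third argument: initial size of each recycled buffer.
    foreach (chunk; taskPool.asyncBuf(&next, &file.eof, chunkSize))
        lines += chunk.count(cast(ubyte) '\n');
    return lines;
}
```

Called as `countLines("test_401k")`, this should agree with `wc -l` for files whose last line ends with a newline.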
Comment #2 by bioinfornatics — 2019-11-08T15:47:34Z
Thanks, Vladimir. Indeed, with this fix the number of lines counted is correct. It is still much slower than wc, but it works, thanks.
Comment #3 by dlang-bugzilla — 2019-11-08T15:48:28Z
Marking as invalid as there is no bug in any D components.
Comment #4 by bioinfornatics — 2019-11-08T16:08:36Z
OK Vladimir, maybe the documentation could be made clearer?
Comment #5 by dlang-bugzilla — 2019-11-08T16:41:51Z
The documentation looks good to me as it is, but if you can think of some way to improve it that would have avoided this mistake, please do submit a pull request.