Bug 14861 – Error in stdio.d in LockingTextReader.readFront()
Status
RESOLVED
Resolution
FIXED
Severity
regression
Priority
P1
Component
phobos
Product
D
Version
D2
Platform
All
OS
Windows
Creation time
2015-08-02T06:22:00Z
Last change time
2016-01-03T14:14:51Z
Keywords
pull
Assigned to
ag0aep6g
Creator
mgw
Comments
Comment #0 by mgw — 2015-08-02T06:22:21Z
This bug is caused by a flawed algorithm for reading from the file's ring buffer and pushing the read characters back into it with the stdio function ungetc().
The following example fails:
// dmd 2.067.1 Win 32
import std.stdio;

void main(string[] args) {
    File fw = File("panic.csv", "w");
    for (int i; i != 5000; i++) fw.writeln(i, ";", "Иванов;Пётр;Петрович");
    fw.close();
    // Test read
    File fr = File("panic.csv", "r");
    int nom; string fam, nam, ot;
    // Error format read
    while (!fr.eof) fr.readf("%s;%s;%s;%s\n", &nom, &fam, &nam, &ot);
}
Comment #1 by mgw — 2015-08-02T18:59:45Z
The ring buffer in struct _iobuf is 16384 bytes. If a UTF-8 sequence is split across the buffer boundary, the characters read earlier cannot be pushed back, because by then the buffer holds entirely different data.
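To make the boundary case concrete, here is a minimal sketch of the arithmetic. It assumes the 16384-byte buffer is filled in blocks aligned to the start of the file; `blockOf` is a hypothetical helper introduced only for this illustration and is not part of Phobos.

```d
import std.stdio : writefln;

enum size_t bufferSize = 16_384; // buffer size reported in this comment

/// Hypothetical helper: index of the buffer-sized block that holds a file offset.
/// Assumes blocks are aligned to offset 0, which may not match the real runtime.
size_t blockOf(size_t offset)
{
    return offset / bufferSize;
}

void main()
{
    // U+0451 is encoded in UTF-8 as the two bytes D1 91.
    immutable string io = "\u0451";
    assert(io.length == 2); // .length of a string counts UTF-8 code units

    // Place the first byte at the last position of the first block.
    immutable size_t start = bufferSize - 1;
    foreach (i, char unit; io)
        writefln("byte 0x%02X at offset %s is in block %s",
                 cast(ubyte) unit, start + i, blockOf(start + i));

    // The two bytes of one character end up in different blocks.
    assert(blockOf(start) != blockOf(start + 1));
}
```

A character that starts at offset 16383 therefore has its second byte in the next block, which is the situation described here.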
Comment #2 by dlang-bugzilla — 2015-09-01T06:57:21Z
Reduced test case:
----
import std.stdio;
import std.array: replicate;
void main()
{
    File fw = File("panic.csv", "w");
    fw.rawWrite("a".replicate(16383) ~ "\xD1\x91\xD1\x82");
        /* \xD1\x91 = U+0451 CYRILLIC SMALL LETTER IO */
        /* \xD1\x82 = U+0442 CYRILLIC SMALL LETTER TE */
    fw.close();

    File fr = File("panic.csv", "r");
    fr.rawRead(new char[16383]);
    auto ltr = LockingTextReader(fr);
    assert(ltr.front == '\u0451'); /* passes */
    ltr.popFront(); /* "Invalid UTF-8 sequence" */
    assert(ltr.front == '\u0442');
    ltr.popFront();
    assert(ltr.empty);
}
----
LockingTextReader essentially does this:
----
auto fps = fr.getFP();
auto fp = cast(_iobuf*) fps;
assert(FGETC(fp) == '\xD1'); /* passes */
assert(FGETC(fp) == '\x91'); /* passes */
assert(ungetc('\x91', fps) == '\x91'); /* passes */
assert(ungetc('\xD1', fps) == '\xD1'); /* passes */
assert(FGETC(fp) == '\xD1'); /* passes */
assert(FGETC(fp) == '\x91'); /* fails */
----
The problem is that ungetc is called multiple times. Apparently, the Windows 32 C runtime doesn't like that under these specific circumstances.
According to the documentation for ungetc, only one character of pushback is guaranteed, so calling it more than once without an intervening read is not guaranteed to work.
Here's a PR that replaces the ungetc calls with ftell/fseek:
https://github.com/D-Programming-Language/phobos/pull/3622
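For illustration only, here is a minimal sketch of the general ftell/fseek idea: remember the position, read ahead, then seek back instead of pushing bytes back with ungetc. This is not the code from the PR. `peekBytes` and the file name `peek_demo.bin` are hypothetical, and the sketch assumes a seekable stream opened in binary mode.

```d
import core.stdc.stdio;

/// Hypothetical helper: reads up to buf.length bytes without consuming them.
/// It restores the stream position with fseek rather than calling ungetc.
/// Returns the number of bytes placed in buf (0 if the stream is not seekable).
size_t peekBytes(FILE* fp, ubyte[] buf)
{
    immutable pos = ftell(fp); // remember where we are
    if (pos < 0)
        return 0;              // ftell failed, e.g. on a pipe

    size_t n = 0;
    while (n < buf.length)
    {
        immutable c = fgetc(fp);
        if (c == EOF)
            break;
        buf[n++] = cast(ubyte) c;
    }

    fseek(fp, pos, SEEK_SET);  // go back; this also clears the EOF indicator
    return n;
}

void main()
{
    // Assumed scratch file; binary mode avoids text-mode offset translation.
    FILE* fp = fopen("peek_demo.bin", "w+b");
    assert(fp !is null);
    fputs("\xD1\x91\xD1\x82", fp);
    rewind(fp);

    ubyte[2] buf;
    assert(peekBytes(fp, buf[]) == 2);
    assert(buf[0] == 0xD1 && buf[1] == 0x91);

    // The peeked bytes are still there for the next reader.
    assert(fgetc(fp) == 0xD1);
    assert(fgetc(fp) == 0x91);

    fclose(fp);
    remove("peek_demo.bin");
}
```

The trade-off is that this only works on seekable streams, so a reader built this way would still need some other strategy for pipes or the console.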