Bug 5173 – std.process.shell cannot handle non-UTF8 output

Status
RESOLVED
Resolution
WORKSFORME
Severity
minor
Priority
P2
Component
phobos
Product
D
Version
D2
Platform
All
OS
Windows
Creation time
2010-11-05T12:15:00Z
Last change time
2014-02-10T07:20:24Z
Keywords
patch, wrong-code
Assigned to
nobody
Creator
lars.holowko

Attachments

ID | Filename | Summary | Content-Type | Size
801 | std.file.readtext_utf_aware.d | replacement std.file.readText that would fix the issue | text/plain | 2742

Comments

Comment #0 by lars.holowko — 2010-11-05T12:15:15Z
std.process.shell dies with an exception when the utility returns UTF-16. For example:

    import std.process, std.stdio, std.string;

    int main(string[] args)
    {
        auto output = shell("wmic NTDOMAIN GET DomainName /value");
        writefln("Output: %s", output);
        return 0;
    }

produces this output:

    dchar decode(in char[], ref size_t): Invalid UTF-8 sequence [255, 254, 13, 0, 10, 0, 13, 0, 10, 0, 68, 0, 111, 0, 109, 0, 97, 0, 105, 0, 110, 0, 78, 0, 97, 0, 109, 0, 101, 0, 61, 0, 13, 0, 10, 0, 13, 0, 10, 0, 13, 0, 10, 0] around index 0

wmic's output looks like UTF-16 (little endian). As a work-around, if I modify std.process.shell slightly to use a wstring instead:

    import std.array, std.random, std.file, std.format, std.exception;

    wstring shell2(string cmd)
    {
        auto a = appender!string();
        foreach (ref e; 0 .. 8)
        {
            formattedWrite(a, "%x", rndGen.front);
            rndGen.popFront;
        }
        auto filename = a.data;
        scope(exit) if (exists(filename)) remove(filename);
        errnoEnforce(system(cmd ~ "> " ~ filename) == 0);
        return readText!wstring(filename);
    }

things seem to work for this case. But a proper fix would be to make readText try to determine the encoding based on the prefix and then do the necessary conversion before calling std.utf.validate.

readText currently looks like this:

    S readText(S = string)(in char[] name)
    {
        auto result = cast(S) read(name);
        std.utf.validate(result);
        return result;
    }
Comment #1 by lars.holowko — 2010-11-05T12:16:25Z
Forgot to mention: this is on 2.050.
Comment #2 by lars.holowko — 2010-11-05T16:47:38Z
Created attachment 801: replacement std.file.readText that would fix the issue.

The attached std.file.readText function uses the UTF encoding detection "algorithm" described in TDPL and does the necessary conversions to fix the described bug.
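
Editorial sketch (not the contents of attachment 801, which are not reproduced in this thread): the idea of detecting the encoding from the file prefix and converting before validation might look roughly like the following. The helper names decodeWithBom and decodeUtf16 are hypothetical and not part of Phobos. The sketch assumes only UTF-8 and UTF-16 byte order marks need handling, so UTF-32 input is not recognized, and input without a BOM is assumed to be UTF-8.

```d
import std.exception : enforce;
import std.utf : toUTF8, validate;

/// Hypothetical helper: turns raw UTF-16 bytes (BOM already stripped)
/// into a validated UTF-8 string.
string decodeUtf16(const(ubyte)[] bytes, bool littleEndian)
{
    enforce(bytes.length % 2 == 0, "odd number of bytes in UTF-16 data");
    auto units = new wchar[bytes.length / 2];
    foreach (i, ref u; units)
    {
        immutable b0 = bytes[2 * i];
        immutable b1 = bytes[2 * i + 1];
        // Assemble one 16-bit code unit in the requested byte order.
        u = cast(wchar)(littleEndian ? (b1 << 8 | b0) : (b0 << 8 | b1));
    }
    validate(units);       // throws UTFException on malformed UTF-16
    return toUTF8(units);  // transcode to UTF-8
}

/// Hypothetical helper: picks the source encoding from a leading BOM.
/// Assumption: data without a BOM is treated as UTF-8.
string decodeWithBom(const(ubyte)[] raw)
{
    // UTF-8 BOM: EF BB BF
    if (raw.length >= 3 && raw[0] == 0xEF && raw[1] == 0xBB && raw[2] == 0xBF)
        raw = raw[3 .. $];
    // UTF-16 little endian BOM: FF FE (the prefix seen in the wmic output)
    else if (raw.length >= 2 && raw[0] == 0xFF && raw[1] == 0xFE)
        return decodeUtf16(raw[2 .. $], true);
    // UTF-16 big endian BOM: FE FF
    else if (raw.length >= 2 && raw[0] == 0xFE && raw[1] == 0xFF)
        return decodeUtf16(raw[2 .. $], false);

    auto text = cast(string) raw.idup;
    validate(text);        // throws UTFException on malformed UTF-8
    return text;
}

void main()
{
    // BOM followed by "Dom" in UTF-16 LE, a shortened form of the bytes
    // quoted in comment #0.
    immutable ubyte[] sample = [0xFF, 0xFE, 68, 0, 111, 0, 109, 0];
    assert(decodeWithBom(sample) == "Dom");
}
```

A readText replacement along these lines would presumably pass the result of std.file.read through such a helper instead of casting the bytes directly to the target string type.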
Comment #3 by dlang-bugzilla — 2014-02-10T07:20:24Z
Closing this, as the shell() function is now deprecated, and the problem does not manifest when using its replacement, executeShell. However, the fact that readText does not parse UTF-8 BOMs is probably an issue of its own, and should be filed separately.
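
Editorial sketch: for readers migrating away from shell(), a minimal use of std.process.executeShell might look like this. The command "echo hello" is only a placeholder chosen so that the example runs on any platform; the original report used wmic on Windows. How the captured output of a UTF-16-emitting tool should be decoded afterwards is not covered by this report.

```d
import std.process : executeShell;
import std.stdio : writeln;

void main()
{
    // executeShell runs the command through the system shell and returns
    // a tuple holding the exit status and the captured output.
    auto result = executeShell("echo hello");

    if (result.status != 0)
    {
        writeln("command failed with status ", result.status);
        return;
    }
    writeln("Output: ", result.output);
}
```

Unlike shell(), executeShell reports a non-zero exit status through the returned tuple rather than by throwing, so the caller checks result.status explicitly.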