Bug 8642 – Fix `fopen` and friends signatures on Windows to not accept `char*`

Status
RESOLVED
Resolution
WONTFIX
Severity
major
Priority
P2
Component
druntime
Product
D
Version
D2
Platform
All
OS
All
Creation time
2012-09-11T10:39:00Z
Last change time
2012-09-11T20:16:31Z
Assigned to
nobody
Creator
verylonglogin.reg

Comments

Comment #0 by verylonglogin.reg — 2012-09-11T10:39:07Z
`fopen` and friends are really nasty sources of unportable code and encoding issues on Windows. I hope we will eventually change their signatures to not accept `char*`, using our usual deprecation process. These functions work on POSIX systems and work in many cases on Windows (read: the failures are hard to debug), so the situation is too dangerous to keep ignoring. (How about counting the druntime/Phobos bugs caused by misunderstanding this issue?)
Comment #1 by issues.dlang — 2012-09-11T10:47:12Z
fopen is standard C, and druntime simply provides the bindings for the C library as well as the system calls specific to the OS. Most code should be using D functions, not the C ones anyway. Providing bindings to standard C functions or system call APIs is _not_ a bug. If you don't want to use them, don't use them.
Comment #2 by verylonglogin.reg — 2012-09-11T11:00:12Z
> druntime simply provides the bindings for the C library
With wrong signatures.
Comment #3 by issues.dlang — 2012-09-11T14:38:12Z
In what way are the signatures wrong? According to the Linux man page, digitalmars' documentation, _and_ MSDN, fopen is
`FILE *fopen(const char *path, const char *mode);`
And in druntime, it's
`FILE* fopen(in char* filename, in char* mode);`
The only difference is that `in` is `const scope` instead of just `const`. `in` shouldn't have been used, but it won't affect the kind of stuff that you're talking about.
Comment #4 by verylonglogin.reg — 2012-09-11T14:50:52Z
The meaning of `char` is different. It is ASCII for C standard, CP_ACP for Windows, and UTF-8 for D and POSIX systems.
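
A minimal C sketch of why this "works in many cases": a file name made only of 7-bit ASCII bytes has the same byte sequence in UTF-8 and, as far as I know, in the usual Windows ANSI code pages, so it survives being passed to `fopen` unchanged. Anything else does not: for example, "é" is the two bytes 0xC3 0xA9 in UTF-8 but the single byte 0xE9 in code page 1252. The helper name `is_ascii` is hypothetical and not part of druntime or any C library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if every byte of the NUL-terminated string s is 7-bit ASCII.
   Hypothetical helper: such a name reads the same as UTF-8 and (by assumption)
   as text in the active ANSI code page, so no conversion is needed. */
bool is_ascii(const char *s)
{
    assert(s != NULL);
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        if (*p > 0x7F) {
            return false; /* non-ASCII byte: meaning depends on the encoding */
        }
    }
    return true;
}
```

A caller could use a check like this to decide whether a UTF-8 name can go straight to `fopen` on Windows or needs converting first.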
Comment #5 by issues.dlang — 2012-09-11T14:54:16Z
> The meaning of `char` is different.
> It is ASCII for C standard, CP_ACP for Windows, and UTF-8 for D and POSIX
> systems.
That doesn't change the function signature, just what encoding you should be passing in.
Comment #6 by verylonglogin.reg — 2012-09-11T15:00:07Z
> That doesn't change the function signature, just what encoding you should be
> passing in.
`char` is UTF-8 codepoint in D. It is specified and there is no choice. And you know it. So I don't understand your comment.
Comment #7 by issues.dlang — 2012-09-11T15:13:21Z
> `char` is UTF-8 codepoint in D. It is specified and there is no choice. And you know it. So I don't understand your comment.
Yes. But it's expected to use char when dealing with C's char. Any necessary conversion should be done at the call site. At most what you'd do is make the C signatures take ubyte instead of char, which would cause all kinds of confusion.
Yes. You need to be careful when passing strings to C functions which take char, because Microsoft was stupid, and ideally you'd use the w* functions on Windows precisely because of this nonsense, but that's something that the programmer needs to know and handle appropriately. The function signature itself is fine. Making it ubyte wouldn't solve anything, and you'd basically be arguing that ubyte should always be used instead of char in C bindings, and I don't think that you're going to find much traction on that.
Another thing to remember is that we currently use digitalmars' C runtime, so stuff like fopen is provided by _it_ and not Microsoft, which could introduce its own set of quirks (and also means that the w* functions aren't even available for anything in the C runtime). And that situation is about to become that much more complicated when we start supporting Microsoft's runtime for 64-bit.
If there's a bug, it's in the usage of fopen and friends, not in fopen itself (unless you count Microsoft's choice of CP_ACP as a bug, but that's not in our control in either case).
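
The "conversion at the call site" described above might look roughly like the following C sketch. It assumes the C runtime in use provides `_wfopen` (Microsoft's and MinGW's do; whether a given runtime does is exactly what is disputed in this thread), and the names `fopen_utf8` and `utf8_to_wide` are hypothetical helpers, not existing druntime or C library functions. On non-Windows systems the sketch simply forwards to `fopen`, assuming the file system takes UTF-8 names as-is.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>

/* Convert a NUL-terminated UTF-8 string to a malloc'ed UTF-16 string.
   Returns NULL on invalid UTF-8 or allocation failure. */
static wchar_t *utf8_to_wide(const char *s)
{
    /* First call computes the length in wchar_t units, terminator included. */
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, NULL, 0);
    if (n <= 0) {
        return NULL;
    }
    wchar_t *w = malloc((size_t)n * sizeof *w);
    if (w != NULL &&
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, w, n) <= 0) {
        free(w);
        w = NULL;
    }
    return w;
}
#endif

/* Open a file whose name is given in UTF-8 (hypothetical wrapper).
   Returns NULL on failure, like fopen. */
FILE *fopen_utf8(const char *utf8_name, const char *mode)
{
    assert(utf8_name != NULL && mode != NULL);
#ifdef _WIN32
    /* Windows: convert to UTF-16 and use the wide-character variant. */
    wchar_t *wname = utf8_to_wide(utf8_name);
    wchar_t *wmode = utf8_to_wide(mode);
    FILE *f = (wname != NULL && wmode != NULL) ? _wfopen(wname, wmode) : NULL;
    free(wname);
    free(wmode);
    return f;
#else
    /* Elsewhere: assume the name can be passed through unchanged. */
    return fopen(utf8_name, mode);
#endif
}
```

The binding for `fopen` itself is left untouched here; the encoding knowledge lives entirely in the wrapper, which is the division of labor argued for in this comment.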
Comment #8 by verylonglogin.reg — 2012-09-11T15:23:02Z
> Making it ubyte wouldn't solve anything, and you'd basically be
> arguing that ubyte should always be used instead of char in C bindings, and I
> don't think that you're going to find much traction on that.
IMHO it's a solution for such cases. Just my opinion.
> the w* functions aren't even available for anything in the C runtime
They are, by the way (see Issue 8643). MinGW also provides these functions. It looks like they are an 'unofficial' standard of C file IO on Windows.
Comment #9 by bugzilla — 2012-09-11T15:55:14Z
C Standard library functions have always had character encoding problems, as there are innumerable encodings that C calls "char", including UTF-8. D has a policy of not attempting to fix, refactor, reengineer, paper over, improve, etc., Standard C functions nor operating system API functions. D merely provides a straightforward, direct call to them. It's up to the caller of those functions to understand them and call them correctly.
Comment #10 by verylonglogin.reg — 2012-09-11T16:00:31Z
> D merely provides a straightforward, direct call to them.
And this is the point. We understand it in different ways.
Comment #11 by issues.dlang — 2012-09-11T20:16:31Z
>> D merely provides a straightforward, direct call to them.
> And this is the point. We understand it in different ways.
I have no idea how you could misunderstand that. I only see one way to interpret that, which is that we simply provide C bindings and you have to deal with whatever quirks the C function has, just like you would have to in C. The bindings in druntime are provided so that Phobos can build better abstractions on them and so that D programmers have direct access to system functions where necessary. If you want a cleaner API around a C function, then create a D wrapper. In the case of fopen, that's done with std.stdio.File. We're not trying to clean up C APIs or make them easy-to-use, just provide bindings for them.