Comment #0 by dlang-bugzilla — 2012-07-13T05:23:29Z
import std.conv;
import std.string;
unittest
{
    static void test(T)(T lp)
    {
        assert(format("%s", lp) == "Hello, world!");
        assert(to!string(lp) == "Hello, world!");
    }
    test("Hello, world!" .ptr);
    test("Hello, world!"w.ptr);
    test("Hello, world!"d.ptr);
}
wchar* conversion is commonly needed for Windows programming, as UTF-16 is the native encoding for Unicode Windows API functions.
Comment #1 by issues.dlang — 2012-07-13T12:00:53Z
So, you expect %s on a pointer to give you the string that it points to? Why? It's a pointer, not a string, so it formats the pointer. That works as expected.
to!string should take a null-terminated string and give you a string, and it does that. This code passes:
import std.conv;
import std.string;
void main()
{
    static void test(T)(T lp)
    {
        assert(to!string(lp), "hello world");
    }
    test("Hello, world!" .ptr);
    test("Hello, world!"w.ptr);
    test("Hello, world!"d.ptr);
}
So, I'd say that as far as your code goes, there's nothing wrong with it. It functions exactly as expected. What _doesn't_ work is this:
import std.conv;
import std.string;
void main()
{
    static void test(T)(T lp)
    {
        assert(to!wstring(lp), "hello world");
        assert(to!dstring(lp), "hello world");
    }
    test("Hello, world!" .ptr);
    test("Hello, world!"w.ptr);
    test("Hello, world!"d.ptr);
}
The code doesn't even compile, giving these errors:
/home/jmdavis/dmd2/linux/bin/../../src/phobos/std/conv.d(819): Error: incompatible types for ((cast(immutable(dchar)[])_adDupT(&_D12TypeInfo_Aya6__initZ,value[cast(ulong)0..strlen(cast(const(char*))value)])) ? (null)): 'immutable(dchar)[]' and 'string'
/home/jmdavis/dmd2/linux/bin/../../src/phobos/std/conv.d(268): Error: template instance std.conv.toImpl!(immutable(dchar)[],immutable(char)*) error instantiating
q.d(8): instantiated from here: to!(immutable(char)*)
q.d(11): instantiated from here: test!(immutable(char)*)
q.d(8): Error: template instance std.conv.to!(immutable(dchar)[]).to!(immutable(char)*) error instantiating
q.d(11): instantiated from here: test!(immutable(char)*)
q.d(11): Error: template instance q.main.test!(immutable(char)*) error instantiating
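Two things are worth noting about the examples in this comment. In D, `assert(expr, msg)` uses its second argument only as the failure message, so `assert(to!string(lp), "hello world")` checks that the result evaluates to true but never compares it with any text. And for the wstring/dstring targets that fail to compile above, one workaround is to slice the C string first and then convert the resulting D array. The sketch below assumes a Phobos version whose `std.string.fromStringz` accepts `char*`, `wchar*` and `dchar*`; `fromCString` is a hypothetical helper, not a Phobos function.

```d
import std.conv : to;
import std.string : fromStringz;

// Hypothetical helper (not a Phobos function): slice the zero-terminated
// buffer with fromStringz, then let std.conv.to transcode the D array.
D fromCString(D, Char)(const(Char)* p)
{
    return p is null ? null : to!D(fromStringz(p));
}

unittest
{
    // String literals are zero-terminated, so .ptr is a valid C string here.
    // Compare with ==: assert(x, "text") would only test x for truth.
    assert(fromCString!string("Hello, world!"w.ptr) == "Hello, world!");
    assert(fromCString!wstring("Hello, world!".ptr) == "Hello, world!"w);
    assert(fromCString!dstring("Hello, world!".ptr) == "Hello, world!"d);
    assert(fromCString!string("Hello, world!"d.ptr) == "Hello, world!");
}
```

Slicing first keeps the pointer-specific part (finding the terminator) separate from the transcoding, which `to` already handles for D arrays. Build with `-unittest -main` to run it.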
Comment #2 by dlang-bugzilla — 2012-07-13T13:36:05Z
> to!string should take null-terminated string and give you a string, and it does
> that. This code passes:
Is it something that was fixed recently (within the last two weeks)? My two-week-old dmd git build and dpaste still print offsets for wchar* and dchar*: http://dpaste.dzfl.pl/26a2b284
> So, you expect %s on a pointer to give you the string that it points to? Why?
I think that, before all else, we should be looking for good reasons why format("%s", foo) and to!string(foo) produce different results. Why should one format the offset and the other do a conversion?
Second, I believe that the principle of least surprise makes this case rather clear: if the programmer tries to print a char*, they almost certainly want to print the null-terminated string at the given address, rather than a hexadecimal representation of the address (which is rarely useful to the end user). Generic code is the only exception I can think of, in which case a cast to void* is in order.
> What _doesn't_ work is this:
I think this should call the appropriate toUTFx functions from std.utf.
Comment #3 by dlang-bugzilla — 2012-07-13T13:42:17Z
> I think this should call the appropriate toUTFx functions from std.utf.
Sorry about that, misread your example. I guess, ideally, conversion between any pair of {|w|d}{char*|string} should work.
Comment #4 by issues.dlang — 2012-07-13T13:59:09Z
format and writeln are supposed to behave the same, because they both operate on format strings (they _don't_ currently behave 100% the same, but format's current implementation will be replaced with the new xformat's implementation in a few months - after the "scheduled for deprecation" time period). to!string is an entirely different beast.
std.conv.to is asking for an explicit conversion to string, whereas format and writeln are converting according to the format specifiers, and %s indicates the default string representation of the type. char*, wchar*, and dchar* are pointers - _not_ strings - and should not be treated as strings. Pointers print their address with %s. Making char*, wchar*, and dchar* print themselves as strings would be inconsistent with other pointer types, and operating on char*, wchar*, and dchar* should be discouraged, not encouraged.
to!string is treated differently, because you're asking for an explicit conversion, and we _do_ need to be able to convert null-terminated strings to D strings.
So, while I can see your point, I really don't think that having format or writeln treat char*, wchar*, or dchar* as null-terminated strings is a good idea. We should provide a means of converting them to D strings but not do anything to encourage using them as-is without converting them.
Comment #5 by dlang-bugzilla — 2012-07-13T14:25:36Z
OK, fair enough.
I've updated the enhancement request's title according to my previous comment.
Test:
-----------------------------------------------------------------------------
import std.conv;
void test1(T)(T lp)
{
    test2!( string)(lp);
    test2!(wstring)(lp);
    test2!(dstring)(lp);
    test2!( char*)(lp);
    test2!( wchar*)(lp);
    test2!( dchar*)(lp);
}
void test2(D, S)(S lp)
{
    D dest = to!D(lp);
    assert(to!string(dest) == "Hello, world!");
}
unittest
{
    test1("Hello, world!" );
    test1("Hello, world!"w);
    test1("Hello, world!"d);
    test1("Hello, world!" .ptr);
    test1("Hello, world!"w.ptr);
    test1("Hello, world!"d.ptr);
}
Comment #6 by dlang-bugzilla — 2012-07-13T14:31:04Z
Oh, I forgot about constness.
I guess that raises the number of combinations to (2*3*3)^2 = 324.
Comment #7 by code — 2012-07-13T14:37:07Z
Hooray for using "static" foreach to conveniently enumerate all the cases to test!
Comment #8 by issues.dlang — 2012-07-13T14:48:31Z
> Hooray for using "static" foreach to conveniently enumerate all the cases to
> test!
Yeah. I do that all of the time when I have to test with multiple types (especially with strings), and I always push for string-related tests to do that when I see that someone is looking to submit code to Phobos for a function that takes one or more strings as templated types, and their tests don't do that. It's just one of those things that everyone who writes much in the way of unit tests in D should learn and know about.
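A minimal sketch of the compile-time enumeration referred to here, assuming current Phobos (`std.meta.AliasSeq`, which was called TypeTuple in 2012-era Phobos). It covers only the three string types; the pointer and constness variants from comment #6 would be added to the type list the same way, once `to` supports them.

```d
import std.conv : to;
import std.meta : AliasSeq;

unittest
{
    alias StringTypes = AliasSeq!(string, wstring, dstring);

    // foreach over an AliasSeq is unrolled at compile time: each iteration
    // is compiled separately with S and D bound to concrete types.
    foreach (S; StringTypes)
    {
        foreach (D; StringTypes)
        {
            S src = to!S("Hello, world!");
            D dest = to!D(src);
            assert(to!string(dest) == "Hello, world!");
        }
    }
}
```

Adding a type to `StringTypes` grows the test matrix quadratically without any extra test code.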
Comment #9 by dlang-bugzilla — 2012-08-15T13:24:08Z
Another case of confusion due to format treating C strings as pointers:
http://stackoverflow.com/q/11975353/21501
I still think that the current behavior, regardless of how much it makes sense from a design/consistency/orthogonality/etc. perspective, is simply not useful and fails the principle of least surprise in most expected cases.
I strongly believe that we should either forbid passing char pointers to format/writeln (and force the user to cast to void* or convert to a D string), or print them as C null-terminated strings.
Comment #10 by issues.dlang — 2012-08-15T13:35:28Z
char* acts identically to the other pointer types, and I fully believe that it should stay that way. We've pretty much removed all of the D features which involved either treating a string as char* or a char* as a string (including disallowing implicit conversion of string to const char*). The _only_ feature that the language has which supports that is the fact that string literals have a null character one past their end and will implicitly convert to const char*.
It would be a huge mistake IMHO to support doing _anything_ with character pointers which treats them as strings without requiring an explicit conversion of some kind. Anyone who continues to think of char* as being a string in D is just asking for trouble. They need to learn to use strings correctly.
If you really want to use char* as a string in functions like format or writeln, then simply either use to!string or ptr[0 .. strlen(ptr)].
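For reference, a sketch of the explicit conversions mentioned here (`strlen` comes from `core.stdc.string`), plus `std.string.fromStringz`, which wraps the same slicing. Only `to!string` copies; the two slices borrow the C buffer, so they stay valid only as long as that buffer does.

```d
import core.stdc.string : strlen;
import std.conv : to;
import std.format : format;
import std.string : fromStringz;

unittest
{
    const(char)* ptr = "Hello, world!".ptr; // literal is zero-terminated

    string copy = to!string(ptr);                // allocates a GC copy
    const(char)[] slice = ptr[0 .. strlen(ptr)]; // no copy; borrows the C buffer
    const(char)[] same = fromStringz(ptr);       // Phobos wrapper for the slice

    // All three are D arrays, so %s formats the text, not the address.
    assert(format("%s", copy) == "Hello, world!");
    assert(format("%s", slice) == "Hello, world!");
    assert(format("%s", same) == "Hello, world!");
}
```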
Comment #11 by dlang-bugzilla — 2012-08-15T13:48:30Z
Sorry, I don't think that your categorical point of view is constructive. As long as D interfaces with C libraries and programs, people will continue to attempt to use C strings together with or in place of D strings, and issues like the above will continue to appear.
How often would a typical D user want to print / format the address of a character, versus the null-terminated string at that address?
> It would be a huge mistake IMHO to support doing _anything_ with character
> pointers which treats them as strings without requiring an explicit conversion
> of some kind.
Why would it be a mistake? What exactly do we lose by allowing writeln/format to understand C strings?
> Anyone who continues to think of char* as being a string in D is
> just asking for trouble.
What kind of trouble?
> They need to learn to use strings correctly.
D printing an address when text was expected will sooner generate a "D sucks" reaction than an "Oops, I need to learn to use strings correctly" one.
> If you really want to use char* as a string in functions like format or
> writeln, then simply either use to!string or ptr[0 .. strlen(ptr)].
That's not really simple, considering that some of the spots where that (verbose) change is needed would only be discovered late, at runtime, and even then the actual problem is not easy to identify (as seen in the SO question above).
Comment #12 by dlang-bugzilla — 2012-08-15T13:56:00Z
I would like to stress a point that I hope clarifies my view of the logic that writeln/format should use.
Printing/formatting memory addresses is extremely rarely useful!
Except for some dirty debugging, I can't imagine a case where the user expects that passing a pointer to something to format would yield the hex representation of that address.
I believe that printing a pointer as a hex address should be the fallback, last-resort behavior, if there is no better representation for the said type. (This also allows discussion of calling toString() on struct pointers.)
For the rare case that the user intends to actually print a pointer, this is easily accomplished by a cast to size_t and using the appropriate hex format specifier.
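A sketch of the workaround described here: cast the pointer to `size_t` and choose a hex specifier explicitly. The address 0x1234 is made up so the expected output is known; a real pointer prints whatever its address happens to be.

```d
import std.format : format;

unittest
{
    // A made-up address so the expected text is known in advance.
    void* p = cast(void*) 0x1234;

    // Cast to size_t and pick a hex specifier explicitly.
    assert(format("%x", cast(size_t) p) == "1234");
    assert(format("%08X", cast(size_t) p) == "00001234");
}
```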
Comment #13 by issues.dlang — 2012-08-15T13:57:15Z
Anyone who does not understand that char* is _not_ a string will continue to make mistakes like trying to concatenate a char* to a string ( http://stackoverflow.com/questions/11914070/why-can-i-not-concatenate-a-constchar-to-a-string-in-d ) or trying to pass a string directly to a C function. They will constantly run into problems when dealing with strings. char* is _not_ a string and should not be treated as such. Treating it as a string with something like writeln will just help further the misconception that char* is a string and hinder people learning and using D. D programmers need to understand the difference between char* and string. char* should _not_ be treated as special, because it's not.
Comment #14 by dlang-bugzilla — 2012-08-15T14:01:42Z
First of all, you are conflating ignorance of the difference between the two string types with my arguments. Users who are aware that D has its own way of handling strings are still open to making frustrating mistakes.
Second, getting unexpected output is not a good way to teach people about this. Hence my earlier proposal to make writeln/format REJECT char pointer types, on the basis that the user's intention is ambiguous (I don't think so personally, but obviously that's just my opinion).
Comment #15 by issues.dlang — 2012-08-15T14:06:49Z
I'm saying that we shouldn't treat char* differently from int* just because some newbies expect char* to act like a string. And if you know D, then you know that char* is _not_ a string, and I don't see how you could expect it to be treated as one. Either making char* act like a string or disallowing printing it would make it act differently from other pointer types just to appease the folks who mistakenly think that char* is a string.
Comment #16 by dlang-bugzilla — 2012-08-15T14:08:44Z
Well, then how about removing the pointer-printing feature entirely, and issue a compile-time error on all pointer types?
Comment #17 by dlang-bugzilla — 2012-08-15T14:12:50Z
> And if you know D, then you know that char* is _not_ a string,
> and I don't see how you could expect it to be treated as one.
I don't think this argument is valid, because it assumes that all D users are always aware of the types they pass to writeln/format. In the SO case, the argument is a function result, and the function's return type is not explicitly written in the user's code.
People often expect the compiler to shout at them if they try to pass incompatible types to a function. writeln/format accept char pointers, but ultimately do something with them that in 99% of cases is simply not useful, and leave the user hunting for their mistake across the whole data flow.
Comment #18 by destructionator — 2012-08-15T14:34:54Z
I think rejecting might be the best option because if you treat it as a string, what if it doesn't have a 0 terminator? That could easily happen if you pass it a pointer to a D string.
I don't think that is technically un-@safe, but an unexpected crash because of it could still be a problem. At least with to!string(char*) you might think about it for a minute and avoid the problem.
So on one hand, I think it should just work, but on the other hand the compile time error might be the most sane.
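To illustrate the missing-terminator concern: a slice of a D string generally has no zero byte right after its last element, so reading its `.ptr` as a C string keeps going until some zero byte turns up. The sketch below stays well-defined only because string literals are followed by a zero byte; with an arbitrary runtime array the same read could run into unrelated memory. `std.string.toStringz` produces a properly terminated pointer.

```d
import std.string : fromStringz, toStringz;

unittest
{
    string s = "Hello, world!"; // literal: a zero byte follows the '!'
    string hello = s[0 .. 5];   // "Hello": the next byte is ',', not zero

    // Treating hello.ptr as a C string reads past the end of the slice.
    assert(fromStringz(hello.ptr) == "Hello, world!");

    // toStringz yields a pointer whose text stops where the slice does.
    immutable(char)* cs = toStringz(hello);
    assert(fromStringz(cs) == "Hello");
}
```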
Comment #19 by issues.dlang — 2012-08-15T14:40:14Z
> Well, then how about removing the pointer-printing feature entirely, and issue
> a compile-time error on all pointer types?
So, you're suggesting that we remove a useful feature because newbies coming from C/C++ keep mistakenly thinking that char* is a string?
Comment #20 by dlang-bugzilla — 2012-08-15T14:44:20Z
Your formulation is misrepresenting the weight of the scales. Please seriously take into account the overall benefit for D of both decisions. The feature is nearly useless and does more harm than good, and "newbies coming from C/C++" is, again, a misrepresentation as discussed above. It is also incorrect - someone used to e.g. using SDL bindings in another language may expect that the types returned by the binding would be compatible with the language's native functionality.
Comment #21 by andrej.mitrovich — 2013-01-13T10:34:43Z
*** Issue 6157 has been marked as a duplicate of this issue. ***
Comment #22 by andrej.mitrovich — 2013-01-13T10:35:51Z
(In reply to comment #21)
> *** Issue 6157 has been marked as a duplicate of this issue. ***
FYI: http://d.puremagic.com/issues/show_bug.cgi?id=6157 has an experimental implementation in the attachment (for conv.to), but I'm not an expert on Unicode.
Comment #23 by dfj1esp02 — 2014-02-20T10:21:15Z
(In reply to comment #19)
> So, you're suggesting that we remove a useful feature because newbies coming
> from C/C++ keep mistakenly thinking that char* is a string?
char* is the way to represent null-terminated strings and C programmers are not mistaken in that.
As to the useful feature, it can be done with %p format specifier - that's what printf does.
Comment #24 by simen.kjaras — 2016-04-14T21:41:58Z
https://github.com/D-Programming-Language/phobos/pull/4199
The PR covers conversion from {X}char* to {Y}char[], but not the other way around. No such conversions are currently supported at all, so I took the liberty of not implementing them without a bit more discussion.
Are there convincing reasons to support any of those conversions at all?
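For the other direction (D strings to zero-terminated pointers), Phobos already has helpers outside `std.conv.to`: `std.string.toStringz`, `std.utf.toUTF16z` and `std.utf.toUTFz`. A sketch, assuming a current Phobos (including a `fromStringz` that accepts `wchar*` and `dchar*`, used here only to check the results):

```d
import std.string : fromStringz, toStringz;
import std.utf : toUTF16z, toUTFz;

unittest
{
    string s = "Hello, world!";

    immutable(char)*  c = toStringz(s);                  // UTF-8
    const(wchar)*     w = toUTF16z(s);                   // UTF-16, as Windows "W" APIs expect
    immutable(dchar)* d = toUTFz!(immutable(dchar)*)(s); // UTF-32

    // Round-trip each pointer back to a D slice to check the contents.
    assert(fromStringz(c) == "Hello, world!");
    assert(fromStringz(w) == "Hello, world!"w);
    assert(fromStringz(d) == "Hello, world!"d);
}
```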
Comment #25 by github-bugzilla — 2016-04-26T20:14:12Z