Bug 5354 – formatValue: range templates introduce 3 bugs related to class & struct cases

Status
RESOLVED
Resolution
FIXED
Severity
major
Priority
P3
Component
phobos
Product
D
Version
D2
Platform
All
OS
All
Creation time
2010-12-15T01:36:00Z
Last change time
2012-06-11T23:40:44Z
Keywords
patch, spec
Assigned to
andrei
Creator
denis.spir

Attachments

ID   Filename        Summary                     Content-Type  Size
849  format.d.patch  proposed patch              text/plain    1122
850  rbug.d          testcase for various types  text/plain    1900

Comments

Comment #0 by denis.spir — 2010-12-15T01:36:05Z
This issue concerns the class case, the struct case, and the 3 range cases of the set of formatValue templates in std.format. As this set is currently written and commented (1), it seems intended to distinguish the following cases (class/struct/range only):

* An input range is formatted like an array.
* A class object is formatted using toString.
* A struct is formatted:
  ~ using its input range interface, if it implements one,
  ~ using toString, if it defines one,
  ~ as a last resort, using the type's 'stringof' property.

In short: I think the right thing to do is to remove the range cases. Explanations, details, and reasoning follow.

Given how the set of templates is presently implemented, and because of how template selection works (as opposed to inheritance, e.g.), the following 3 bugs come up:

1. When a class defines an input range, compilation fails because both the class case and the input range case match:

    /usr/include/d/dmd/phobos/std/format.d(1404): Error: template std.format.formatValue(Writer,T,Char) if (is(const(T) == const(void[]))) formatValue(Writer,T,Char) if (is(const(T) == const(void[]))) matches more than one template declaration, /usr/include/d/dmd/phobos/std/format.d(1187):formatValue(Writer,T,Char) if (isInputRange!(T) && !isSomeString!(T) && isSomeChar!(ElementType!(T))) and /usr/include/d/dmd/phobos/std/format.d(1260):formatValue(Writer,T,Char) if (is(T == class))

   Because toString is inherited from Object, this happens even if no toString is _explicitly_ defined.

2. For a struct, a programmer-defined output format in toString is bypassed whenever the struct implements a range interface!

3. If a range's element type (the result type of front) is identical to the range's own type, writing runs into an infinite loop. This is entirely possible, for instance with a textual type that works like strings in high-level/dynamic languages (a character is a singleton string).

To solve these bugs, I guess the following changes would be needed:

* The 3 range cases must have 2 additional _negative_ constraints:
  ~ no toString defined on the type
  ~ (ElementType!T != T)
* The struct case must be split into sub-cases:
  ~ use toString if defined
  ~ [else use the range interface if defined, as given above]
  ~ if neither toString nor a range interface is defined, use T.stringof

I have tried to implement and test this modification, but ran into build errors (seemingly unrelated, about isTuple) that I could not solve.

Now, I think it is worth asking whether all these complications, only to have a _default_ formatValue for input ranges, are worth it at all. On one hand, in view of the analogy, it looks like a nice idea to have ranges expressed like arrays. On the other, when can this feature be useful?

A first issue comes up because there is no way, AFAIK, to tell apart inherited and explicit toString methods of classes: is(typeof(val.toString()) == string) is always true for a class. So the range case would never be triggered for classes, only for structs. To use this feature, then, (1) the type must be a struct (2) that defines no toString, (3) that implements a range interface, and (4) whose element type is not the range type itself. In addition, the most sensible output form for it must be precisely that of an array.

Note that unlike for structs, programmers cannot define custom forms of array output ;-) This is the reason why a default array format is so helpful, but this reason does not exist for structs, thanks to toString (and later writeTo).

If no default form exists for ranges, then in the rare cases where a programmer implements a range interface on a struct _and_ needs to re-create an array-like format for it, this takes a few lines in toString, for instance:

    // assumes std.conv (to), std.string (format) and std.array (join) are imported
    string toString ()
    {
        string[] contents = new string[this.elements.length];
        foreach (i, e ; this.elements)
            contents[i] = to!string(e);
        return format("[%s]", join(contents, ", "));
    }

In conclusion, I would recommend getting rid of the 3 range cases in the set of formatValue templates. (This would directly restore correctness, I guess, which suggests that the range cases were probably added later.)

(1) There is at least one doc/comment error, namely for the struct case (commented as AA instead). Also, the online doc does not show template constraints, so it is not possible to determine which template is selected in a given situation.

Denis
Comment #1 by denis.spir — 2010-12-15T01:45:30Z
started thread: http://lists.puremagic.com/pipermail/digitalmars-d/2010-December/090043.html I marked the bug(s) with keyword 'spec', as it depends on: how do we want struct/class/range formatting semantics to be?
Comment #2 by elfy.nv — 2010-12-15T09:59:49Z
My thoughts:

1. Using the Range interface for formatting classes and structs is a good thing and should stay.

2. There is a conflict of priority between using toString and iterating through a Range. It's worse for classes, where toString is always present and can't be used to deduce the programmer's intent. IMHO it's more important to keep things uniform than to make the best guess in every case, so iterating through the range must have priority over using toString. At least unless there is a more direct way to tell what the programmer's intent is about default formatting of a struct or class.

3. A Range with (ElementType!T == T) must be either detected throughout the whole library as a special case or not detected as a Range at all. I'm under the impression that algorithms (not just formatting routines) expect that front() yields some value. This value /may/ be another Range, and there may be hierarchical structures containing Ranges of Ranges, yet this hierarchy is expected to be finite, so full traversal of it is possible. I expect there is more trouble waiting to happen with Ranges like that if they go generally undetected. I may be wrong here; it would be great to have someone with knowledge of both current practice and original intent clarify this matter.

4. > Also, the online doc does not hold template constraints, so that it is not possible to determine which one is selected in given situations.

   +1!

5. I attached a testcase covering the various combinations (class|struct, normal range|recursive range|no range, toString overridden|not overridden) and a patch which makes all cases compile and print uniform output for struct and class. For this case the changes are really very simple, the constraints still look manageable, and one can still enjoy specific formatting for ranges.
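To make point 3 concrete, here is a minimal sketch of how a range whose element type is its own type could be detected at compile time. The `Text` struct and the `isSelfTypedRange` helper are hypothetical names made up for illustration; only `isInputRange` and `ElementType` come from Phobos, and this is not the check used by the attached patch.

```d
import std.range.primitives : ElementType, isInputRange;

// Hypothetical text type: iterating it yields one-character Text values,
// so front returns the range's own type.
struct Text
{
    string data;
    @property bool empty() const { return data.length == 0; }
    @property Text front() const { return Text(data[0 .. 1]); }
    void popFront() { data = data[1 .. $]; }
}

// Hypothetical helper: true when T is an input range of T.
enum isSelfTypedRange(T) = isInputRange!T && is(ElementType!T == T);

static assert(isSelfTypedRange!Text);     // would recurse forever if formatted element by element
static assert(!isSelfTypedRange!(int[])); // ordinary range: elements are int, not int[]

void main() {}
```

A formatting routine could use such a predicate as a negative constraint on its range overload, which is roughly what the `ElementType!T != T` suggestion in comment #0 amounts to.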
Comment #3 by elfy.nv — 2010-12-15T10:02:09Z
Created attachment 849 proposed patch
Comment #4 by elfy.nv — 2010-12-15T10:03:24Z
Created attachment 850 testcase for various types
Comment #5 by denis.spir — 2010-12-15T10:45:23Z
(In reply to comment #2)
> My thoughts:
>
> 1. Using Range interface for formatting classes and structs is a good thing and should stay.

Why? Please criticise my arguments above, especially:

* Formatting a type exactly according to the builtin default format of an array has no reason to be a common case. Note that a range interface is only _one aspect_ of a type.
* Even in this case, writing a 3-4 line toString is not a big deal.
* Introducing default array-like formatting for ranges also introduces semantic and implementation issues, and complicates the code base.

> 2. There is a conflict of priority between using toString and iterating through a Range. It's worse for classes where toString is always present and can't be used to deduce programmer's intent. IMHO it's more important to keep things uniform, than to make best guess in every case, so iterating through range must have priority over using toString. At least unless there is more direct way to tell what's programmer's intent about default formatting of struct or class.

No! _Default_ range interface formatting cannot have priority over formatting _explicitly_ defined by the programmer. This is a serious conceptual bug. A programmer who defines toString *wants* it to be used, else why define it at all? You are taking precedence considerations upside down here. [See also (*) below.]

> 3. Range with (ElementType!T == T) must be either detected throughout all library as a special case or not detected as a Range at all. I'm under impression that algorithms (not just formatting routines) expect that front() yields some value. This value /may/ be another Range, there may be hierarchical structures containing Ranges of Ranges, yet this hierarchy is expected to be finite, so full traversal of it is possible. I expect there are more trouble waiting to happen with Ranges like that if they go generally undetected. I may be wrong here, it would be great to have someone with knowledge of both current practice and original intent clarify this matter.

Agreed. In addition to my example above (a string type behaving like in most high-level languages): common forms of linked list, tree, and graph hold nodes which are themselves lists, trees, graphs. They must be properly considered as ranges.

This special case need not be detected, I guess. The bug is not due to their recursive nature (else we could never write out a tree ;-), but lies somewhere in D's current writing algorithm for ranges (*). Indeed, the recursive call should end some day, namely on terminal nodes.

Actually, for such recursive ranges, I would simply recommend that toString be defined [because leaf nodes must end the formatting recursion, again see (*)]. And default range formatting should never be used.

> 4. > Also, the online doc does not hold template constraints, so that it is not possible to determine which one is selected in given situations.
> +1!
>
> 5. attached a testcase of various combination (class|struct, normal range|recursive range|no range, has override for toString|no override toString) and patch which makes all cases compile and print uniform output for struct and class. For this case changes are really very simple, constraints still look manageable, and one can still enjoy specific formatting for ranges.

(*) The bug seems to be similar to left recursion in PEG parsing: when I try to write out a struct object implementing the input range interface, I get an "infinite" series of '[', then a segfault. The error seems to be that the opening character '[' is written for each nesting level before the whole string at this level has been computed, a string which can be empty or otherwise end the recursion. Actually, more fundamentally, the error is caused precisely by ignoring the user-defined toString, which would end the recursion with a special, non-recursive form for terminal elements (leaves). One more reason to respect a programmer-defined toString instead of bypassing it.

Denis
Comment #6 by elfy.nv — 2010-12-15T11:46:24Z
> > 1. Using Range interface for formatting classes and structs is a good thing and should stay.
>
> Why? Please criticise my arguments above, especially:
> * Formatting a type exactly according to the builtin default format of an array has no reason to be a common case. Note that a range interface is only _one aspect_ of a type.

It seems to me that the foremost property of a Range is that it can be iterated and that something can be accessed through it. It makes perfect sense that default formatting tries exactly this: iterate and format what can be accessed. Now if we bundle data and Range interface together, all kinds of funny things happen. If we separate data and Range object, everything makes sense. Data is stored in a container which may or may not define toString, while the Range only gives generic access to the underlying data. Of course one may define toString for a Range object, but if you think of a Range this way, as a separate concept with limited purpose, there is no need for it.

In a sense I disagree with the notion that a "range interface is only _one aspect_ of a type." I think Range should be considered the foremost aspect of a type. Well, just my opinion, of course. For me, mixing a Range interface with other things is not good practice.

> * Even in this case, writing a 3-4 line toString is not a big deal.

True. But 3-4 lines for every Range? Of course one may just provide a template for the current default formatting of Ranges and let the user decide what to use. Actually I think this is what the issue boils down to: we need a proper way to define custom formatting which would be preferred over library generics if provided. Something of a higher level than toString.

> * Introducing default array-like formatting for ranges also introduces semantic and implementation issues & complication of the code base.

I don't see it. The inability to override default formatting is an issue, yet default formatting in itself is a good thing.

> No! _Default_ range interface formatting cannot have priority over _explicitely_ defined formatting by the programmer.

I would totally agree with you if there were any way to distinguish an overridden toString for classes from the original one. I don't know of one, so I place priority on uniformity, simplicity and predictability. Structs and classes behaving the same way is a good thing.

> This is a serious conceptual bug.

I would say it is just "conceptual". It's not pretty, it may be somewhat limiting ATM, but it's better than increasing complexity, generating more special cases, and placing a burden on programmers for what should be provided by the library automagically... (*) I mean it's way easier to cope with clearly stated limits than to deal with a mess of complex conditions and special cases. The alternative would be a cleaner design for the whole system of object-to-string conversion.

(*) Note that default formatting is widely used inside the library for debugging purposes; it must deal with all sorts of objects in a uniform way and not place any requirements on code. When the _programmer_ wants to format an object he's free to call toString directly or even use a custom method for converting. One way or another, the defaults do not really limit the programmer other than in how he sees some debug messages.
Comment #7 by denis.spir — 2010-12-15T14:36:42Z
(In reply to comment #6)

Well, our views clearly point in opposite directions and cannot be reconciled.

First, you seem to consider ranges as types, while for me they are aspects of types, implemented as parts of type interfaces. For me, they just play a role, possibly among others.

I agree it's nice to have a default (array-like) output form for types that happen to implement a range interface if, and only if, the programmer does not specify any custom form. I also agree uniformity may be a nice _option_ in some particular cases, as long as it is chosen by the programmer, not imposed.

In what proportion of cases will the default range format happily fit the programmer's needs for a type that (also) implements the range interface? Say you wrap a custom string type in a struct to provide specific functionality, or a set of filenames and dirnames representing a directory structure, or a symbol table; will it fit?

The case of ranges is completely different from that of arrays, precisely. First, because array types are types; second, because array types can only be that, there is no "array aspect" of a type that would also be something else; third, because one cannot specify any output form for an array. For all these reasons, D's default format for arrays is a great feature (and languages that do not provide any such feature are painful). But none of these reasons apply to range interfaces.

I agree the impossibility of distinguishing explicit and inherited toString for classes is an issue. But for this reason, your choice is to ignore the programmer's explicit intent in all other cases. I find this totally unacceptable. Firstly for debugging, as you say: programmers want feedback output to be exactly the way they state it to be, not in a default form that may by chance express half of what they need in a form that more or less fits their wishes.

I don't understand your point about "Now if we bundle data and Range interface together all kind of funny things happen." A type that implements a range always holds data, usually provides many other features than just range/iteration, and sometimes provides several ranges: for instance, a tree can hold different kinds of data fields, expose various operations like inserting a subtree, and have several ranges to iterate depth-first or breadth-first, or only on leaves, etc.

Maybe a different point of view would be to find a way for the user to express "use the range interface for output formatting". (Now, basta.)

Denis
Comment #8 by bearophile_hugs — 2010-12-15T16:06:00Z
For a different but related thing, see the Comment 8 of bug 3813: http://d.puremagic.com/issues/show_bug.cgi?id=3813#c8 It says that I prefer lazy sequences to be printed in a way different from arrays, for example: [0; 1; 2; 3; 4]
Comment #9 by denis.spir — 2010-12-15T22:47:49Z
(In reply to comment #8)
> For a different but related thing, see the Comment 8 of bug 3813:
>
> http://d.puremagic.com/issues/show_bug.cgi?id=3813#c8
>
> It says that I prefer lazy sequences to be printed in a way different from arrays, for example:
>
> [0; 1; 2; 3; 4]

+++ I would also find it better that ranges do not look _exactly_ like arrays.

Denis
Comment #10 by bearophile_hugs — 2010-12-15T23:03:16Z
(In reply to comment #9) > +++ You may also actually vote for the bug 3813 :-)
Comment #11 by sandford — 2010-12-30T12:26:10Z
I've decreased the importance of this as there are workarounds. As for my thoughts, this bug is causing a major loss of function in my update to std.variant and is going to cause major issues with any type that defines/needs a permissive opDispatch. Conceptually, I believe formatValue makes a major mistake by assuming that just because a type satisfies isRange, it is in fact a range. So I believe that if toString is present, it should take priority. I don't see a problem with ranges that return their own types (i.e. trees) or with the current method of formatting in general. (Note that the patch only deals with ElementType!T == T and not with the toString issue.)
Comment #12 by andrei — 2011-01-23T13:28:48Z
There are good arguments for going either way wrt the relative priority of toString and range interface. Using toString by default is in a way the "right" thing to do as it's the default formatting for all non-range objects. For class objects, I oppose distinguishing between introduced toString and inherited toString; that goes against what inheritance is about. There are a few problems with toString, however. It needs to format everything in memory before writing, which makes formatting large ranges slow. Also, if you have a range but you want to use toString it's easy to simply write r.toString, whereas there is no simple method to say "even though this range has toString, please use the range functions to format it". It's true that there's a danger of accidental conformance to inputRange. But at this point the input range troika is about as widespread as toString, so I don't think that's a major risk. I have the changes proposed by Nick with a few edits in my tree. Unless a solid argument comes forward, I'll commit them soon.
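The concern above about toString building the whole string in memory can be illustrated with a sink-based toString, the style the closing comment of this report refers to as "toString taking an output range". The following is only a sketch under that assumption: `Countdown` is a hypothetical type, and the delegate-sink signature shown is the one accepted by later versions of std.format, not something available when this comment was written.

```d
import std.conv : to;
import std.format : format;

// Hypothetical type that writes its pieces to a sink one at a time
// instead of concatenating a complete string first.
struct Countdown
{
    int from;

    void toString(scope void delegate(const(char)[]) sink) const
    {
        foreach_reverse (i; 0 .. from + 1)
        {
            sink(i.to!string);   // emit one number
            if (i > 0)
                sink(" ");       // separator between numbers
        }
    }
}

void main()
{
    // format collects the sink output; a file writer would stream it instead.
    assert(format("%s", Countdown(3)) == "3 2 1 0");
}
```

With this style the type keeps control over its output form while the formatting routine decides where the characters go.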
Comment #13 by denis.spir — 2011-01-23T14:25:33Z
(In reply to comment #12)
> There are good arguments for going either way wrt the relative priority of toString and range interface.
>
> Using toString by default is in a way the "right" thing to do as it's the default formatting for all non-range objects. For class objects, I oppose distinguishing between introduced toString and inherited toString; that goes against what inheritance is about.

Actually, my point is not about which of toString or range format should have precedence. Rather, it is that a programmer defines toString on purpose: for it to be used by builtin routines like the write* functions. Ignoring it is not acceptable. Moreover, there is no reason why a _default_ format should fit specific needs.

About toString issues such as memory usage, I do agree. They are planned to be solved with writeTo. Then, a writeTo defined by the programmer should be used for any kind of object output, just like toString should currently be used when defined.

Finally, as explained above, letting the default range format bypass a custom toString makes it impossible to output any range whose ElementType is itself (the infinite loop bug).

Denis
Comment #14 by sandford — 2011-01-23T17:54:37Z
(In reply to comment #12)
> It's true that there's a danger of accidental conformance to inputRange. But at this point the input range troika is about as widespread as toString, so I don't think that's a major risk.

It's not a major risk in _hand-written_ code, but in generic code it's a major risk due to opDispatch. See comment 11.

Reading over Nick's suggestions (comment 2), he predicates point 2 on the assumption that programmer intent is unclear in the case of classes, due to inheritance of toString from Object, and then argues that structs, which do have clear programmer intent, should behave like classes for uniformity. However, in D2 it is trivial to determine whether the programmer of a class has actually implemented a custom toString routine or not (see the code listing below). Therefore, I'd recommend that formatValue prioritize the use of custom toString routines when they exist.

    import std.traits    : TransitiveBaseTypeTuple, Unqual;
    import std.typetuple : TypeTuple;

    // Walks T and its base classes; stops with false on reaching Object.
    bool hasCustomToStringHelper(T)() {
        foreach(type; TypeTuple!(T,TransitiveBaseTypeTuple!T) ) {
            if(is(Unqual!type == Object))
                return false;
            foreach(member;__traits(derivedMembers,type)) {
                if(member == "toString")
                    return true;
            }
        }
        return false;
    }

    template hasCustomToString(T) {
        static if( __traits(hasMember,T,"toString") ) {
            static if( is(T==class) ) {
                enum hasCustomToString = hasCustomToStringHelper!T;
            } else { // structs, etc
                enum hasCustomToString = true;
            }
        } else {
            enum hasCustomToString = false;
        }
    }
Comment #15 by samukha — 2011-01-24T05:23:19Z
(In reply to comment #14)
>
> bool hasCustomToStringHelper(T)() {
>     foreach(type; TypeTuple!(T,TransitiveBaseTypeTuple!T) ) {
>         if(is(Unqual!type == Object))
>             return false;
>         foreach(member;__traits(derivedMembers,type)) {
>             if(member == "toString")
>                 return true;
>         }
>     }
>     return false;
> }
> template hasCustomToString(T) {
>     static if( __traits(hasMember,T,"toString") ) {
>         static if( is(T==class) ) {
>             enum hasCustomToString = hasCustomToStringHelper!T;
>         } else { // structs, etc
>             enum hasCustomToString = true;
>         }
>     } else {
>         enum hasCustomToString = false;
>     }
> }

This wouldn't work because the dynamic type of the object being formatted can differ from the static type the template was instantiated with. Even if the compile-time check does not detect an overridden toString, we would still need to perform the check at run-time. The easiest way is to compare the address of the passed-in object's toString to that of Object.toString. Unfortunately that won't work for objects coming from DLLs, since they have distinct Object.toString functions. So the correct way is to do a proper lookup by name via run-time reflection.
Comment #16 by denis.spir — 2011-01-24T05:40:24Z
(In reply to comment #15)
> This wouldn't work because the dynamic type of the object being formatted can differ from the static type the template was instantiated with. Even if the compile-time check won't detect the overriden toString, we would still need to perform the check at run-time. The easiest way is to compare the address of the passed-in object's toString to that of Object.toString. Unfortunately that won't work for objects coming from DLLs since they have distinct Object.toString functions. So the correct way is to do proper lookup by name via run-time reflection.

We may find a way to tell apart, for classes, the builtin toString from a custom one. If we cannot, then in case of doubt toString must have precedence for classes. Note there is another bug: one currently cannot implement a range interface on a class that defines toString (because the formatValue template constraints are not mutually exclusive).

In any case, a struct's toString must have precedence. It is an obvious choice: programmers explicitly define toString (later, writeTo) precisely for that ;-)

Denis
Comment #17 by samukha — 2011-01-24T06:27:00Z
(In reply to comment #16)
>
> It is an obvious choice: programmers explicitely define toString (later, writeTo) precisely for that ;-)
>
> Denis

I agree that a user-specified toString should take precedence over the ranges. What I wanted to point out is that static checks for classes are not enough.
Comment #18 by k.hara.pg — 2011-10-20T10:10:38Z
We can check whether the toString method is overridden as follows:

    string delegate() dg = &obj.toString;
    auto overridden = dg.funcptr != &Object.toString;

https://github.com/D-Programming-Language/phobos/pull/298

For class range objects, if the toString method is actually overridden, use it.
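As a self-contained illustration of this run-time check, here is a small sketch. It is a variant, not the code from the pull request: instead of taking `&Object.toString` directly, it compares against the delegate of a throwaway `Object` instance. The class names `Widget` and `NamedWidget` and the helper `overridesToString` are hypothetical.

```d
// Hypothetical classes: one inherits Object.toString, one overrides it.
class Widget {}

class NamedWidget
{
    override string toString() { return "NamedWidget"; }
}

// Returns true when the dynamic type of obj overrides toString.
// Taking &obj.toString resolves the virtual call, so the check follows
// the object's run-time type rather than its static type.
bool overridesToString(Object obj)
{
    string delegate() dg = &obj.toString;
    auto base = new Object;                  // plain Object as reference point
    string delegate() baseDg = &base.toString;
    return dg.funcptr !is baseDg.funcptr;
}

void main()
{
    assert(!overridesToString(new Widget));
    assert(overridesToString(new NamedWidget));

    Object o = new NamedWidget;              // static type Object, dynamic type NamedWidget
    assert(overridesToString(o));
}
```

Because the comparison is made on the object's actual vtable entry, it addresses the static-versus-dynamic type concern raised in comment #15, though the DLL caveat mentioned there would still apply.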
Comment #19 by k.hara.pg — 2012-06-11T23:40:44Z
Various bugs in the std.format module have now been fixed.

For class objects:
- An overridden toString takes priority over the inherited one.
- An overridden toString takes priority over a user-defined range interface.
- A user-defined range interface takes priority over the inherited toString.

For struct objects:
- A user-defined toString takes priority over the range interface.
- If neither toString nor a range interface is defined, alias this is considered as the proper supertype.

For all aggregate types:
- Objects can be formatted lazily with a user-specified toString that takes an output range.

So I am closing this issue.
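The struct rules can be checked with a short sketch, assuming a Phobos release that includes these fixes. `Bare` and `Labelled` are hypothetical types written for this illustration; both expose the same input range primitives, and only the second defines toString.

```d
import std.format : format;

// Hypothetical struct: input range primitives only, no toString.
struct Bare
{
    int[] items;
    @property bool empty() const { return items.length == 0; }
    @property int front() const { return items[0]; }
    void popFront() { items = items[1 .. $]; }
}

// Hypothetical struct: same range interface plus a user-defined toString.
struct Labelled
{
    int[] items;
    @property bool empty() const { return items.length == 0; }
    @property int front() const { return items[0]; }
    void popFront() { items = items[1 .. $]; }
    string toString() const { return "Labelled(custom)"; }
}

void main()
{
    // No toString: the range interface is used, giving array-like output.
    assert(format("%s", Bare([1, 2, 3])) == "[1, 2, 3]");

    // toString defined: it takes priority over the range interface.
    assert(format("%s", Labelled([1, 2, 3])) == "Labelled(custom)");
}
```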