
Bug 3248 – lossless floating point formatting

Status: NEW
Severity: enhancement
Priority: P4
Component: phobos
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2009-08-12T10:31:08Z
Last change time: 2024-12-01T16:13:15Z
Assigned to: No Owner
Creator: assorted
See also: https://issues.dlang.org/show_bug.cgi?id=6925, https://issues.dlang.org/show_bug.cgi?id=7341, https://issues.dlang.org/show_bug.cgi?id=8424, https://issues.dlang.org/show_bug.cgi?id=9297, https://issues.dlang.org/show_bug.cgi?id=9489, https://issues.dlang.org/show_bug.cgi?id=9593, https://issues.dlang.org/show_bug.cgi?id=9594, https://issues.dlang.org/show_bug.cgi?id=9872, https://issues.dlang.org/show_bug.cgi?id=9889, https://issues.dlang.org/show_bug.cgi?id=12284, https://issues.dlang.org/show_bug.cgi?id=12627, https://issues.dlang.org/show_bug.cgi?id=12743, https://issues.dlang.org/show_bug.cgi?id=13055, https://issues.dlang.org/show_bug.cgi?id=13568, https://issues.dlang.org/show_bug.cgi?id=13680, https://issues.dlang.org/show_bug.cgi?id=13971, https://issues.dlang.org/show_bug.cgi?id=15227, https://issues.dlang.org/show_bug.cgi?id=15386, https://issues.dlang.org/show_bug.cgi?id=16078, https://issues.dlang.org/show_bug.cgi?id=16336, https://issues.dlang.org/show_bug.cgi?id=17281, https://issues.dlang.org/show_bug.cgi?id=17381

Comments

Comment #0 by moi667 — 2009-08-12T10:31:08Z

Could an option be added to the formatting to elide trailing zeros for %f? That way it is possible to create an optimal lossless formatting for which the following holds:

    float f; s = format(f); float f2 = to!(float)(s); assert(f==f2);

The formatting I'm trying to get can be seen here (decimal): http://www.h-schmidt.net/FloatApplet/IEEE754.html

%g fails to format like this because it uses %f for values as small as 10^-5, thus losing precision for floats with leading zeros, like 0.00001234567. Fixing this by using %f for 10^-5..10^-1 fails because it doesn't elide trailing zeros, making it suboptimal space-wise.

It would be even nicer to have this lossless formatting added to std.format! I would even suggest making this the default formatting for floating point; floating point isn't as straightforward as integral, and it is easy to think the current formatting holds all information.

Compared to the hex %a format, this new lossless format will be more readable (less bug-prone) and generally shorter (0.1 will be 0.1).
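The round-trip property requested above can be checked mechanically. The following is a minimal sketch in Python rather than D, purely for illustration: Python's `%` operator follows the C printf rules for `%g`, and single precision is emulated by `to_f32`, a hypothetical helper built on `struct` (IEEE 754 binary32/binary64 is assumed). Under those rules `%g` switches to `%e` style when the decimal exponent is below -4 or not less than the precision, counts significant digits rather than digits after the point, and drops trailing zeros, so `%.9g` should be enough to round-trip any float.

```python
import struct

def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def roundtrips(f, spec):
    """True if formatting f with spec and parsing it back yields f again."""
    return to_f32(float(spec % f)) == f

# Illustrative inputs, including the value mentioned in the report.
samples = [to_f32(v) for v in (0.1, 0.00001234567, 0.0001234567, 1.0 / 3.0, 16777216.0)]

for f in samples:
    # Columns: default %g (6 significant digits), %.9g, and whether %.9g round-trips.
    print("%-16s %-16s %s" % ("%g" % f, "%.9g" % f, roundtrips(f, "%.9g")))
```

The default `%g` (six significant digits) is the lossy part; the `%.9g` column is lossless for floats but not always the shortest possible string.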

Comment #1 by clugdbug — 2009-08-12T12:22:26Z

It's not that easy, actually. When should it print 0.09999999999999999, and when should it print 0.1 ? The code to do it correctly is amazingly complicated. Just be aware that what you're asking for is much more difficult than you probably imagine.
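To make the question concrete for doubles: 0.1 and the adjacent double just below it are distinct values, so a lossless formatter has to print them differently while still printing plain 0.1 for the former. A small Python illustration follows (on most platforms Python's `repr` picks the shortest string that round-trips a double; `math.nextafter` needs Python 3.9 or later; none of this is Phobos API):

```python
import math

x = 0.1
below = math.nextafter(x, 0.0)   # the adjacent double just under 0.1

for v in (x, below):
    # 17 significant digits always round-trip a double, but are rarely the shortest form.
    print("%.17g" % v, "->", repr(v))
    assert float("%.17g" % v) == v
    assert float(repr(v)) == v
```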

Comment #2 by moi667 — 2009-08-12T15:40:10Z

(In reply to comment #1)
> It's not that easy, actually. When should it print 0.09999999999999999, and
> when should it print 0.1 ? The code to do it correctly is amazingly
> complicated.
> Just be aware that what you're asking for is much more difficult than you
> probably imagine.

It is less difficult than you imagine :)

Let's take floats. A float has at most 24 bits of precision:

2^-24 = 0.000000059604644775390625
2^-23 = 0.00000011920928955078125

To distinguish between these two you only need a precision of 8. Thus %.8e will always be lossless, but it isn't always the nicest representation.

%g fixes this by using %f if the exponent for an e format is greater than -5 and less than the precision. The less-than-precision part is correct, but the greater-than-10^-5 part is bad, as the precision specifies the number of digits generated after the decimal point, not excluding leading zeros. If %g were changed to use %f only between 10^-1 and the precision, that would solve that problem, if %f were to elide trailing zeros.

Back to the 0.1 question. 0.1 is actually saved as 0.10000000149... Eliding trailing zeros from %.8f would be sufficient to get 0.1
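The arithmetic in this comment can be checked directly. The sketch below (Python, with single precision emulated by the hypothetical helper `to_f32`; `strip_zeros` is likewise a made-up name) prints 2^-24 and 2^-23 with `%.8e`, shows the value actually stored for the float 0.1, and strips trailing zeros from its `%.8f` form.

```python
import struct

def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def strip_zeros(s):
    """Drop trailing zeros (and a dangling point) from a %f-style string."""
    return s.rstrip("0").rstrip(".") if "." in s else s

a, b = 2.0 ** -24, 2.0 ** -23
print("%.8e" % a, "%.8e" % b)        # nine significant digits keep the two apart

tenth = to_f32(0.1)
print("%.20f" % tenth)               # the value the float really holds
print(strip_zeros("%.8f" % tenth))   # trailing zeros elided
```

Note that `%.8f` on its own keeps only eight digits after the point, so it is lossy for small magnitudes such as 2^-24; the proposal in the comment restricts `%f` to the range where that does not happen.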

Comment #3 by smjg — 2009-08-12T18:21:21Z

I can see a few possible approaches to lossless floating point formatting:

(a) decimal with infinite precision, minus trailing zeros
(b) minimum number of significant figures guaranteed to be unique, minus trailing zeros
(c) the shortest possible string that, when parsed as a floating point, is exactly this number

(a) clearly isn't what the reporter is asking for.

(b) seems straightforward. (Is the number of s.f. in question just the .dig property?)

(c) is optimal, and could probably be implemented quite simply (not sure whether it would be most efficient though) with the aid of the nextUp and nextDown functions. This would also address the question in comment 1, though I'm not sure how easy it would be to implement this efficiently.

But (b) and (c) are ambiguous: do we go by uniqueness/exactitude in the real type or in the actual floating point type being used? I can see that sometimes the app'll know what type it will later be read into, and sometimes it won't.
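Approaches (b) and (c) can be contrasted with a brute-force sketch for single precision. It is written in Python with floats emulated via the hypothetical helper `to_f32`; `fixed_digits` and `shortest_g` are illustrative names as well. `shortest_g` only approximates (c): at each precision it tries the correctly rounded decimal, whereas a true shortest-digits algorithm may choose any decimal inside the rounding interval.

```python
import struct

def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def fixed_digits(f):
    """Approach (b): 9 significant digits, trailing zeros dropped by %g."""
    return "%.9g" % f

def shortest_g(f):
    """Approximation of (c): fewest %g digits that still round-trip."""
    for p in range(1, 10):
        s = "%.*g" % (p, f)
        if to_f32(float(s)) == f:
            return s
    raise AssertionError("9 digits should always suffice for binary32")

for v in (0.1, 0.5, 1.0 / 3.0, 1e10):
    f = to_f32(v)
    print(fixed_digits(f), shortest_g(f))
```

The retry loop is simple but formats and parses up to nine times per value, which is the efficiency concern raised in the comment.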

Comment #4 by moi667 — 2009-08-12T19:45:28Z

As far as I understand it, removing trailing zeros from .8 precision and (c) are the same. This is because the first (right to left) non-zero you encounter is there because of 2^x. I actually used nextUp to test a few ranges of floats :) (I have a not so fast computer) I remember .dig being 6 for all floats (could be wrong here, not close to any dmd.exe)

Comment #5 by andrei — 2009-08-12T22:43:37Z

I recommend anyone interested in the subject to peruse the papers: "How to Read Floating Point Numbers Accurately" ftp://ftp.ccs.neu.edu/pub/people/will/howtoread.ps and "Printing Floating-Point Numbers Quickly and Accurately" www.cs.indiana.edu/~burger/FP-Printing-PLDI96.pdf

Comment #6 by bugzilla — 2009-08-14T22:47:28Z

Right, this problem is an old one, and there's no reason to reinvent the wheel. Also, the formatting for them works by simply forwarding the job to the underlying C library. Some C implementations of this are better than others.

Comment #7 by moi667 — 2009-08-15T09:55:41Z

Does this mean I can forget about getting this in phobos? Could then at least an option be added to remove those trailing zeros for %f? I don't see why %g should be that privileged ;)

Comment #8 by smjg — 2009-09-07T02:58:19Z

(In reply to comment #4)
> As far as I understand it, removing trailing zeros from .8 precision and (c)
> are the same.

I doubt it ... I think the optimal number of decimal s.f. would depend on the binary exponent. But I'll experiment when I have time.

> I remember .dig being 6 for all floats (could be wrong here, not close to any
> dmd.exe)

The spec describes .dig as "number of decimal digits of precision", which seems ambiguous. Is it a property of the type or the value? If it's a type property, is it the maximum number of s.f. that may be required to express a number of the type unambiguously, or the number of s.f. to which numbers are guaranteed to be storeable unambiguously? If a value property, is it the number of s.f. according to which of the approaches I listed, or something else?

Comment #9 by clugdbug — 2009-09-07T04:25:35Z

(In reply to comment #8)
> (In reply to comment #4)
> > As far as I understand it, removing trailing zeros from .8 precision and (c)
> > are the same.
>
> I doubt it ... I think the optimal number of decimal s.f. would depend on the
> binary exponent. But I'll experiment when I have time.

You are correct. Some numbers need an extra digit.

> > I remember .dig being 6 for all floats (could be wrong here, not close to any
> > dmd.exe)
>
> The spec describes .dig as "number of decimal digits of precision", which seems
> ambiguous. Is it a property of the type or the value?

It's a property of the type.

> If it's a type
> property, is it the maximum number of s.f. that may be required to express a
> number of the type unambiguously, or the number of s.f. to which numbers are
> guaranteed to be storeable unambiguously?

Neither. It's the number of sig figs which are accurate in the worst case. So it's the _minimum_ number of digits which are stored. To unambiguously define the number, more digits are almost always required.
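The difference between the two digit counts can be illustrated numerically. For IEEE single precision, .dig is 6, while up to 9 significant digits are needed to identify a float uniquely. The sketch below is in Python with floats emulated by the hypothetical helper `to_f32`; the constants are illustrative choices near 2^33, where adjacent floats are 1024 apart but 7-digit decimals are only 1000 apart.

```python
import struct

def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Two different 7-digit decimals that land on the same float:
# 7 digits cannot always be stored and read back, which is why .dig is 6.
a = to_f32(8.589973e9)
b = to_f32(8.589974e9)
print(a == b, "%.7g" % a)

# Six-digit decimals of the same magnitude survive the trip through a float.
for text in ("8.58997e+09", "8.58998e+09"):
    assert "%.6g" % to_f32(float(text)) == text
```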

Comment #10 by smjg — 2009-09-07T04:41:58Z

> Neither. It's the number of sig figs which are accurate in the worst case. So
> it's the _minimum_ number of digits which are stored. To unambiguously define
> the number, more digits are almost always required.

So, if you try to put a decimal number into a float, it's how many s.f. you can get out again and be sure they'll be the same. I don't see in what cases this differs from "the number of s.f. to which numbers are guaranteed to be storeable unambiguously"....

Comment #11 by clugdbug — 2009-09-07T05:02:15Z

(In reply to comment #10)
> > Neither. It's the number of sig figs which are accurate in the worst case. So
> > it's the _minimum_ number of digits which are stored. To unambiguously define
> > the number, more digits are almost always required.
>
> So, if you try to put a decimal number into a float, it's how many s.f. you can
> get out again and be sure they'll be the same. I don't see in what cases this
> differs from "the number of s.f. to which numbers are guaranteed to be
> storeable unambiguously"....

It may be the same. I wasn't quite sure what you meant by "unambiguously". In both directions binary<->decimal there is nearly always more than one choice.

Comment #12 by moi667 — 2009-09-07T11:24:37Z

(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #4)
> > > As far as I understand it, removing trailing zeros from .8 precision and (c)
> > > are the same.
> >
> > I doubt it ... I think the optimal number of decimal s.f. would depend on the
> > binary exponent. But I'll experiment when I have time.

You are correct, removing trailing zeros from %.8e isn't optimal, but I thought it was at least lossless..

> You are correct. Some numbers need an extra digit.

Could you maybe provide one? As I did some ranges with nextUp and didn't find any. A near optimal lossless formatting is fine too :)
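Both halves of this exchange can be probed by brute force. Nine significant digits (%.8e) are always enough for a float, so no counterexample to losslessness should turn up there; with one digit fewer, failures do exist. The Python sketch below (floats emulated by the hypothetical helper `to_f32`) walks upward from 10.0, where adjacent floats are 2^-20 apart, slightly closer than the 1e-6 step of 8-digit decimals, and stops at the first value that 8 digits fail to round-trip. It also shows that %.8e of the float 0.1 is lossless but far from the shortest form.

```python
import struct

def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def roundtrips(f, digits):
    """True if f survives formatting with `digits` significant digits."""
    return to_f32(float("%.*g" % (digits, f))) == f

# In [8, 16) adjacent floats are exactly 2**-20 apart.
step = 2.0 ** -20
found = None
for k in range(1000):
    f = 10.0 + k * step          # exactly representable as a float
    if not roundtrips(f, 8):
        found = f
        break

print("needs 9 digits:", "%.9g" % found, "(8 digits give", "%.8g" % found + ")")

# %.8e is lossless for 0.1f, yet much longer than the shortest form "0.1".
tenth = to_f32(0.1)
print("%.8e" % tenth, roundtrips(tenth, 1))
```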

Comment #13 by hsteoh — 2014-09-19T21:33:16Z

Wouldn't the most lossless format be to just dump the representation in hexadecimal (i.e., in the same format as a hexadecimal float literal)? That way you're guaranteed that you don't get excess precision where there is none, nor do you lose any bits.

Comment #14 by yebblies — 2014-11-04T08:10:33Z

(In reply to hsteoh from comment #13)
> Wouldn't the most lossless format be to just dump the representation in
> hexadecimal (i.e., in the same format as a hexadecimal float literal)? That
> way you're guaranteed that you don't get excess precision where there is
> none, nor do you lose any bits.

That would no longer be human-readable.
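The trade-off in the last two comments is easy to see side by side. As an illustration outside D, Python's `float.hex` and `float.fromhex` use the same notation as %a and C hexadecimal float literals:

```python
x = 0.1
h = x.hex()                      # exact, bit-for-bit representation
print(h, "vs", repr(x))          # hex form next to the shortest decimal form
assert float.fromhex(h) == x     # lossless by construction
```

The hex string is exact and cheap to produce, but a reader cannot tell at a glance that it means 0.1.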

Comment #15 by andrei — 2015-11-03T19:26:59Z

Anyone working on this?

Comment #16 by ben.james.jones — 2019-04-16T04:27:32Z

I'd be interested in taking a stab at this based on the new algorithm presented here: https://dl.acm.org/citation.cfm?id=3192369 . STL (the person) has been tweeting about adding this to the MS STL (the library) implementation of <charconv> in C++17(?), and it sounds like a big speed win in addition to having nice round-trip properties. Currently formatValueImpl seems to just call snprintf.

Comment #17 by uplink.coder — 2019-04-16T08:01:27Z

@Ben I already have a ctfeable implementation which is lossless for doubles. I am going to push it to druntime soon

Comment #18 by robert.schadek — 2024-12-01T16:13:15Z

THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/phobos/issues/9761 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB