In the general case I agree it’s not necessarily trivial; e.g. if your program uses the whole range of decimal places to a meaningful degree, or performs calculations that can compound floating point errors up to higher decimal places. (Though I’d argue that in both of those cases pure floating point is probably not the best system to use.) In my case I knew that the intended precision of the input would never be precise enough to overlap with floating point errors, so I could just round anything past the 15th decimal place down to 0.
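For anyone following along, here is a minimal sketch of that kind of fix in Python. The helper name nearly_equal is made up for illustration, and it rounds to the nearest value at the 15th decimal place rather than strictly truncating, which is an assumption about what the original program did.

```python
def nearly_equal(a: float, b: float, places: int = 15) -> bool:
    """Compare two floats after rounding both to `places` decimal places.

    Hypothetical helper: it only makes sense when the inputs are known to
    carry fewer meaningful decimal places than `places`.
    """
    return round(a, places) == round(b, places)


print(0.1 + 0.2 == 0.3)              # False: the raw sum is 0.30000000000000004
print(nearly_equal(0.1 + 0.2, 0.3))  # True: the error sits past the 15th place
```

This only works because the problem domain is restricted; two values that genuinely differ past the cutoff would compare equal too.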
That makes sense. I think I may have misjudged your post, as I expected that you would classify that kind of approach as a “duct tape” approach.
Hmm, interesting. The exact choice of decimal place at which to cut off the comparison is certainly arbitrary, and that doesn’t feel very elegant. My thinking is that within the constraint of using floating point numbers, there fundamentally isn’t a perfect solution. Floating point representation changes some numbers into other numbers, so there are always going to be some cases where number comparisons are wrong. What we want to do is define a problem domain and check whether floating point will cause problems within that domain: if it doesn’t, go for it; if it does, maybe don’t use floating point.
In this case my fix solves the problem for what I think is the vast majority of the most likely inputs (in particular it solves it for all the inputs that my particular program was going to get), and while it’s less fundamental than e.g. using arbitrary-precision arithmetic, it does better on the cost-benefit analysis. (Just like how “completely overhaul our company” addresses things on a more fundamental level than just fixing the structural simulation, but may not be the best fix given resource constraints.)
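As a rough illustration of the "more fundamental" alternative, Python's standard library has decimal.Decimal (decimal arithmetic with configurable precision) and fractions.Fraction (exact rationals). This is just a sketch of the trade-off, not what the original program used; the variable names are made up.

```python
from decimal import Decimal
from fractions import Fraction

# Binary floating point: 0.1, 0.2 and 0.3 are all stored inexactly.
float_matches = (0.1 + 0.2 == 0.3)

# Decimal built from strings keeps the decimal digits exactly as written.
decimal_matches = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

# Fraction stores an exact numerator and denominator.
fraction_matches = (Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))

print(float_matches, decimal_matches, fraction_matches)  # False True True
```

The comparison comes out right without any cutoff, at the cost of slower arithmetic and having to convert inputs at the boundary, which is the cost-benefit question above.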
The main purpose of my example was not to argue that my particular approach was the “correct” one, but rather to point out the flaws in the “multiply by an arbitrary constant” approach. I’ll edit that line, since I think you’re right that it’s a little more complicated than I was making it out to be, and “trivial” could be an unfair characterization.
BTW as a concrete note, you may want to sub in 15 - ceil(log10(n)) instead of just “15”, which really only matters if you’re dealing with numbers above 10 (e.g. 1000 is represented as 0x408F400000000000, while the next float 0x408F400000000001 is 1000.000000000000114, which differs in the 13th decimal place).
It’s duct tapes all the way down!
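A quick way to check the 1000 example and try the 15 - ceil(log10(n)) suggestion, sketched in Python. The function names safe_places, bits and next_float_up are made up for illustration, and next_float_up assumes a positive finite input.

```python
import math
import struct


def safe_places(n: float, sig_digits: int = 15) -> int:
    """Decimal places to keep for a value of magnitude n: 15 - ceil(log10(|n|))."""
    if n == 0:
        return sig_digits
    return sig_digits - math.ceil(math.log10(abs(n)))


def bits(x: float) -> str:
    """Big-endian IEEE 754 bit pattern of a double, as a hex string."""
    return "0x" + struct.pack(">d", x).hex().upper()


def next_float_up(x: float) -> float:
    """Next double above a positive finite x, found by incrementing its bit pattern."""
    (as_int,) = struct.unpack(">q", struct.pack(">d", x))
    return struct.unpack(">d", struct.pack(">q", as_int + 1))[0]


neighbour = next_float_up(1000.0)
print(bits(1000.0))     # 0x408F400000000000
print(bits(neighbour))  # 0x408F400000000001

# A fixed cutoff of 15 places still tells the two neighbours apart,
# while the magnitude-aware cutoff treats them as the same value.
places = safe_places(1000.0)
print(round(neighbour, 15) == 1000.0)
print(round(neighbour, places) == 1000.0)
```

The gap between the two neighbours is 2**-43, roughly 1.14e-13, so any cutoff finer than the 13th decimal place can still see float noise at this magnitude.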