If a computer makes random small errors 0.01% of the time in e.g. arithmetic operations, it’s not an almost-working computer; it’s a completely non-functioning computer that will crash almost immediately.
The claim that a given algorithm or circuit really adds two numbers is very precise. Even a single pair of numbers that it adds incorrectly refutes the claim, and very much risks making this algorithm/circuit useless.
Floating point arithmetic in computers is usually not precise, and has many failure modes that are hard to understand even for experts. Here’s a simple one: when calculating the sum of many numbers, adding them from smallest to biggest or from biggest to smallest will often give different results, and the former one will be more correct. Here’s a more complex one: a twenty-page paper about computing the average of two numbers. But there are programs that do trillions of floating point operations and don’t crash.
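For concreteness, a minimal Python sketch of the summation-order effect described above: adding a million tiny terms to 1.0 largest-first loses them all to rounding, while smallest-first keeps them.

    import math

    vals = [1.0] + [1e-16] * 1_000_000
    print(sum(vals))          # biggest first: 1.0, every tiny term is rounded away
    print(sum(sorted(vals)))  # smallest first: about 1.0000000001, closer to the true sum
    print(math.fsum(vals))    # correctly rounded reference sum for comparison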
Floating point arithmetic might not be precise, but it’s imprecise in KNOWN ways.
As in, for a given operation done with a given instruction set, you can know that cases X/Y/Z have undefined behavior (e.g. using instruction A to multiply c and d will only give a precise result up to the nth decimal place).
By that same definition, basically every single popular programming language is not precise, since they can all manifest UB, but that doesn’t stop your kernel from working, since it’s written in such a way as to (mainly) avoid any sort of UB.
Pragmatically speaking, I can take any FP computation library and get deterministic results even if I run a program millions of times on different machines.
Heck, even with something like machine learning, where your two main tools are FP operations and randomness, you can do something like the following (example for torch):
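(A minimal sketch of the kind of seeding setup this is likely referring to, using the standard random/numpy/torch calls; the exact knobs may differ:)

    import random
    import numpy as np
    import torch

    SEED = 0

    # Seed every RNG the program touches so repeated runs are reproducible.
    random.seed(SEED)
    np.random.seed(SEED)
    torch.manual_seed(SEED)
    torch.cuda.manual_seed_all(SEED)

    # Prefer deterministic kernels over faster nondeterministic ones.
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.benchmark = False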
and you will get the same results by running the same code millions and millions of times over on many different machines.
Even if the library gets itself into a UB situation (e.g. a number gets too large and goes to nan), it will reach that UB at precisely the same point each time. So I think the better way to think of FPA is “defined only in a bounded domain”, but the implementations don’t bother to enforce those definitions programmatically, since that would take too long. Saying nan is cheaper than checking whether a number is nan each time, kinda thing.

This does apply to floating point, but I was thinking of integer operations here.
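A tiny Python illustration of the earlier point about UB being hit at the same place every run: overflow to inf and then nan is itself deterministic.

    x = 1e308
    y = x * 10       # exceeds the double range: inf, on every single run
    print(y)         # inf
    print(y - y)     # nan, produced at exactly the same operation each time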
Well, your broader claim was that computer algorithms shouldn’t kinda sorta work, they need to work 100%. And floating point arithmetic belies that claim. For that matter, so does integer arithmetic—practically no programs come with a rigorous analysis of when integer overflow or division by zero can or can’t happen. For example, binary search in Java was buggy for many years, because the (high+low)/2 operation on integers is funny.
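The binary-search bug is easy to see by simulating Java’s 32-bit int arithmetic; a rough Python sketch (the wrap-around helper is hand-rolled for illustration, since Python ints don’t overflow):

    def to_int32(x):
        # Wrap x the way Java's 32-bit signed int arithmetic does.
        x &= 0xFFFFFFFF
        return x - 2**32 if x >= 2**31 else x

    low, high = 1, 2**31 - 1                # both are valid Java ints
    mid_buggy = to_int32(low + high) // 2   # low + high wraps to -2**31, so mid goes negative
    mid_safe = low + (high - low) // 2      # the standard fix stays in range
    print(mid_buggy, mid_safe)              # -1073741824 vs 1073741824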
The claim was that if the arithmetic circuit that is supposed to add numbers fails 0.01% of the time, the computer crashes, which is true.
You did also say that:
For almost every arithmetic operation in actual computers, on every type of numbers, there are many inputs for which that operation returns the wrong result. (Yeah, arbitrary size integers are an exception, but most programs don’t use those, and even they can fail if you try making a number that doesn’t fit in memory.) But still, lots of algorithms are useful.
Are you referring to overflow? If so, that’s the right result: the function to compute is “adding integers mod N”, not “adding integers” (I agree I said “adding integers”, but anyway addition mod N is a different, very, very precise claim). Otherwise that’s a hardware bug, and quality assurance is supposed to get rid of those.
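A small Python sketch of the “addition mod N” reading (again simulating 32-bit wrap-around by hand, since Python’s own ints are unbounded):

    def add_int32(a, b):
        # Two's-complement 32-bit addition: addition mod 2**32, mapped into the signed range.
        s = (a + b) % 2**32
        return s - 2**32 if s >= 2**31 else s

    print(add_int32(2**31 - 1, 1))   # -2147483648: "wrong" as plain integer addition, exactly right mod 2**32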
I still don’t think the programming example supports your point.
For example, in C and C++, signed integer overflow is undefined behavior. The compiler is allowed to break your program if it happens. Undefined behavior is useful for optimizations (for example, you can optimize x<x+1 to true, which helps eliminate branches), and there have been popular programs that quietly broke when a new compiler release got better at such optimizations. John Regehr’s blog is a great source on this.
Almost nothing in programming is 100% reliable; most things just kinda seem to work. Maybe it would be better to use an example from math.