Keep in mind that the AI could be wrong! Your attempts to validate its correctness could be mistaken (or even subject to some kind of blind spot, if we want to pursue that path). The more implausible the AI’s claim, the more you have to consider that the AI is mistaken. Even though a priori it seemed to be working properly, Bayes’ rule requires you to become more skeptical about that when it makes a claim that is easier to explain if the AI is broken. The more unlikely the claim, the more likely the machine is wrong.
Ultimately, you can’t accept any claim from the AI that is more implausible than the possibility that the AI isn’t working right. And given our very limited human ability to design software correctly, that threshold can’t realistically be very high, especially if we adjust for our inherent overconfidence. So AIs really can’t surprise us very badly.
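To make the Bayes' rule step concrete, here is a minimal sketch. The function name and all the numbers are made up for illustration, and it assumes a very crude two-hypothesis model: the AI is either "working" or "broken", and all we compare is how likely each kind of AI is to produce the claim we just saw.

```python
def posterior_broken(prior_broken, p_claim_if_working, p_claim_if_broken):
    """Posterior probability that the AI is broken, given that it made a claim.

    prior_broken       -- P(broken) before hearing the claim
    p_claim_if_working -- P(AI makes this claim | working); roughly the prior
                          plausibility of the claim being true
    p_claim_if_broken  -- P(AI makes this claim | broken)
    All three inputs are illustrative assumptions, not measured quantities.
    """
    numerator = prior_broken * p_claim_if_broken
    denominator = numerator + (1.0 - prior_broken) * p_claim_if_working
    return numerator / denominator


if __name__ == "__main__":
    prior = 0.01  # hypothetical: we start out 99% sure the AI works
    # The less plausible the claim, the smaller p_claim_if_working gets,
    # while a broken AI is assumed to emit it at a fixed (hypothetical) rate.
    for plausibility in (0.5, 1e-2, 1e-4, 1e-6):
        p = posterior_broken(prior, plausibility, 1e-3)
        print(f"claim plausibility {plausibility:g}: P(broken | claim) = {p:.4f}")
```

Under these assumed numbers, a claim that a working AI would make only one time in a million pushes most of the probability onto "broken", even from a 1% prior. That is the threshold effect described above: once the claim is less plausible than a malfunction, the malfunction is the better explanation.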
If we observe the AI drawing a bunch of correct conclusions, we can pretty quickly build up more evidence that the AI is not insane, and that whatever it is actually thinking is the truth. Most of the bugs that I ever discover in the software I write (I know there is a selection bias here, but I know from experience that my testing is thorough enough to catch almost all bugs that have non-exceedingly-rare consequences for normal use) are the kind that are obvious right away. It takes a very specific kind of bug to have the possibility of changing the output in a consequential way, but not to be wrong on the first few tests.
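A rough sketch of how that evidence accumulates, under assumptions that are mine and not part of the argument above: tests are independent, a buggy system gives a wrong answer on a fixed fraction of inputs, and the system is not choosing when to fail. The function name and numbers are hypothetical.

```python
def posterior_buggy_after_passes(prior_buggy, failure_rate, n_passes):
    """Posterior probability of a bug after n_passes correct answers in a row.

    prior_buggy  -- P(bug) before testing
    failure_rate -- fraction of inputs on which the bug gives a wrong answer
                    (assumed constant and independent across tests)
    n_passes     -- number of consecutive correct conclusions observed
    """
    # Probability that a buggy system nonetheless passes every test.
    survive = (1.0 - failure_rate) ** n_passes
    numerator = prior_buggy * survive
    # A bug-free system passes every test with probability 1.
    return numerator / (numerator + (1.0 - prior_buggy))


if __name__ == "__main__":
    for rate in (0.5, 0.1, 0.001, 0.0):
        p = posterior_buggy_after_passes(0.5, rate, 10)
        print(f"failure rate {rate:g}: P(bug | 10 passes) = {p:.4f}")
```

With these made-up inputs, a bug that shows up on half of all inputs is almost ruled out after ten passes, while one that shows up on one input in a thousand is barely touched. A failure rate of zero on the tests we happen to run leaves the prior exactly where it started, which is the case where passing tests tell us nothing.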
But lots of UFAIs would pretend to be FAIs, so it would be much harder to get evidence that a given AI was friendly.
Could I add a brief thought? Even if you could program an AI with no bugs, that AI could make “mistakes”. What we consider a mistake depends on the practical purpose we have in mind for the AI. What we consider a bug depends on the operational purpose of the segment of code the bug appears in. But the operational purpose may not match up with our practical purpose. So code which achieves its operational purpose might not achieve its practical purpose.
Let me take this a bit deeper. The programmer may have an overarching intended purpose for the entire project of building the AI. They may also have an intended purpose for the entirety of the code written. They may also have an intended purpose for the programs A, B, C… in the code. They may… and so forth. At every stage of increasing generality, there is a potential for the more general purpose to not be fulfilled. Your programs might each do what you want them to, but in combination they might fail to detect an absolute denial macro. The entirety of your code might resemble an AI, but it would take a computer far more powerful than any in existence to run it. You might get the whole project up and working, but only for the AI to decide to commit suicide. Or, more relevantly, you might get the whole project working, but the AI turns out to be dumb, because the way you thought an AI ought to think in order to be clever didn’t work out. Your intended purpose of creating an AI which thought in the way you intended was achieved, but one of the more general purposes, creating an AI that was clever, was not.
So it would seem that bug checking is more prone to human error than you implied, especially as intended purposes are themselves often vague. I don’t claim, however, that these challenges are insurmountable. Also, if anyone is uncomfortable with the phrase “intended purpose” as I used it, feel free to replace it with “what the programmer had in mind”, as that is all I meant by it.
Hi. Checking back on this account on a whim after a long time of not using it. You’re right. 2012!Mestroyer was a noob and I am still cleaning up his bad software.