A slightly bigger “large risk” than the one Pentashagon puts forward is that a provably boxed UFAI could indifferently give us information that results in yet another UFAI, just as unpredictable as itself (statistically speaking, it’s going to give us more unhelpful information than helpful, as Robb points out). Keep in mind I’m extrapolating here. At first you’d just be asking for mundane things like better transportation, cures for diseases, etc. If the UFAI’s mind is strange enough, and we’re lucky enough, then some of these things result in beneficial outcomes, politically motivating humans to continue asking it for things. Eventually we’re going to escalate to asking for a better AI, at which point we’ll get a crapshoot.
An even bigger risk than that, though, is that if it’s especially Unfriendly, it may even do this intentionally, going so far as to pretend it’s Friendly while bestowing on us the data to make an AI even more Unfriendly than itself. So what do we do, box that AI as well, when it could potentially be even more devious than the one that already convinced us to make this one? Is it just boxes, all the way down? (Spoiler: it isn’t, because we shouldn’t be taking any advice from boxed AIs in the first place.)
The only use of a boxed AI is to verify that, yes, the programming path you went down is the wrong one, and resulted in an AI that is indifferent to our existence (and therefore has no incentive to hide its motives from us). Any positive outcome would be no better than an outcome where the AI was specifically Evil, because if we can’t tell the difference in the code prior to turning it on, we certainly wouldn’t be able to tell the difference afterward.