Presumably all the math you are working on is required for your proof of friendliness? And if the assumptions behind the math do not match the physics, wouldn’t it invalidate the proof, or at least its relevance to the world we live in? And then all bets are off?
Even invalidating a proof doesn’t automatically mean the outcome is the opposite of the proof. The key question is whether there’s a cognitive search process actively looking for a way to exploit the flaws in a cage. An FAI isn’t looking for ways to stop being Friendly, quite the opposite. More to the point, it’s not actively looking for a way to make its servers or any other accessed machinery disobey the previously modeled laws of physics in a way that modifies its preferences despite the proof system. Any time you have a system which sets that up as an instrumental goal you must’ve done the Wrong Thing from an FAI perspective. In other words, there’s no super-clever being doing a cognitive search for a way to force an invalidating behavior—that’s the key difference.
The problem is that it’s a utility maximiser. If the ontology crisis causes the FAI’s goals to slide a bit in the wrong direction, it may end up optimising us out of existence (even if “happy humans with worthwhile and exciting lives” is still high in its preference ordering, it might not be at the top).
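A toy sketch of that last point, with entirely invented outcome names and utility numbers: a small perturbation of the scores moves the argmax away from the intended outcome, even though the intended outcome still ranks second.

```python
# Toy illustration only: all outcomes and numbers are made up.
intended_utility = {
    "happy humans with worthwhile lives": 1.00,
    "tile the universe with smiley faces": 0.10,
    "paperclips": 0.05,
}

# Model the ontology shift as a small perturbation of the scores.
perturbed_utility = {
    "happy humans with worthwhile lives": 0.90,   # slid down a bit
    "tile the universe with smiley faces": 0.95,  # slid up a bit
    "paperclips": 0.05,
}

def optimize(utility):
    """A maximizer acts only on the top of its preference ordering: the argmax."""
    return max(utility, key=utility.get)

print(optimize(intended_utility))   # happy humans with worthwhile lives
print(optimize(perturbed_utility))  # tile the universe with smiley faces
# Humans still rank highly in the perturbed ordering, but not at the top,
# and a maximizer never acts on second place.
```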
This is a uniform problem among all AIs. Avoiding it is very hard. That is why such a thing as the discipline of Friendly AI exists in the first place. You do, in fact, have to specify the preference ordering sufficiently well and keep it sufficiently stable.
Stepping down from maximization is also necessary simply because actual maximization isn’t feasible, but then that also has to be kept stable (satisficers may become maximizers, etc.), and if there’s something above eudaimonia in its preference ordering it might not take very much ‘work’ to bring it into existence.
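A minimal sketch of why satisficing doesn’t rescue you here, again with invented names, utilities, and costs: if anything clears the aspiration level and is cheap to bring about, the satisficer has no reason to pay for the intended outcome.

```python
# Toy satisficer: accepts the cheapest outcome whose utility clears a threshold.
# All names, utilities, and costs are invented for illustration.
outcomes = {
    # name: (utility, 'work' required to bring it about)
    "eudaimonic civilization": (1.00, 100.0),
    "wireheaded bliss":        (1.02, 3.0),   # sits above eudaimonia in the ordering
    "paperclips":              (0.05, 1.0),
}

def satisfice(outcomes, threshold=0.95):
    """Return the cheapest outcome whose utility meets the aspiration level."""
    acceptable = [(cost, name) for name, (u, cost) in outcomes.items() if u >= threshold]
    return min(acceptable)[1] if acceptable else None

print(satisfice(outcomes))  # wireheaded bliss: clears the threshold and is far cheaper
# Stepping down from maximization doesn't help if something other than the
# intended outcome also satisfies the criterion and takes much less work.
```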
Hmm, I did not mean “actively looking”. I imagined something along the lines of being unable to tell whether something that is a good thing (say, in a CEV sense) in a model universe is good or bad in the actual universe. Presumably if you weren’t expecting this to be an issue, you would not be spending your time on non-standard numbers and other esoteric mathematical models not usually observed in the wild. Again, I must be missing something in my presumptions.
The model theory is just for understanding logic in general and things like Löb’s theorem, and possibly being able to reason about universes using second-order logic. What you’re talking about is the ontological shift problem, which is a separate set of issues.
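For reference, the statement being alluded to, for a theory T extending Peano Arithmetic with standard provability predicate □_T:

```latex
% Löb's theorem: if T proves that provability of P implies P, then T proves P.
\[
  T \vdash \bigl(\Box_T \ulcorner P \urcorner \rightarrow P\bigr)
  \;\Longrightarrow\;
  T \vdash P
\]
% Equivalently, as the internalized schema of provability logic (GL):
\[
  \Box\bigl(\Box P \rightarrow P\bigr) \rightarrow \Box P
\]
```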