If your values are such that they do not even allow a mechanism for creating a best-effort approximation of those values in the case of ontological enlightenment, then you are out of luck no matter what you do.
If preference is expressed in terms of what you should do, not what’s true about the world, new observations never influence preference, so we can fix it at the start and never revise it (which is an important feature for constructing FAI, since you only ever have a hand in its initial construction).
(To whoever downvoted this without comment: it's not as stupid an idea as it might sound. What's true about the world doesn't matter for preference, but it does matter for decision-making, since decisions are made depending on what's observed. By isolating preference from the influence of observations, we fix it at the start; but since it determines what should be done depending on all possible observations, we are not ignoring reality.)
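To make the distinction concrete, here is a toy sketch (my own illustration, with made-up names, not anything proposed in this thread): a preference expressed as a fixed rule over all possible observation histories, next to a utility defined over a particular world ontology that could later turn out to be flawed.

```python
from typing import Callable, Tuple

Observation = str
Action = str
History = Tuple[Observation, ...]

# Preference-as-policy: a rule fixed at construction for every possible
# observation history. Observations feed into the decision, never into
# the rule itself.
PreferencePolicy = Callable[[History], Action]

def fixed_preference(history: History) -> Action:
    """Specified up front; never revised when new observations arrive."""
    if any("anomaly" in obs for obs in history):
        return "investigate"
    return "proceed"

def utility_over_world_state(world_state: dict) -> float:
    """By contrast, a utility over world states presupposes an ontology
    (here, hypothetically, that the world is described by a 'paperclips'
    count); if that ontology is wrong, the utility itself needs
    reinterpretation."""
    return float(world_state.get("paperclips", 0))

if __name__ == "__main__":
    print(fixed_preference(("nothing unusual",)))       # proceed
    print(fixed_preference(("anomaly in ontology",)))   # investigate
    print(utility_over_world_state({"paperclips": 3}))  # 3.0
```

The point of the contrast is that the first object never needs revising as observations come in, while the second is hostage to the ontology it is written in.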
In the situation described by Roko, the agent has doubts about its understanding of the very ontology that its values are expressed in. If it were an AI, that would effectively mean we designed it using mathematics that we thought was consistent but that turns out to have a flaw. The FAI has self-improved to a level where it suspects that the ontology used to represent its value system is internally inconsistent, and it must decide whether to examine the problem further. (So we should have been able to fix it at the start, but couldn't, because we just weren't smart enough.)
If its values are not represented in terms of an “ontology”, this won’t happen.