Not entirely sure what you mean there, but are you saying that we can train the AI in a “black box/evolutionary algorithm” kind of way, and that this will extend across ontology crises?
That sounds like an appeal to emergence.
I think there are two potential problems with ontology:
1. The AI fails to understand its new environment enough to be able to manipulate it to implement its values.
2. The AI discovers how to manipulate its new environment, but its values become corrupted in the translation to the new ontology.
For #1, all we can do is give it a better approximation of inductive inference. For #2, we can state the values in more ontology-independent terms.
These are both incredibly difficult to do when you don’t know (and probably can’t imagine) what kinds of ontological crises the AI will face.
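(To make “stating the values in more ontology-independent terms” slightly more concrete, here is one shape that could take: the value is defined against an abstract query rather than against the internal variables of any particular world model. The names here — WorldModel, expected_wellbeing — are purely hypothetical illustrations, and the hard part is of course still hidden inside how each new ontology answers the query.)

```python
from typing import Protocol

class WorldModel(Protocol):
    """Whatever ontology the AI currently uses, exposed only through
    the coarse queries the value specification relies on."""
    def expected_wellbeing(self, plan: object) -> float: ...

def value(plan: object, model: WorldModel) -> float:
    # The value is stated against the abstract query, not against the
    # concrete posits (particles, fields, worldlines) of the model, so
    # replacing the model does not automatically rewrite the value.
    return model.expected_wellbeing(plan)
```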
Evolutionary algorithms work well for incompletely defined situations; no emergence is needed to explain that behaviour.
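(For concreteness, a minimal sketch of the kind of black-box optimisation being described — nothing here is from the discussion above; black_box_fitness stands in for whatever incompletely specified evaluation you actually have.)

```python
import random

def evolve(black_box_fitness, dim=10, pop_size=20, generations=200, sigma=0.1):
    """Simple (mu + lambda)-style evolution on a black-box fitness function.

    The optimiser never inspects how fitness is computed; it only queries
    it, which is why it tolerates incompletely defined problems."""
    population = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        # Mutate every parent to produce one child each.
        children = [[x + random.gauss(0, sigma) for x in parent] for parent in population]
        # Keep the best pop_size individuals from parents plus children.
        population = sorted(population + children, key=black_box_fitness, reverse=True)[:pop_size]
    return max(population, key=black_box_fitness)

# Example: the optimiser only ever sees scores, never this definition.
best = evolve(lambda xs: -sum(x * x for x in xs))
```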
I think “The AI fails to understand its new environment enough to be able to manipulate it to implement its values.” is unlikely (that’s my first scenario), as the AI is the one discovering the new ontology (if we know it in advance, we give it to the AI).
Not sure how to approach #2; how could you specify a Newtonian “maximise pleasure over time” in such a way that it stays stable when the AI discovers relativity (and you have to specify this without using your own knowledge of relativity, of course)?
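(To spell out why the naive formalisation doesn’t carry over — with p as a hypothetical pleasure-rate signal, which is itself an assumption:)

```latex
% Newtonian spec: one absolute time t orders all events, so
% "pleasure over time" is a single well-defined integral.
U_{\mathrm{Newton}} = \int_{t_0}^{t_1} p(t)\,\mathrm{d}t

% After relativity there is no preferred global t. One candidate
% translation integrates each experiencer i's pleasure rate over
% their own proper time \tau_i along their worldline \gamma_i:
U_{\mathrm{rel}} = \sum_i \int_{\gamma_i} p_i(\tau_i)\,\mathrm{d}\tau_i
% but different frames or foliations give inequivalent totals, and
% nothing in the Newtonian spec says which translation is intended.
```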