You want the program to keep running in the context of the world. To specify what that means, you need to build on top of an ontology that refers to the world. But figuring out such an ontology is a very difficult problem, and you can’t even in principle refer to the whole world as it really is: you’ll always have uncertainty left, even in a general ontological model.
The program will have to know what tradeoffs to make: for example, whether it’s important to survive in most possible worlds with fair probability, or in at least one possible world with high probability. These would lead to very different behavior, and the possibility of such tradeoffs exemplifies how much data such a preference would require. If, additionally, you want to keep most of the world as it would be if the AI had never been created, that’s another complex counterfactual for you to bake into its preference.
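To see why these two tradeoffs diverge, here is a minimal sketch comparing the two survival objectives over a toy set of possible worlds. All names, numbers, and policies are illustrative assumptions, not anything from an actual specification:

```python
# Hypothetical sketch: two ways to score a policy's "keep running"
# preference over possible worlds. All numbers are illustrative.

worlds = {"w1": 0.5, "w2": 0.3, "w3": 0.2}  # probability of each world

# Survival probability of the program under each policy, per world.
policy_broad = {"w1": 0.6, "w2": 0.6, "w3": 0.6}    # fair odds everywhere
policy_narrow = {"w1": 0.99, "w2": 0.0, "w3": 0.0}  # near-certain in one world

def expected_survival(policy):
    """Objective 1: survive in most worlds with fair probability
    (probability-weighted average over worlds)."""
    return sum(worlds[w] * policy[w] for w in worlds)

def best_case_survival(policy):
    """Objective 2: survive in at least one world with high probability
    (best case across worlds, ignoring world probabilities)."""
    return max(policy[w] for w in worlds)

# The two objectives rank the same policies in opposite orders:
assert expected_survival(policy_broad) > expected_survival(policy_narrow)
assert best_case_survival(policy_narrow) > best_case_survival(policy_broad)
```

A program optimizing the first objective hedges across worlds; one optimizing the second concentrates everything on its best-case world. Picking between such objectives (and all the intermediate mixtures) is part of the data the preference has to contain.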
It’s a very difficult problem, probably more difficult than FAI, since for FAI we at least have some hope of cheating and copying formal preference from an existing blueprint, whereas here you have to build it from scratch, translating your requirements from human-speak into a formal specification.