Here is a possible mechanism for the Decision Theory Update Problem.
The agent first considers several scenarios, each weighted by its importance. Then, for each scenario, the agent compares the output of its current decision theory with the output of the candidate decision theory and computes a loss for that scenario. The loss is higher when the current decision theory judges the actions preferred by the candidate to be certainly wrong or very wrong. The agent updates its decision theory when the weighted average loss is small enough to be compensated by a gain in representational simplicity.
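As a rough illustration, here is a minimal sketch of this criterion in Python. Everything in it is an assumption made for concreteness: a "theory" is just a callable from a scenario to an action, and `disapproval` is a hypothetical stand-in for how wrong the current theory judges the candidate's preferred action in a given scenario.

```python
# Sketch of the update criterion. All names are illustrative, not a
# definitive implementation: a "theory" is any callable mapping a
# scenario to an action.

def update_loss(current_theory, candidate_theory, weighted_scenarios, disapproval):
    """Weighted average loss of switching from the current to the candidate theory."""
    total, weight_sum = 0.0, 0.0
    for scenario, weight in weighted_scenarios:
        current_action = current_theory(scenario)
        candidate_action = candidate_theory(scenario)
        # Loss is high when the current theory considers the candidate's
        # preferred action certainly wrong or very wrong in this scenario.
        total += weight * disapproval(scenario, current_action, candidate_action)
        weight_sum += weight
    return total / weight_sum

def should_update(current_theory, candidate_theory, weighted_scenarios,
                  disapproval, simplicity_gain):
    """Adopt the candidate when the loss is compensated by the simplicity gain."""
    loss = update_loss(current_theory, candidate_theory, weighted_scenarios, disapproval)
    return loss <= simplicity_gain
```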
This points to a possible solution to the ontological crisis as well: the agent searches for a simple decision theory under the new ontology that approximates its actions under the old one.
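Under the same assumptions (and reusing `update_loss` from the sketch above), the ontological-crisis version becomes a search problem: among candidate theories expressed in the new ontology, pick the one that minimizes behavioral loss plus a complexity penalty. The `complexity` function, the `tradeoff` parameter, and the candidate set are all hypothetical stand-ins, and the sketch glosses over how each scenario gets described in both ontologies.

```python
def resolve_ontological_crisis(old_theory, candidates, weighted_scenarios,
                               disapproval, complexity, tradeoff=1.0):
    """Pick the new-ontology theory whose actions best approximate the old
    theory's, trading off behavioral loss against representational complexity.
    Assumes each scenario can be interpreted under both ontologies, so both
    theories can be run on it (a nontrivial assumption glossed over here)."""
    def score(candidate):
        loss = update_loss(old_theory, candidate, weighted_scenarios, disapproval)
        return loss + tradeoff * complexity(candidate)
    return min(candidates, key=score)
```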