My guess is that finding a fully satisfactory solution is hopeless, in much the same way as with specifying aligned goals (i.e. there is no closed-form solution, none that avoids reference to human-derived systems doing decision theory/axiology).
A crucial problem is working out how an agent's decisions influence a given situation, but that situation can include things that reason approximately about the agent, and worse, things that reason about different but similar agents. An agent's decision influences not just precise predictions of itself, but also approximate (and sometimes incorrect) guesses about it, and approximate guesses about similar decisions of similar agents. Judging how a decision influences a system that wrongly guesses the decision of a similar but different agent seems "arbitrary" in the same way that human goals are "arbitrary": that is, not arbitrary at all, but in practice not possible to express without reference to the philosophy of human-derived things.
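To make the ambiguity a bit more concrete, here is a toy Newcomb-flavored sketch (all names, payoffs, and parameters are my own illustrative choices, not anything established): a predictor that runs a noisy model of a similar-but-different agent, so the agent's policy moves the payoff only insofar as the predictor's model happens to track that policy.

```python
# Toy illustration (hypothetical setup): influence on a predictor that models
# a *similar* agent, approximately and sometimes incorrectly.
import random

def agent(policy_bias: float) -> str:
    """A simple stochastic agent: one-boxes with probability `policy_bias`."""
    return "one-box" if random.random() < policy_bias else "two-box"

def predictor(similar_bias: float, noise: float) -> str:
    """The predictor does not simulate the real agent; it samples an approximate
    model of a similar agent (possibly a different bias) and sometimes errs."""
    guess = agent(similar_bias)
    if random.random() < noise:
        guess = "two-box" if guess == "one-box" else "one-box"
    return guess

def payoff(action: str, prediction: str) -> int:
    """Newcomb-style payoffs: $1M in the opaque box iff one-boxing was predicted."""
    big = 1_000_000 if prediction == "one-box" else 0
    small = 1_000 if action == "two-box" else 0
    return big + small

def expected_payoff(policy_bias: float, similar_bias: float,
                    noise: float, n: int = 100_000) -> float:
    """Monte Carlo estimate; the prediction correlates with the agent's policy
    only to the extent that `similar_bias` tracks `policy_bias`."""
    return sum(payoff(agent(policy_bias), predictor(similar_bias, noise))
               for _ in range(n)) / n

# Predictor models the agent well vs. models a similar-but-different agent:
print(expected_payoff(policy_bias=0.9, similar_bias=0.9, noise=0.05))  # ~860,100
print(expected_payoff(policy_bias=0.9, similar_bias=0.5, noise=0.05))  # ~500,100
```

In the second case the payoff no longer responds to the agent's own policy at all, and how much "influence" to attribute to the agent's decision (versus to the predictor's choice of whom to model) is exactly the kind of question that seems to need human-derived philosophical judgment rather than a clean formula.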
Another practical solution might be to characterize a class of situations where decision theory is mostly clear, and make sure to keep the world that way until more general decision theory is developed. This direction can benefit from more general decision theories, but they won't be "fully general"; they will just describe more situations, or understand the familiar situations better. (See also.)