There still might be problems with subagents, though. It could be optimal for the agent to create a subagent that protects it from interference while it “goes to sleep”.
I agree.
I think this might be solved by modifying the utility for the case $f^{-1}(y)\neq 0$ to:

$$\frac{\alpha}{1+[\text{number of time-steps until the first ``self-terminate'' action}]}$$
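As a sanity check, here is a minimal Python sketch of that term (the action label, the 1-indexed step count, and the treatment of histories that never self-terminate are my own assumptions, not anything fixed above):

```python
# Toy sketch of the utility in the case f^{-1}(y) != 0: reward early self-termination.
def terminate_reward(history, alpha):
    """history: list of action labels; alpha: positive constant."""
    if "self-terminate" in history:
        # time-steps until the first "self-terminate" action, counting that step as step 1
        steps = history.index("self-terminate") + 1
    else:
        # never self-terminates within the horizon: fall back to the horizon length
        steps = len(history)
    return alpha / (1 + steps)
```

So, for example, `terminate_reward(["work", "self-terminate"], alpha=1.0)` gives $1/3$, while terminating immediately gives $1/2$.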
Yep, that’s better. There’s still the risk of subagents being created: when the agent thinks that $f^{-1}(y)\neq 0$ almost certainly, but not with complete certainty, it might create a $u$-maximising subagent and then self-terminate.
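To make that concrete with some made-up symbols of my own (write $\epsilon$ for the agent’s residual credence that $f^{-1}(y)=0$, $u_{\text{sub}}$ for the $u$-value a subagent could lock in, $u_0$ for the $u$-value with no subagent, and count the self-terminate step as step 1), compare two strategies:

$$\begin{aligned}\text{self-terminate at once:}\quad & (1-\epsilon)\,\frac{\alpha}{2}+\epsilon\, u_0,\\ \text{build a subagent, then self-terminate:}\quad & (1-\epsilon)\,\frac{\alpha}{3}+\epsilon\, u_{\text{sub}}.\end{aligned}$$

The second wins whenever $\epsilon\,(u_{\text{sub}}-u_0) > (1-\epsilon)\,\alpha/6$, so even a tiny residual credence is enough once the subagent’s $u$-gain is large.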
That means that this design, like most indifference designs, is reflectively consistent but not reflectively stable.
Wow, I agree!
Let us modify the utility for the case $f^{-1}(y)=0$ to:

$$u^*(h)=\begin{cases}0 & \text{if } h \text{ contains a ``self-terminate'' action}\\ u(h) & \text{otherwise}\end{cases}$$
Meaning: no utility can be gained via subagents if the agent “jumps ship” (i.e. self-terminates to gain utility in the case $f^{-1}(y)\neq 0$).
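Putting the two cases together, here is a rough sketch in the same toy Python as above (the boolean `nonzero_case` standing in for $f^{-1}(y)\neq 0$, the action label, and the step-counting convention are my assumptions):

```python
# Sketch of the combined modified utility.
def modified_utility(history, nonzero_case, u, alpha):
    """history: list of action labels; nonzero_case: True iff f^{-1}(y) != 0;
    u: the original utility function over histories; alpha: positive constant."""
    terminated = "self-terminate" in history
    if nonzero_case:
        # case f^{-1}(y) != 0: only reward early self-termination, as in the formula above
        steps = history.index("self-terminate") + 1 if terminated else len(history)
        return alpha / (1 + steps)
    # case f^{-1}(y) == 0: u*, zeroed on any history containing "self-terminate",
    # so nothing can be gained by leaving a u-maximising subagent behind and jumping ship
    return 0.0 if terminated else u(history)
```

Under this combined utility, a history like [build subagent, self-terminate] scores $0$ in the $f^{-1}(y)=0$ branch, which is exactly the jump-ship gain the modification removes.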
Interesting. I’ll think about whether this works and whether it can be generalised (it doesn’t make the design reflectively stable, since creating $u$-maximising subagents is still allowed and doesn’t directly hurt the agent, but it might improve the situation).