But for the continuous limit the subagents become similar to each other at the same rate as they become more numerous. It seems intuitive to me that with a little grinding you could get a decision-making procedure whose policy is an optimum of an integral over “subagents” who bet on the button being pushed at different times, and so the whole system will change behavior upon an arbitrarily-timed press of the button.
Except I think in continuous time you probably lose guarantees about the system not manipulating humans into pressing or not pressing the button. Unless, maybe, each subagent believes the button can only be pressed at exactly its chosen time. But this highlights that all of these counterfactuals may describe really weird worlds, which in turn could give rise to weird behavior.
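A rough way to write down the "integral over subagents" idea is sketched below. This is only an illustration: the weights $w_i$ and $w(t)$, the per-subagent utilities $U_{t}$, the horizon $T$, and the use of a weighted sum (rather than some other bargaining or Pareto rule) are all assumptions introduced here, not part of the proposal as stated. With $N$ subagents, subagent $i$ bets that the button is pressed at time $t_i$:

$$
\pi^* \in \arg\max_{\pi} \sum_{i=1}^{N} w_i \,\mathbb{E}\!\left[\, U_{t_i} \mid \pi,\ \mathrm{do}(\text{press at } t_i) \,\right]
$$

If the $t_i$ fill $[0, T]$ ever more densely and $w_i = w(t_i)\,\Delta t$, the sum becomes a Riemann sum and, assuming it converges, the objective becomes

$$
\pi^* \in \arg\max_{\pi} \int_0^T w(t)\,\mathbb{E}\!\left[\, U_t \mid \pi,\ \mathrm{do}(\text{press at } t) \,\right] dt
$$

Neighbouring subagents have nearly identical counterfactuals, which is why adding more of them need not make the optimization harder in the limit.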
I could buy something like this with the continuous time limit.
I just mean if you want to extend this to cover things outside of the shutdown problem. For example, you might want to ask the AI to build you a fusion power plant, or cook you a chocolate cake, or start a company that sells pottery, or similar. You could have some way of generating a utility function for each possibility, and then generate subagents for all of them, but if you do this you've got an exponentially large conjunction.
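To make the "exponentially large" part concrete, here is a toy count. It assumes (hypothetically) that each subagent corresponds to one combination of which requests do and do not get made, i.e. one subset of the possible tasks. The function name `subagent_scenarios` and the task list are illustrative only.

```python
from itertools import combinations


def subagent_scenarios(tasks):
    """Hypothetical model: one subagent per subset of tasks that might be requested."""
    scenarios = []
    for k in range(len(tasks) + 1):
        for combo in combinations(tasks, k):
            scenarios.append(frozenset(combo))
    return scenarios


tasks = ["fusion power plant", "chocolate cake", "pottery company"]
scenarios = subagent_scenarios(tasks)
print(len(scenarios))  # 2 ** 3 subsets, from "nothing requested" to "everything requested"

# The count doubles with every extra possible request.
for n in (5, 10, 20):
    print(n, 2 ** n)
```

Under this assumption, three possible requests already need eight subagents, and twenty need over a million.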