I think that the Wentworth/Lorell approach differs slightly from the one I use here (in particular, they emphasise the counterfactual nature of the two expected utilities, something I don’t fully understand)...
Yup, I indeed think the do()-ops are the main piece missing here. They’re what remove the agent’s incentive to manipulate the shutdown button.
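To make that concrete, here's a toy sketch of the difference between an ordinary expected utility and one taken under do(). This is not the actual construction from the proposal: the two actions, the probabilities, the cost, and every function name below are made up for illustration.

```python
# Toy model. All names and numbers are illustrative assumptions.
P_PRESSED = {"leave": 0.5, "block": 0.0}  # chance the button gets pressed, given the action
COST = {"leave": 0.0, "block": 0.1}       # blocking the button costs a little effort


def u1(action, pressed):
    """Utility for the 'button unpressed' branch: reward only if not shut down."""
    return (0.0 if pressed else 1.0) - COST[action]


def eu_observational(action):
    """E[u1 | action]: the button state still depends causally on the action."""
    p = P_PRESSED[action]
    return p * u1(action, True) + (1 - p) * u1(action, False)


def eu_do_unpressed(action):
    """E[u1 | action, do(button = unpressed)]: the intervention overrides P_PRESSED."""
    return u1(action, False)


def best(expected_utility):
    """Return the action that maximises the given expected-utility function."""
    return max(P_PRESSED, key=expected_utility)
```

In this toy, `best(eu_observational)` picks `"block"`, because blocking raises the chance of staying unpressed. `best(eu_do_unpressed)` picks `"leave"`: once the button state is fixed by intervention, the action can no longer influence it, so blocking is pure cost.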