Big crux here: I don’t actually expect useful research to occur as a result of my control-critique post. Even after updating on the discussion remaining more civil than I expected, I still expect basically zero people to do anything useful as a result.
As a comparison: I wrote a couple of posts on my AI model delta with Yudkowsky and with Christiano. For each of them, I can imagine changing ~one big piece in my model and ending up with a model which looks basically like theirs.
By contrast, when I read the stuff written on the control agenda… it feels like there is no model there at all. (Directionally correct but probably not quite accurate description:) it feels like whoever’s writing it, or whoever would buy the control agenda, is just kinda pattern-matching natural-language strings without tracking the underlying concepts those strings are supposed to represent. (Joe’s recent post on “fake vs real thinking” feels like it’s pointing at the right thing here; the posts on control feel strongly like “fake” thinking.) And that’s not a problem which gets fixed by engaging at the object level; that type of cognition will mostly not produce useful work, so getting useful work out of such people would require getting them to think in entirely different ways.
… so mostly I’ve tried to argue at a different level, e.g. in the Why Not Just… posts. The goal there isn’t really to engage the sort of people who would otherwise buy the control agenda, but rather to communicate the underlying problems to the sort of people who already instinctively feel something is off about the control agenda, and to give them more useful frames to work with. Because those are the people who have some hope of doing something useful without the whole structure of their cognition needing to change first.
That is indeed what I had in mind when I said we’d need another couple sentences to argue that the agent maximizes expected utility under the distribution. It is less circular than it might seem at first glance, because two importantly different kinds of probabilities are involved: uncertainty over the environment (which is what we’re deriving), and uncertainty over the agent’s own actions arising from mixed strategies.
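To make the non-circularity concrete, here’s a minimal sketch of the kind of argument I have in mind (a standard complete-class-style setup; the notation is mine, not something we pinned down in the original discussion). Take a finite set of environments $\Theta$, a finite set of actions $A$, and a utility function $u : A \times \Theta \to \mathbb{R}$. A mixed strategy $\pi \in \Delta(A)$ gets the payoff vector

$$V_\pi := \Big( \textstyle\sum_{a \in A} \pi(a)\, u(a,\theta) \Big)_{\theta \in \Theta}.$$

If $\pi$ is Pareto-optimal across environments (no $\pi'$ with $V_{\pi'} \geq V_\pi$ coordinatewise and strictly better in some coordinate), then, since $\{V_{\pi'} : \pi' \in \Delta(A)\}$ is convex, a supporting-hyperplane argument gives a distribution $p \in \Delta(\Theta)$ such that

$$\pi \in \arg\max_{\pi'} \; \sum_{\theta \in \Theta} p(\theta) \sum_{a \in A} \pi'(a)\, u(a,\theta).$$

Here $p(\theta)$ is the environment-uncertainty we’re deriving, while $\pi(a)$ is the agent’s own randomization over actions, which we take as given from its behavior. Expectations over $\pi$ show up in the statement from the start, but the distribution being derived is $p$, so the argument doesn’t presuppose what it’s proving.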