I don’t think we necessarily disagree about anything except ‘how hard is CEV?’ It sounds to me like we’d mostly have the same intuitions conditional on ‘CEV is very hard’.
I disagree on the plausibility of a stop-the-world scheme. Things are now as safe and stable as they will ever be. I think a better plan is to use the rising tide of AI capabilities to flesh out CEV. In particular, since the details of CEV depend on not-yet-understood details of human cognition, one should think about how to use near-future AI power to extract those details from the available data on brains and behavior. But AI can contribute everywhere else in the development of CEV theory and practice too.
But I won’t try to change your thinking. In practice, MIRI’s work on simpler forms of alignment (and on logical induction, decision-theory paradigms, etc.) is surely relevant to a “CEV now” approach too. What I am wondering is: where is the de facto center of gravity for a “CEV now” effort? I’ve said many times that June Ku’s blueprint is the best I’ve seen, but I don’t see anyone working to refine it. And there are other people whose work seems relevant and promisingly rigorous, but I’m not sure how it all fits together.
edit: I do see value in non-CEV alignment work, distinct from figuring out how to stop the world safely: it reduces the risk arising from the general use of advanced AI systems. So it is a contribution to AI safety.