CEV is rather complicated and meta and hence not intended as something you’d do with the first AI you ever tried to build. CEV might be something that everyone inside a project agreed was an acceptable mutual target for their second AI. (The first AI should probably be a Task AGI.)
So MIRI doesn’t focus on CEV, etc. because the world hasn’t nailed down step one yet. We’re extremely worried that humanity’s on track to fail step one; and it doesn’t matter how well we do on step two if we don’t pull off step one. That doesn’t mean that stopping at step one and never shooting for anything more ambitious would be acceptable; by default I’d consider that an existential catastrophe in its own right.
Yeah, CEV itself seemed like a long shot. My thought process was that maintaining human control wouldn't be enough for step one, both because I think it's not enough at the limit and because the human component might inherently be a limiting factor that makes it not very competitive. The more I thought about it, though, the weaker that assumption of inherentness seemed. So I agree that the most this post could be saying is that the timeline gap between something like Task AGI and figuring out step two is short, which I expect isn't very groundbreaking.
Also ,there’s no proof that CEV would work. Maybe values are incoherent.
The arbital article is no help.
I consider the Arbital article on CEV the best reference for the topic. It says:
"Asking what everyone would want* if they knew what the AI knew, and doing what they'd all predictably agree on, is just about the least jerky thing you can do."
How do we know that they would agree? That just begs the question. Saying that you shouldn't be "jerky", i.e. selfish, doesn't tell you what kind of unselfishness to have instead. Clearly, the left and the right don't agree on the best kind of altruism: laying down your life to stop the spread of socialism versus sacrificing your income to implement socialism.
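To make the disagreement worry concrete, here is a minimal sketch (my own illustration, not anything from the Arbital article, assuming a deliberately crude model where each person's extrapolated volition is just a ranking over outcomes): the AI keeps only the pairwise judgments that every extrapolated ranking shares. Whether that shared set ends up empty is exactly the question at issue.

```python
# Toy sketch, not from the Arbital article: represent each person's
# *extrapolated* preferences as pairwise judgments ("A over B") and let the
# AI act only on judgments that every extrapolated volition shares. All names
# and rankings below are hypothetical; the point is that the shared set can
# be non-empty or empty depending on how coherent the values turn out to be.

def shared_judgments(extrapolated_rankings):
    """Return the pairwise preferences that all extrapolated volitions agree on."""
    judgment_sets = []
    for ranking in extrapolated_rankings:
        # ["peace", "growth", "status"] -> ("peace" over "growth"),
        # ("peace" over "status"), ("growth" over "status")
        judgments = {(a, b) for i, a in enumerate(ranking) for b in ranking[i + 1:]}
        judgment_sets.append(judgments)
    return set.intersection(*judgment_sets)

# Partial overlap: something survives the intersection.
left = ["equality", "liberty", "tradition"]
right = ["liberty", "tradition", "equality"]
print(shared_judgments([left, right]))  # {('liberty', 'tradition')}

# Exactly reversed values: nothing everyone would predictably agree on.
print(shared_judgments([["socialism", "markets"], ["markets", "socialism"]]))  # set()
```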
So MIRI doesn’t focus on CEV, etc. because the world hasn’t nailed down step one yet. We’re extremely worried that humanity’s on track to fail step one; and it doesn’t matter how well we do on step two if we don’t pull off step one. That doesn’t mean that stopping at step one and never shooting for anything more ambitious would be acceptable; by default I’d consider that an existential catastrophe in its own right.
Yeah, CEV itself seemed like a long shot—but my thought process was that maintaining human control wouldn’t be enough for step one, both because I think it’s not enough at the limit, but also because the human component might inherently be a limiting factor that makes it not very competitive. But the more I thought about it, the weaker that assumption of inherent-ness seemed, so I agree in that the most this post could be saying is that the timeline gap between something like Task AGI and figuring out step 2 is short—but which I expect isn’t very groundbreaking.
Also ,there’s no proof that CEV would work. Maybe values are incoherent.
The arbital article is no help.
How do we know that they would agree? That just begs the question. Saying that you shouldn’t be “jerky”, ie. selfish, doesn’t tell you what kind of unselfishness to have instead. Clearly ,the left and the right don’t agree on the best kind of altruism—laying down your life to stop the spread of socialism, versus sacrificing your income to implement socialism.