I worry that “Prosaic Alignment Is Doomed” seems a bit… off as the most appropriate crux. At least for me. It seems hard for someone to justifiably know that this is true with enough confidence to not even try anymore. To have essayed or otherwise precluded all promising paths of inquiry, to not even engage with the rest of the field, to not even try to argue other researchers out of their mistaken beliefs, because it’s all Hopeless.
Consider the following analogy: Someone wants to gain muscle, but has thought a lot about nutrition and their genetic makeup and concluded that Direct Exercise Gains Are Doomed, and that they should expend their energy elsewhere.
OK, maybe. But how about going to the gym for a month anyway and seeing what happens?
The point isn’t “EY hasn’t spent a month of work thinking about prosaic alignment.” The point is that AFAICT, by MIRI/EY’s own values, valuable-seeming plans are being left to rot on the cutting room floor. Like, “core MIRI staff meet for an hour each month and attack corrigibility/deceptive cognition/etc with all they’ve got. They pay someone to transcribe the session and post the fruits / negative results / reasoning to AF, without individually committing to following up with comments.”
(I am excited by Rob Bensinger’s comment that this post is the start of more communication from MIRI.)