I agree that, given MIRI’s model of AGI emergence, getting it slightly wrong would be catastrophic. But that’s my whole point: experimenting early is strictly better than not, because it trades the risk of getting something big wrong at the end for getting small things wrong along the way.
> Our solution to the alignment problem can’t be something imperfect that does the job well enough. Instead, it has to be something that can withstand immense optimization pressure.
I had mentioned in another post that aligned AI needs slack (https://www.lesswrong.com/posts/mc2vroppqHsFLDEjh/aligned-ai-needs-slack), precisely so that there are no “immense optimization pressures” in the first place.
> My intuition tells me that the single-hose solution is not enough for AGI, and that we instead need something that is flawless both in practice and in theory.
I think that’s what Eliezer says as well, hence his pessimism and his focus on “dying with dignity”. But we won’t know whether this intuition is correct without actually testing it, experimentally and repeatedly. Testing might not help, because “there is no fire alarm for superintelligence”, but the alternative is strictly worse, given how complex the problem is.