I mostly agree that relying on real-world data is necessary for better understanding our messy world, and that in most cases this approach is favorable.
There’s a part of me that thinks AI is a different case, though, since getting it even slightly wrong would be catastrophic. Experimental alignment research might get us most of the way to aligned AI, but there will probably still be issues that aren’t noticeable because the AIs we are experimenting on won’t be powerful enough to reveal them. Our solution to the alignment problem can’t be something imperfect that does the job well enough. Instead it has to be something that can withstand immense optimization pressure. My intuition tells me that the single-hose solution is not enough for AGI and we instead need something that is flawless in practice and in theory.
I agree that, given MIRI’s model of AGI emergence, getting it slightly wrong would be catastrophic. But that’s my whole point: experimenting early is strictly better than not, because it reduces the odds of getting something big wrong at the end, as opposed to something small along the way.
Our solution to the alignment problem can’t be something imperfect that does the job well enough. Instead it has to be something that can withstand immense optimization pressure.
I had mentioned in another post that aligned AI needs slack (https://www.lesswrong.com/posts/mc2vroppqHsFLDEjh/aligned-ai-needs-slack), so that there are no “immense optimization pressures”.
My intuition tells me that the single-hose solution is not enough for AGI and we instead need something that is flawless in practice and in theory.
I think that’s what Eliezer says as well, hence his pessimism and focus on “dying with dignity”. But we won’t know whether this intuition is correct without actually testing it experimentally and repeatedly. It might not help because “there is no fire alarm for superintelligence”, but the alternative is strictly worse, because the problem is so complex.