I disagree with John’s post in much the same way Steven Byrnes does in the comments. It’s not the speed of takeoff that matters, it’s our loss of control. If the takeoff happens very fast, but we have an automatic “turn it off if it gets too smart” system in place that successfully turns it off, and we then test the system in a highly impaired mode (lowered intelligence/functionality, lowered speed) to learn about it… that’s potentially a win, not a loss.
As for John W’s point about ‘you get what you measure’: yes. That’s the hard problem interpretability must conquer. I think it’s possible to hill-climb toward getting better at this, so long as you remain in control and are able to run many separate experiments.
https://www.lesswrong.com/posts/z8s3bsw3WY9fdevSm/boxing-an-ai ;-)
https://www.lesswrong.com/posts/wgcFStYwacRB8y3Yp/timelines-are-relevant-to-alignment-research-timelines-2-of
https://www.lesswrong.com/posts/p62bkNAciLsv6WFnR/how-do-we-align-an-agi-without-getting-socially-engineered