“It’s already happened at least once, at a major AI company, for an important AI system. Yes, in the future people will probably be paying more attention, but that only changes the probability by an order of magnitude or so.”
Tbc, I think it will happen again; I just don’t think it will have a large impact on the world.
Isn’t the cheap solution just… being more cautious about our programming, to catch these bugs before the code starts running? And being more concerned about these sign-flip errors in general?
If you’re writing the AGI code, sure. But in practice it won’t be you, so you’d have to convince other people to do this. If you tried to do that, I think the primary impact would be “ML researchers are more likely to think AI risk concerns are crazy” which would more than cancel out the potential benefit, even if I believed the risk was 1 in 30,000.
Because you think it’ll be caught in time, etc.? Yes, I think it will probably be caught in time too.
OK, so yeah, the solution isn’t quite as cheap as simply “Shout this problem at AI researchers.” It’s gotta be more subtle and respectable than that. Still, I think this is a vastly easier problem to solve than the normal AI alignment problem.