I agree. However, in my case at least the 1/million probability is not for that reason, but for much more concrete reasons, e.g. “It’s already happened at least once, at a major AI company, for an important AI system; yes, in the future people will probably be paying more attention, but that only changes the probability by an order of magnitude or so.”
Isn’t the cheap solution just… being more cautious about our programming, to catch these bugs before the code starts running? And being more concerned about these signflip errors in general? It’s not like we need to solve Alignment Problem 2.0 to figure out how to prevent signflip. It’s just an ordinary bug. Like, what happened already with OpenAI could totally have been prevented with an extra hour or so of eyeballs poring over the code, right? (Or, more accurately, by whoever wrote the code in the first place being on the lookout for this kind of error?)
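To make “just an ordinary bug” concrete, here’s a toy sketch — purely hypothetical, not the actual code from the incident — of what a flipped sign in a reward function looks like, plus the sort of cheap pre-run sanity check that would catch it before any training happens:

```python
# Hypothetical sketch (not OpenAI's actual code) of a reward sign-flip bug
# and a cheap sanity check that catches it before a training run starts.

def intended_reward(preference_score: float) -> float:
    """Intended reward: higher human-preference score -> higher reward."""
    return preference_score


def buggy_reward(preference_score: float) -> float:
    """Same function with one flipped sign: the optimizer now pushes the
    policy toward the *least* preferred outputs."""
    return -preference_score


def sanity_check(reward_fn) -> None:
    """A clearly good sample must score higher than a clearly bad one.
    Runs in microseconds and catches an inverted reward ordering."""
    good_score = 0.9  # e.g. labeler-approved output
    bad_score = 0.1   # e.g. labeler-rejected output
    assert reward_fn(good_score) > reward_fn(bad_score), (
        "Reward ordering is inverted -- possible sign-flip bug"
    )


if __name__ == "__main__":
    sanity_check(intended_reward)      # passes silently
    try:
        sanity_check(buggy_reward)     # the flipped sign is caught here
    except AssertionError as err:
        print(f"Caught before training: {err}")
```

The point is just that this kind of check is trivially cheap relative to the cost of the bug; it doesn’t require solving anything conceptually hard.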
“It’s already happened at least once, at a major AI company, for an important AI system; yes, in the future people will probably be paying more attention, but that only changes the probability by an order of magnitude or so.”
Tbc, I think it will happen again; I just don’t think it will have a large impact on the world.
Isn’t the cheap solution just… being more cautious about our programming, to catch these bugs before the code starts running? And being more concerned about these signflip errors in general?
If you’re writing the AGI code, sure. But in practice it won’t be you, so you’d have to convince other people to do this. If you tried to do that, I think the primary impact would be “ML researchers are more likely to think AI risk concerns are crazy” which would more than cancel out the potential benefit, even if I believed the risk was 1 in 30,000.
Because you think it’ll be caught in time, etc.? Yes, I think it will probably be caught in time too.
OK, so yeah, the solution isn’t quite as cheap as simply “Shout this problem at AI researchers.” It’s gotta be more subtle and respectable than that. Still, I think this is a vastly easier problem to solve than the normal AI alignment problem.