Separately, I guess I’m not that worried about failures in which the network itself doesn’t “understand” what’s going on. So the main issue is cases where the model in some sense knows, but doesn’t report this. (E.g. ELK problems, at least broadly speaking.)
I think there are a bunch of issues that look sort of like this now, but this will go away once models are smart enough to automate R&D etc.
I’m not worried about future models murdering us because they were confused and thought this would be what we wanted due to a spurious correlation.
(I do have some concerns around jailbreaking, but I also think that will look pretty different, and the adversarial case is very different. And there appear to be solutions which are more promising than Bayesian ML.)
I think the distinction between these two cases can often be somewhat vague.
Why do you think that the adversarial case is very different?