I thus think it's fair to say that, empirically, neural networks do not reliably quantify uncertainty when out of distribution.
Sure, but why will the Bayesian model reliably quantify uncertainty OOD? There is no guarantee of this either (when OOD).
Whether or not you get reliable uncertainty quantification will depend on your prior. If you have (e.g.) the NN prior, I expect the uncertainty quantification to be similar to what you would get from training an ensemble.
E.g., you'll find a bunch of NNs (in the Bayesian posterior) which also have the spurious correlation that a trained NN (or ensemble of NNs) would have.
If you have some other prior, why can't we regularize our NN to match it?
(Maybe I’m confused about this?)
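To make the comparison I have in mind concrete, here is a toy sketch (purely illustrative; scikit-learn's MLPRegressor and the sine-wave data are stand-ins of my own, not anything from Bengio's proposal). The spread across independently trained ensemble members is used as the uncertainty signal, which is the same quantity that samples from an NN-prior posterior would feed into:

```python
# Toy sketch: ensemble disagreement as an OOD uncertainty proxy.
# Under an "NN prior", explicit posterior samples would replace the
# independently trained members below, but the downstream uncertainty
# estimate is computed in the same way.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# In-distribution training data: y = sin(x) on [-3, 3].
X_train = rng.uniform(-3, 3, size=(200, 1))
y_train = np.sin(X_train).ravel() + 0.05 * rng.normal(size=200)

# K members that differ only in their random initialisation.
ensemble = [
    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                 random_state=k).fit(X_train, y_train)
    for k in range(5)
]

X_test = np.array([[0.0], [2.5], [8.0], [15.0]])  # last two inputs are OOD
preds = np.stack([m.predict(X_test) for m in ensemble])  # shape (K, n_test)

mean = preds.mean(axis=0)
spread = preds.std(axis=0)  # disagreement across members = uncertainty proxy
for x, m, s in zip(X_test.ravel(), mean, spread):
    print(f"x={x:5.1f}  mean={m:+.2f}  spread={s:.2f}")
```

If every member (or, equivalently, every posterior sample under the NN prior) latches onto the same spurious extrapolation, the spread stays small even far outside the training range, which is the failure mode I'm pointing at: I don't see why the Bayesian version would be any more reliable OOD.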
Separately, I guess I'm not that worried about failures in which the network itself doesn't “understand” what's going on. So the main issue is cases where the model in some sense knows, but doesn't report this (e.g., ELK problems, at least broadly speaking).
I think there are a bunch of issues that look sort of like this now, but these will go away once models are smart enough to automate R&D etc.
I'm not worried about future models murdering us because they were confused and thought, due to a spurious correlation, that this would be what we wanted.
(I do have some concerns around jailbreaking, but I also think that will look pretty different, and the adversarial case is very different. And there appear to be solutions which are more promising than Bayesian ML.)
I think the distinction between these two cases can often be somewhat vague.
Why do you think that the adversarial case is very different?
I think you're perhaps reading me as being more bullish on Bayesian methods than I in fact am. I am not necessarily saying that Bayesian methods can solve OOD generalisation in practice, nor am I saying that other methods could not also do this. In fact, I was until recently very skeptical of Bayesian methods, before talking about them with Yoshua Bengio. Rather, my original reply was meant to explain why the Bayesian aspect of Bengio's research agenda is a core part of its motivation, in response to your remark that “from my understanding, the bayesian aspect of [Bengio's] agenda doesn't add much value”.
I agree that if a Bayesian learner uses the NN prior, then its behaviour should, in the limit, be very similar to that of a large trained ensemble of NNs. However, there could still be advantages to an explicitly Bayesian method. For example, off the top of my head:
1. It may be that you need an extremely large ensemble to approximate the posterior well, and that a Bayesian learner can approximate it much better with far fewer resources (see the posterior predictive written out below this list).
2. It may be that you can more easily prove learning-theoretic guarantees for a Bayesian learner.
3. It may be that a Bayesian learner makes it easier to condition on events that have a very small probability in your posterior (such as, for example, the event that a particular complex plan is executed).
4. It may be that a Bayesian learner has a more interpretable prior, or that you can reprogram it more easily.
And so on; these are just some examples. Of course, whether you get these benefits in practice is a matter of speculation until we have a concrete algorithm to analyse. All I'm saying is that there are valid and well-motivated reasons to explore this particular direction.
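To spell out point 1: both an explicitly Bayesian learner and a large ensemble are ultimately trying to approximate the same posterior predictive distribution with a finite collection of networks (standard notation, not specific to any particular proposal; \(\mathcal{D}\) is the training data and \(\theta_k\) the parameters of the \(k\)-th network):

$$p(y \mid x, \mathcal{D}) = \int p(y \mid x, \theta)\, p(\theta \mid \mathcal{D})\, d\theta \;\approx\; \frac{1}{K}\sum_{k=1}^{K} p(y \mid x, \theta_k), \qquad \theta_k \sim p(\theta \mid \mathcal{D}).$$

An ensemble substitutes independently trained optima for the posterior samples \(\theta_k\); the conjecture in point 1 is that a method which samples the posterior more directly could cover \(p(\theta \mid \mathcal{D})\) adequately with far fewer, or better-placed, networks.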
Insofar as the hope is:
1. Figure out how to approximate sampling from the Bayesian posterior (using e.g. GFlowNets or something).
2. Do something else that makes this actually useful for “improving” OOD generalization in some way.
It would be nice to know what (2) actually is and why step (1) is needed for it. As far as I can tell, Bengio hasn't stated any particular hope for (2) which depends on (1).
I agree that if the Bayesian aspect of the agenda did a specific useful thing, like “improving” OOD generalization or allowing us to control/understand OOD generalization, then this aspect of the agenda would be useful.
However, I think the Bayesian aspect of the agenda won't do this, and thus it won't add much value. I agree that Bengio (and others) think that the Bayesian aspect of the agenda will do things like this, but I disagree and don't see the story for it.
I agree that “actually use Bayesian methods” sounds like the sort of thing that could help you solve dangerous OOD generalization issues, but I don’t think it clearly does.
(Unless of course someone has a specific proposal for (2) from my above decomposition which actually depends on (1).)
Of your possible advantages, 1-3 don't seem promising/important to me. (4) would be useful, but I don't see why we'd need the Bayesian aspect for it. If we have some sort of parametric model class which we can make smart enough to reason effectively about the world, just making an ensemble of these surely gets you most of the way there.
To be clear, if the hope is “figure out how to make an ensemble of interpretable predictors which are able to model the world as well as our smartest model”, then this would be very useful (e.g. it would allow us to avoid ELK issues). But all the action was in making the predictors interpretable; no Bayesian aspect was required.