Insofar as the hope is:

1. Figure out how to approximate sampling from the Bayesian posterior (using e.g. GFlowNets or something).
2. Do something else that makes this actually useful for "improving" OOD generalization in some way.
It would be nice to know what (2) actually is and why we needed step (1) for it. As far as I can tell, Bengio hasn’t stated any particular hope for (2) which depends on (1).
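To make step (1) concrete, here is a minimal toy sketch (my own illustration, not anything from Bengio's papers): the object being approximated is the posterior p(θ | D) ∝ p(D | θ) p(θ) over a model's parameters, and "sampling from it" means drawing parameter values whose frequencies track that distribution. The sketch below does this with a random-walk Metropolis-Hastings sampler on a one-parameter linear model; GFlowNet-style methods aim at the same target with an amortized, learned sampler instead of MCMC.

```python
# Toy illustration of "sampling from the Bayesian posterior" for a
# one-parameter linear model y = w * x + noise. The posterior is
# p(w | D) ∝ p(D | w) p(w); we draw approximate samples from it with
# random-walk Metropolis-Hastings. (Purely illustrative; GFlowNet-based
# approaches target the same distribution with an amortized sampler.)
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset generated with a "true" weight of 2.0.
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(scale=0.5, size=50)

def log_posterior(w, noise_std=0.5, prior_std=5.0):
    """Unnormalized log p(w | D) = log p(D | w) + log p(w)."""
    log_likelihood = -0.5 * np.sum(((y - w * x) / noise_std) ** 2)
    log_prior = -0.5 * (w / prior_std) ** 2
    return log_likelihood + log_prior

# Propose w' near the current w; accept with probability
# min(1, p(w' | D) / p(w | D)).
samples, w = [], 0.0
for _ in range(5000):
    w_prop = w + rng.normal(scale=0.2)
    if np.log(rng.uniform()) < log_posterior(w_prop) - log_posterior(w):
        w = w_prop
    samples.append(w)

posterior_draws = np.array(samples[1000:])  # drop burn-in
print(f"posterior mean ~ {posterior_draws.mean():.2f}, std ~ {posterior_draws.std():.2f}")
```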
> Rather, my original reply was meant to explain why the Bayesian aspect of Bengio's research agenda is a core part of its motivation, in response to your remark that "from my understanding, the bayesian aspect of [Bengio's] agenda doesn't add much value".
I agree that if the Bayesian aspect of the agenda did a specific useful thing like '"improve" OOD generalization' or 'allow us to control/understand OOD generalization', then this aspect of the agenda would be useful.
However, I think the Bayesian aspect of the agenda won't do this, and thus it won't add much value. I agree that Bengio (and others) think that the Bayesian aspect of the agenda will do things like this, but I disagree and don't see the story for it.
I agree that “actually use Bayesian methods” sounds like the sort of thing that could help you solve dangerous OOD generalization issues, but I don’t think it clearly does.
(Unless of course someone has a specific proposal for (2) from my above decomposition which actually depends on (1).)
> However, there could still be advantages to an explicitly Bayesian method. For example, off the top of my head [...]
Of those advantages, (1)-(3) don't seem promising/important to me. (4) would be useful, but I don't see why we needed the Bayesian aspect for it: if we have some parametric model class that we can make smart enough to reason effectively about the world, just making an ensemble of these surely gets you most of the way there.
To be clear, if the hope is "figure out how to make an ensemble of interpretable predictors which are able to model the world as well as our smartest model", then this would be very useful (e.g. it would allow us to avoid ELK issues). But all the action was in making interpretable predictors; no Bayesian aspect was required.
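As a minimal sketch of the "just make an ensemble" alternative (again a toy example of my own, not a concrete proposal from anyone in this exchange): train several small models independently and treat their disagreement on a new input as an uncertainty signal, with no explicit posterior anywhere.

```python
# Toy ensemble: several independently fit models (different random features)
# agree on in-distribution inputs and disagree on far-OOD inputs, giving an
# uncertainty signal without any explicit Bayesian posterior.
import numpy as np

rng = np.random.default_rng(0)

# Training data on x in [-1, 1]; points far outside this range are "OOD".
x_train = rng.uniform(-1.0, 1.0, size=(200, 1))
y_train = np.sin(3.0 * x_train[:, 0]) + rng.normal(scale=0.1, size=200)

def fit_member(x, y, seed, n_features=50, reg=1e-3):
    """Ridge regression on random Fourier features (a cheap stand-in for a 'small model')."""
    r = np.random.default_rng(seed)
    W = r.normal(scale=3.0, size=(x.shape[1], n_features))
    b = r.uniform(0.0, 2.0 * np.pi, size=n_features)
    phi = np.cos(x @ W + b)
    weights = np.linalg.solve(phi.T @ phi + reg * np.eye(n_features), phi.T @ y)
    return lambda x_new: np.cos(x_new @ W + b) @ weights

ensemble = [fit_member(x_train, y_train, seed=s) for s in range(5)]

def ensemble_predict(x_new):
    preds = np.stack([member(x_new) for member in ensemble])
    return preds.mean(axis=0), preds.std(axis=0)  # disagreement ~ uncertainty

for x_query in (0.3, 4.0):  # in-distribution vs. OOD query
    mean, std = ensemble_predict(np.array([[x_query]]))
    print(f"x = {x_query}: prediction {mean[0]:+.2f}, ensemble disagreement {std[0]:.2f}")
```

The point is just that the useful artifact here, a set of predictors whose disagreement flags inputs far from the training distribution, does not obviously need the posterior sampler from step (1); and if the hope is specifically an ensemble of interpretable predictors, the interpretability is where the difficulty lives.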