> To avoid catastrophic errors, now consider a risk management approach, with an AI that represents not a single H but a large set of them, in the form of a generative distribution over hypotheses H
On first reading this post, the whole proposal seemed so abstract that I wouldn't have known how to even begin building such an AI. However, after a quick skim of some of Bengio's recent papers, I think I have a better sense of what he has in mind.

His approach, roughly, is to train a generative model that constructs Bayesian networks edge by edge, where the probability of generating any given network represents the probability that that causal model is the correct hypothesis.

And he's using GFlowNets to do it. These are a new class of ML/RL model developed at Mila that generate objects with probability proportional to a reward function (unlike standard RL, which always tries to achieve maximum reward). So far they seem to have mostly been applied to biological problems.
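To make the core property concrete, here is a toy sketch (my own illustration, not Bengio's code) of the *target* distribution a GFlowNet over causal graphs would be trained to match: objects are DAGs over three variables, built from candidate edges, and each DAG is sampled with probability proportional to a reward. The reward function below is a made-up stand-in; in Bengio's setting it would reflect how well the causal hypothesis fits the data. A real GFlowNet learns a sequential sampler that approximates this distribution without enumerating all graphs; the toy space here is small enough to normalize exactly.

```python
import itertools
import random

NODES = [0, 1, 2]
CANDIDATE_EDGES = [(i, j) for i in NODES for j in NODES if i != j]

def is_dag(edges):
    # Kahn-style check: repeatedly remove nodes with no incoming edges;
    # if we get stuck before removing everything, a cycle remains.
    remaining = set(NODES)
    edge_set = set(edges)
    while remaining:
        sources = [n for n in remaining
                   if not any((u, n) in edge_set for u in remaining)]
        if not sources:
            return False
        remaining -= set(sources)
    return True

def reward(edges):
    # Hypothetical reward: favor sparse graphs containing edge (0, 1).
    # In the actual proposal this would score the hypothesis against data.
    bonus = 2.0 if (0, 1) in edges else 1.0
    return bonus * (0.5 ** len(edges))

# Enumerate every DAG over the candidate edges and normalize rewards
# into a distribution: P(x) = R(x) / Z.
dags = [e for r in range(len(CANDIDATE_EDGES) + 1)
        for e in itertools.combinations(CANDIDATE_EDGES, r)
        if is_dag(e)]
Z = sum(reward(d) for d in dags)
target = {d: reward(d) / Z for d in dags}

def sample_dag(rng=random):
    # A trained GFlowNet's sampler approximates this draw; here we
    # sample from the normalized distribution directly.
    return rng.choices(dags, weights=[target[d] for d in dags], k=1)[0]
```

The point of the exercise: maximizing reward would collapse onto a single best graph, whereas sampling proportionally to reward keeps the whole set of plausible causal hypotheses in play, weighted by how plausible each one is.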