Interested in many things. I have a personal blog at https://www.beren.io/
beren
Hedonic Loops and Taming RL
The ‘four years’ they explicitly mention does seem very short to me for ASI unless they know something we don’t...
AI x-risk is not far off at all, it’s something like 4 years away IMO
Can I ask where this four years number is coming from? It was also stated prominently in the new ‘superalignment’ announcement (https://openai.com/blog/introducing-superalignment). Is this an agreed-upon median timeline at OAI? Is there an explicit plan to build AGI in four years? Is there strong evidence behind this view—i.e. that you think you know how to build AGI explicitly and it will just take four more years of compute/scaling?
[Linkpost] Introducing Superalignment
Hi there! Thanks for this comment. Here are my thoughts:
Where do highly capable proposals/amortised actions come from?
(handwave) lots of ‘experience’ and ‘good generalisation’?
Pretty much this. We know empirically that deep learning generalizes pretty well from a lot of data as long as it is reasonably representative. I think this is fundamentally due to the nature of our reality: there are generalizable patterns, which is ultimately due to the sparse underlying causal graph. It is very possible that there are realities where this isn’t true, and in those cases this kind of ‘intelligence’ would not be possible.
r...? This seems to me to be where active learning and deliberate/creative exploration come in
It’s a Bayes-adaptivity problem, i.e. planning for value-of-information
This is basically what ‘science’ and ‘experimentalism’ are in my ontology
‘Play’ and ‘practice’ are the amortised equivalent (where explorative heuristics are baked in)
Again, I completely agree here. In practice, in large environments it is necessary to explore if you can’t reach all useful states from a random policy. In these cases, it is very useful to a.) have an explicit world model, so you can learn from sensory information, which is usually much higher bandwidth than reward and generalizes further and in an uncorrelated way, and b.) do some kind of active exploration. Exploring so as to maximize info-gain is probably close to optimal, although whether this is actually theoretically optimal is I think still an open question. The main issue is that info-gain is hard to compute/approximate tractably, since it requires keeping close track of your uncertainty, and DL models are made computationally tractable precisely by throwing away all the uncertainty and only really maintaining point predictions.
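As a concrete illustration of what maximizing info-gain looks like in a setting where it is tractable, here is a minimal sketch (my own, with made-up parameters, not something from the original discussion) of pure information-gain exploration on a Bernoulli bandit with Beta posteriors, where the expected info-gain of pulling an arm has a closed form:

```python
import numpy as np
from scipy.special import digamma

def bernoulli_entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def expected_info_gain(a, b):
    # Mutual information between the next observation y and the arm's unknown
    # success probability theta, for a Beta(a, b) posterior:
    # I(theta; y) = H(E[theta]) - E_theta[H(y | theta)], closed form via digammas.
    p_mean = a / (a + b)
    marginal_entropy = bernoulli_entropy(p_mean)
    expected_conditional_entropy = -(
        p_mean * (digamma(a + 1) - digamma(a + b + 1))
        + (1 - p_mean) * (digamma(b + 1) - digamma(a + b + 1))
    )
    return marginal_entropy - expected_conditional_entropy

# Pure info-gain exploration on a toy 3-armed Bernoulli bandit.
rng = np.random.default_rng(0)
true_probs = [0.2, 0.5, 0.8]                   # unknown to the agent
posteriors = [[1.0, 1.0] for _ in true_probs]  # Beta(1, 1) priors
for _ in range(100):
    gains = [expected_info_gain(a, b) for a, b in posteriors]
    arm = int(np.argmax(gains))                # pull the arm we expect to learn most from
    y = float(rng.random() < true_probs[arm])
    posteriors[arm][0] += y                    # conjugate Beta update
    posteriors[arm][1] += 1.0 - y
```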
animals are evidence that some amortised play heuristics are effective! Even humans only rarely ‘actually do deliberate experimentalism’
but when we do, it’s maybe the source of our massive technological dominance?
Like I don’t know to what extent there are ‘play heuristics’ at a behavioural level vs. some kind of intrinsic drive for novelty / information gain, but yes, having these drives ‘added to your reward function’ is generally useful in RL settings, and we know this happens in the brain as well—i.e. there are dopamine neurons responsive to proxies of information gain (and exactly equal to information gain in simple bandit-like settings where this is tractable)
When is deliberation/direct planning tractable?
In any interestingly-large problem, you will never exhaustively evaluate
e.g. maybe no physically realisable computer in our world can ever evaluate all Go strategies, much less evaluating strategies for ‘operate in the world itself’!
What properties of options/proposals lend themselves?
(handwave) ‘Interestingly consequential’ - the differences should actually matter enough to bother computing!
Temporally flexible
The ‘temporal resolution’ of the strategy-value landscape may vary by orders of magnitude
so the temporal resolution of the proposals (or proposal-atoms) should too, on pain of intractability/value-loss/both
So there are a number of circumstances where direct planning is valuable and useful. I agree with your conditions, especially the correct action step-size as well as discrete actions and known, not-too-stochastic dynamics. Another useful condition is when it’s easy to evaluate branches of the tree without having gone all the way down to the leaves—i.e. in games like Chess/Go it’s often very easy to know that some move tree is intrinsically doomed without having explored all of it. This is a kind of convexity of the state space (not literally mathematically, but intuitively) which makes optimization much easier. Similarly, when good proposals can be made due to linearity / generalizability in the action space, it is easy to prune actions and trees.
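A minimal sketch of the ‘doomed subtrees can be discarded cheaply’ point (my own illustration; `legal_moves`, `apply`, and `heuristic_value` are assumed placeholders for a hypothetical game API): depth-limited negamax with alpha-beta pruning cuts whole branches as soon as an intermediate evaluation shows they cannot matter.

```python
def negamax(state, depth, alpha, beta):
    """Return the value of `state` for the player to move."""
    if depth == 0 or not state.legal_moves():
        return state.heuristic_value()        # cheap evaluation far from the leaves
    best = float("-inf")
    for move in state.legal_moves():
        value = -negamax(state.apply(move), depth - 1, -beta, -alpha)
        best = max(best, value)
        alpha = max(alpha, value)
        if alpha >= beta:                     # the rest of this subtree cannot affect
            break                             # the result: prune it without exploring it
    return best
```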
Where does strong control/optimisation come from?
Strong control comes from where strong learning in general comes from—lots of compute and data—and for planning especially compute. The optimal trade-off between amortized and direct optimization given a fixed compute budget is super interesting and I don’t think we have any good models of this yet.
Another thing that I think is fairly underestimated among people on LW compared to people doing deep RL is that open-loop planning is actually very hard and bad at dealing with long time horizons. This is basically due to stochasticity and chaos—future prediction is hard. Small mistakes in either modelling or action propagate very rapidly and create massive uncertainties about the future, so that your optimal posterior rapidly dwindles to a maximum-entropy distribution. The key thing in long-term planning is really adaptability and closed-loop control—i.e. seeing feedback and adjusting your actions in response to it. This is how almost all practical control systems actually work, and in practice in deep RL with planning everybody uses MPC and replans at every step.
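A minimal sketch of the closed-loop/MPC point (my own illustration; `dynamics_model`, `reward_model`, and `env` are assumed placeholders, and random shooting stands in for whatever planner you like): plan an open-loop sequence inside the model, execute only the first action, then replan from the real observed state so feedback keeps correcting model error.

```python
import numpy as np

def plan_first_action(state, dynamics_model, reward_model,
                      horizon=10, n_samples=256, action_dim=2):
    # Random-shooting planner: score candidate open-loop action sequences
    # inside the learned model and keep only the first action of the best one.
    candidates = np.random.uniform(-1, 1, size=(n_samples, horizon, action_dim))
    returns = np.zeros(n_samples)
    for i, actions in enumerate(candidates):
        s = state
        for a in actions:                      # open-loop rollout inside the model
            returns[i] += reward_model(s, a)
            s = dynamics_model(s, a)
    return candidates[np.argmax(returns)][0]

def run_mpc(env, dynamics_model, reward_model, steps=100):
    state = env.reset()
    for _ in range(steps):
        action = plan_first_action(state, dynamics_model, reward_model)
        state, reward, done, _ = env.step(action)   # real feedback, then replan
        if done:
            break
```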
The problem is not so much which one of 1, 2, 3 to pick but whether ‘we’ get a chance to pick it at all. If there is space, free energy, and diversity, there will be evolution going on among populations, and evolution will consistently push things towards more reproduction up until it hits a Malthusian limit, at which point it will push towards greater competition and economic/reproductive efficiency. The only way to avoid this is to remove the preconditions for evolution—any of variation, selection, or heredity—but these seem quite natural in a world of large AI populations, so in practice this will require some level of centralized control.
This is obviously true; any AI-complete problem can be trivially reduced to the problem of writing an AI program that solves it. That isn’t really a problem for the proposal here. The point isn’t that we could avoid making AGI by doing this; the point is that we can do this in order to get AI systems that we can trust without having to solve interpretability.
Maybe I’m being silly but then I don’t understand the safety properties of this approach. If we need an AGI based on uninterpretable DL to build this, then how do we first check if this AGI is safe?
I moderately agree here, but I still think the primary factor is centralization of the value chain. The more of the value chain is centralized, the easier it is to control. My guess is that we can make this argument more formal by thinking in terms of a dependency graph—if we imagine the economic process from sand + energy → DL models, then the important measure is the centrality of the hubs in this graph. If we can control and/or cut these hubs, then the entire DL ecosystem falls apart. Conveniently/unfortunately, this is also where most of the economic profit is likely to accumulate by standard industrial economic laws, and hence this is also where there will be the most resources resisting regulation.
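A toy illustration of the dependency-graph framing (my own sketch; the node names are purely illustrative, not a claim about the actual supply chain): a standard centrality measure picks out exactly the hubs whose removal disconnects upstream inputs from downstream models.

```python
import networkx as nx

# Hypothetical sand + energy -> DL models value chain as a directed graph.
G = nx.DiGraph()
G.add_edges_from([
    ("sand", "fabs"), ("energy", "fabs"),
    ("fabs", "advanced_chips"), ("advanced_chips", "accelerators"),
    ("accelerators", "datacenters"), ("energy", "datacenters"),
    ("datacenters", "frontier_models"),
])
# High betweenness centrality marks the hubs almost every path flows through.
print(nx.betweenness_centrality(G))
```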
As I see it, there are two fundamental problems here:
1.) Generating interpretable expert-system code for an AGI is probably already AGI-complete. It seems unlikely that a non-AGI DL model could output code for an AGI—especially given that it is highly unlikely that there would be expert-system AGIs in its training set, or even things close to expert-system AGIs, if deep learning keeps far outpacing GOFAI techniques.
2.) Building an interpretable expert-system AGI is likely not just AGI-complete but a fundamentally much harder problem than building a DL AGI system. Intelligence is extremely detailed, messy, and highly heuristic. All our examples of intelligent behaviour come from large blobs of optimized compute—both brains and DL systems—and none from expert systems. The actual inner workings of intelligence might just be fundamentally uninterpretable in their complexity except at a high level—i.e. ‘this optimized blob is the output of approximate Bayesian inference over this extremely massive dataset’.
Using (Uninterpretable) LLMs to Generate Interpretable AI Code
Interesting post! Do you have papers for the claims on why mixed activation functions perform worse? This is something I have thought about a little but not looked deeply into, so I would appreciate links here. My naive thinking is that it mostly doesn’t work due to difficulties with conditioning and with keeping the loss landscape smooth and low-curvature when there are different activation functions in a layer. With a single activation function, it is relatively straightforward to design an initialization that doesn’t blow up—with mixed ones, it seems your space of potential numerical difficulties increases massively.
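To make the initialization worry concrete, here is a rough sketch (my own, not from the post) of a layer that mixes ReLU and tanh halves under a single Kaiming init tuned for ReLU; the two halves come out at noticeably different scales, which is the kind of conditioning problem I have in mind.

```python
import torch
import torch.nn as nn

class MixedActivationLayer(nn.Module):
    """Half the units use ReLU, half use tanh, but one shared init serves both."""
    def __init__(self, d_in, d_out):
        super().__init__()
        assert d_out % 2 == 0
        self.linear = nn.Linear(d_in, d_out)
        # Gain tuned for ReLU only; the tanh half gets the "wrong" scale.
        nn.init.kaiming_normal_(self.linear.weight, nonlinearity="relu")

    def forward(self, x):
        h1, h2 = self.linear(x).chunk(2, dim=-1)
        return torch.cat([torch.relu(h1), torch.tanh(h2)], dim=-1)

x = torch.randn(4096, 512)
y = MixedActivationLayer(512, 512)(x)
relu_half, tanh_half = y.chunk(2, dim=-1)
print(relu_half.std().item(), tanh_half.std().item())  # mismatched output scales
```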
Exactly this. This is the relationship in RL between the discount factor and the probability of transitioning into an absorbing state (death)
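Concretely (a standard identity, written out here for reference rather than quoted from anywhere): if the environment transitions into an absorbing ‘death’ state independently with probability p at each step, the expected undiscounted return up to death equals a geometrically discounted sum, and an explicit discount γ on top just multiplies in:

$$\mathbb{E}\!\left[\sum_{t=0}^{T_{\text{death}}-1} r_t\right] \;=\; \sum_{t=0}^{\infty} (1-p)^t\, \mathbb{E}[r_t], \qquad \gamma_{\text{eff}} \;=\; \gamma\,(1-p),$$

so a low discount factor can equivalently be read as a belief about a high per-step probability of absorption.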
I think this is a really good post. You might be interested in these two posts, which explore very similar arguments about the interaction between search in the world model and more general ‘intuitive policies’, as well as the fact that we are always optimizing for our world/reward model rather than reality, and how this affects how agents act.
Yes! This would be valuable. Generally, getting a sense of the ‘self-awareness’ of a model in terms of how much it knows about itself would be a valuable thing to start testing for.
I don’t think models currently have this ability by default anyway. But we definitely should think very hard before letting them do this!
Yes, I think what I proposed here is the broadest and crudest thing that will work. It can of course be much more targeted to specific proposals or posts that we think are potentially most dangerous. Using existing language models to rank these is an interesting idea.
The case for removing alignment and ML research from the training dataset
Announcing Apollo Research
I’m very glad you wrote this. I have had similar musings previously as well, but it is really nice to see this properly written up and analyzed in a more formal manner.
This is definitely possible and is essentially augmenting the state variables with additional homeostatic variables and then learning policies on the joint state space. However, there are some clever experiments, such as the linked Morrison and Berridge one, demonstrating that this is not all that is going on—specifically, many animals appear to be able to perform zero-shot changes in policy when rewards change, even if they have not experienced this specific homeostatic state before—i.e. mice suddenly chase after salt water, which they previously disliked, when put in a state of salt deprivation which they had never before experienced.
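As a minimal sketch of the ‘augment the state with homeostatic variables’ framing (my own illustration; the set points and numbers are made up), a drive-reduction reward makes the same action (drinking salt water) unrewarding at the set point and rewarding under deprivation, provided the agent has a model of how reward depends on its internal state rather than only a policy learned from experienced joint states.

```python
import numpy as np

# Internal set points for two illustrative homeostatic variables (hydration, sodium).
HOMEOSTATIC_SET_POINTS = np.array([1.0, 1.0])

def homeostatic_reward(internal_state, consumed):
    """Drive-reduction reward: consuming is good iff it moves you toward the set point."""
    before = np.linalg.norm(HOMEOSTATIC_SET_POINTS - internal_state)
    after = np.linalg.norm(HOMEOSTATIC_SET_POINTS - (internal_state + consumed))
    return before - after

salt_water = np.array([0.0, 0.2])
# At the set point, salt water overshoots sodium and is aversive (negative reward).
print(homeostatic_reward(np.array([1.0, 1.0]), salt_water))
# Under sodium deprivation, the same action becomes rewarding, with no new learning
# needed if the agent models the dependence of reward on internal state.
print(homeostatic_reward(np.array([1.0, 0.3]), salt_water))
```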