We can probably rule out “a spread of situationally-activated computations which steer its actions towards historical reward-correlates”, insofar as that spread is a much less compact policy-encoding than an explicit search process + simple objective(s).
Here’s what I think you mean by an explicit search process:
In every situation, the neural network runs e.g. MCTS with a fixed leaf evaluation function (the simple objective).
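To make that concrete, here is a minimal sketch (toy interfaces, hypothetical names; a gloss on the description above, not anyone’s actual proposal) of the policy shape in question: the same fixed search routine, with the same fixed leaf objective, invoked in every situation.

```python
# Hypothetical sketch: "explicit search process + simple objective".
# The policy runs the *same* search with the *same* fixed leaf evaluator
# in every situation it encounters.

def simple_objective(state):
    """Fixed leaf evaluation function (the 'simple objective')."""
    return state.get("score", 0.0)   # assumes dict-like toy states

def explicit_search_policy(state, actions, transition, depth=3):
    """Depth-limited search over action sequences, scoring leaves with the
    fixed objective. MCTS would sample rollouts rather than enumerate, but
    the fixed leaf evaluator plays the same role."""
    def value(s, d):
        if d == 0:
            return simple_objective(s)
        return max(value(transition(s, a), d - 1) for a in actions(s))
    return max(actions(state), key=lambda a: value(transition(state, a), depth - 1))
```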
On this understanding of your argument, I would be surprised if it went through. Here are a few quick counterpoints.
Outside tiny maze environments, constantly running search with a fixed objective is downright stupid: you’re going to constantly time out. Anytime guarantees won’t necessarily save you; they’ll probably be weak or nonexistent. Constantly running search will also consistently waste computation time which could have been saved by caching computations and then thinking about other things during the rest of the forward pass (aka shards). And fixed-depth neural networks also have a speed prior.

(See also the independently written Gradient descent doesn’t select for inner search.)

EDIT: Reading your reply comment on that post:
And there are many other tricks one can use too—like memoization on subsearches, or A*-style heuristic search, or (one meta-level up from A*) relaxation-based methods to discover heuristics. The key point is that these tricks are all very general purpose: they work on a very wide variety of search problems, and therefore produce general-purpose search algorithms which are more efficient than brute force (at least on realistic problems).
More advanced general-purpose search methods seem to rely relatively little on enumerating possible actions and evaluating their consequences. By the time we get to human-level search capabilities, we see human problem-solvers spend most of their effort on nontrivial problems thinking about subproblems, abstractions and analogies rather than thinking directly about particular solutions.
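As a toy illustration of the tricks mentioned above (not taken from either post; the grid problem and names are purely hypothetical): A*-style search where the heuristic is obtained by relaxation (drop the wall constraint, and the relaxed problem’s exact cost, Manhattan distance, becomes an admissible heuristic for the real one), plus memoization so repeated subsearches are not recomputed.

```python
import heapq
from functools import lru_cache

def manhattan(a, b):
    # Exact solution cost of the *relaxed* problem (no walls): an admissible
    # heuristic for the real problem, obtained "for free" by relaxation.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def astar(start, goal, neighbors, heuristic=manhattan):
    """A*-style heuristic search over a unit-cost grid graph."""
    frontier = [(heuristic(start, goal), 0, start)]
    best_cost = {start: 0}
    while frontier:
        _, cost, node = heapq.heappop(frontier)
        if node == goal:
            return cost
        for nxt in neighbors(node):
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(frontier, (new_cost + heuristic(nxt, goal), new_cost, nxt))
    return None

# "Memoization on subsearches": cache results so repeated queries over the
# same maze do not redo the work.
@lru_cache(maxsize=None)
def cached_distance(start, goal, neighbors):
    return astar(start, goal, neighbors)
```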
Memoization and heuristics would definitely count as part of a “spread” of contextually activated computations? Are we even disagreeing?
Humans are the one example we have of general intelligences; they surely differ from ML in e.g. their inductive biases, and that’s damn important. But even so, humans do not search in every situation in order to optimize a simple objective. Seems like an important hint.
More generally: “If your theory of alignment and/or intelligence is correct, why doesn’t it explain the one datapoint we have on general intelligence?”
Any “simplicity prior” that ANNs have is not like the simplicity prior of a programming language. A single forward pass is acyclic, so loops / recursion are impossible. If NN layers were expressed as programs, the language in question would also have to be acyclic, which would make “search” quite a dumb thing to do anyways.
EDIT: Although in the OP I did presume a recurrent state! Still important to keep in mind as we consider different architectures, though.

Initial contextually-activated heuristics might (low-confidence) starve gradients towards search.
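To make the acyclicity point above concrete (a toy gloss with hypothetical function names): a fixed-depth forward pass can only express a search loop by unrolling it a bounded number of times, with the bound baked into the architecture; a genuine “search until done” needs a cycle, i.e. recurrence.

```python
# A "search until done" loop needs a cycle in the computation graph:
def search_until_done(state, step, done):
    while not done(state):          # unbounded iteration
        state = step(state)
    return state

# A depth-k feedforward pass can only express the unrolled, bounded version:
def unrolled_search(state, step, done, k=4):
    for _ in range(k):              # k is fixed by the architecture, like layer count
        if done(state):
            break
        state = step(state)
    return state
```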
For instance, a plausible Fermi estimate for humans is that our values are ultimately generated from ~tens of simple proxies. (And I would guess that modern ML training would probably result in even fewer, relative to human evolution.)
Do you mean “hardcoded reward circuit” by “proxy”?
I’m not that committed to the RL frame, but roughly speaking yes. Whatever values we have are probably generated by ~tens of hardcoded things. Anyway, on to the meat of the discussion...
It seems like a whole bunch of people are completely thrown off by use of the word “search”. So let’s taboo that and talk about what’s actually relevant here.
We should expect compression, and we should expect general-purpose problem solving (i.e. the ability to take a fairly arbitrary problem in the training environment and solve it reasonably well). The general-purpose part comes from a combination of (a) variation in what the system needs to do to achieve good performance in training, and (b) the recursive nature of problem solving, i.e. solving one problem involves solving a wide variety of subproblems. Compactness means that it probably won’t be a whole boatload of case-specific heuristics; lookup tables are not compact. A subroutine for reasonably-general planning or problem-solving (i.e. take a problem statement, figure out a plan or solution) is the key thing we’re talking about here. Possibly a small number of such subroutines for a few different problem-classes, but not a large number of such subroutines, because compactness. My guess would be basically just one.
That probably will not look like babble and prune. It may look like a general-purpose heuristic-generator (like e.g. relaxation-based heuristic generation). Or it may look like general-purpose efficiency tricks, like caching solutions to common subproblems. Or it may look like hardcoded heuristics which are environment-specific but reasonably goal-agnostic (e.g. the sort of thing in Mazes and Duality yields a maze-specific heuristic, but one which applies to a wide variety of path-finding problems within that maze). Or it may look like hardcoded strategies for achieving instrumentally convergent goals in the training environment (really this is another frame on caching solutions to common subproblems). Or it may look like learning instrumentally convergent concepts and heuristics from the training environment (i.e. natural abstractions; really this is another frame on environment-specific but goal-agnostic heuristics). Probably it’s a combination of all of those, and others too.
The important point is that it’s a problem-solving subroutine which is goal-agnostic (though possibly environment-specific). Pass in a goal, it figures out how to achieve that goal. And we do see this with humans: you can give humans pretty arbitrary goals, pretty arbitrary jobs to do, pretty arbitrary problems to solve, and they’ll go figure out how to do it.
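As a toy sketch of what such a goal-agnostic subroutine could look like (hypothetical interfaces `primitive` and `decompose`; assumes the decomposition bottoms out), combining recursive decomposition with caching of common subproblems:

```python
def solve(goal, primitive, decompose, cache=None):
    """Goal-agnostic problem solver (toy sketch): pass in a goal, get back a plan.
    `primitive(goal)` returns a list of steps if the goal is directly solvable,
    else None; `decompose(goal)` returns subgoals whose plans compose."""
    if cache is None:
        cache = {}                        # caching solutions to common subproblems
    if goal in cache:
        return cache[goal]
    plan = primitive(goal)
    if plan is None:
        plan = []
        for subgoal in decompose(goal):   # recursive decomposition into subproblems
            plan = plan + solve(subgoal, primitive, decompose, cache)
    cache[goal] = plan
    return plan

# Retargeting is just a different argument, e.g.:
#   solve("acquire resources", primitive, decompose)
#   solve("make paperclips", primitive, decompose)
```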
I agree that AGI will need general-purpose problem-solving routines (by definition). I also agree that this requires something like recursive decomposition of problems into subproblems. I’m just very skeptical that the kinds of neural nets we’re training right now can learn to do anything remotely like that. I think it’s much more likely that people will hard-code this type of reasoning into the compute graph with stuff like MCTS. This has already been pretty useful for e.g. MuZero. Once we’re hard-coding search, it’s less scary because it’s more interpretable and we can see exactly where the mesa-objective is.
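Schematically (a sketch loosely in the spirit of MuZero-style planners, not their actual API), the interpretability point is structural: when the search is hard-coded into the compute graph, the objective it optimizes is a named component you can point at and probe.

```python
class HardCodedPlanner:
    """Schematic sketch only. With search written as explicit code, the
    objective being optimized is a named, inspectable module rather than
    something implicit in the weights."""

    def __init__(self, dynamics_net, value_net):
        self.dynamics = dynamics_net   # learned model: (state, action) -> next state
        self.value = value_net         # <-- the "mesa-objective" sits right here

    def act(self, state, actions):
        # One-step lookahead stands in for a full MCTS tree; either way the
        # quantity being maximized is self.value, which we can inspect directly.
        scored = [(self.value(self.dynamics(state, a)), a) for a in actions]
        return max(scored, key=lambda pair: pair[0])[1]
```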
I also don’t really buy the compactness argument at all. I think neural nets are biased toward flat minima / broad basins but these don’t generally correspond to “simple” functions in the Kolmogorov sense; they’re more like equivalence classes of diverse bundles of heuristics that all get about the same train and val loss. I’m interpreting this paper as providing some evidence in that direction.
I’m just very skeptical that the kinds of neural nets we’re training right now can learn to do anything remotely like that. I think it’s much more likely that people will hard-code this type of reasoning into the compute graph with stuff like MCTS. This has already been pretty useful for e.g. MuZero. Once we’re hard-coding search, it’s less scary because it’s more interpretable and we can see exactly where the mesa-objective is.
I hope that you’re right; that would make Retargeting The Search very easy, and would basically eliminate the inner alignment problem. Assuming, of course, that we can somehow confidently rule out the rest of the net doing any search in more subtle ways.
Probably it’s a combination of all of those, and others too
This seems like roughly what I had in mind by “contextually activated computations” (probably with a few differences about when/how the subroutines will be goal-agnostic). I was imagining computations like “contextually activated cached death-avoidance policy influences” and “contextually activated steering of plans towards paperclip production, in generalizations of the historical reinforcement contexts for paperclip-reward.”