I’m not that committed to the RL frame, but roughly speaking yes. Whatever values we have are probably generated by ~tens of hardcoded things. Anyway, on to the meat of the discussion...
It seems like a whole bunch of people are completely thrown off by the use of the word “search”. So let’s taboo that and talk about what’s actually relevant here.
We should expect compression, and we should expect general-purpose problem solving (i.e. the ability to take a fairly arbitrary problem in the training environment and solve it reasonably well). The general-purpose part comes from a combination of (a) variation in what the system needs to do to achieve good performance in training, and (b) the recursive nature of problem solving, i.e. solving one problem involves solving a wide variety of subproblems. Compactness means that it probably won’t be a whole boatload of case-specific heuristics; lookup tables are not compact. A subroutine for reasonably-general planning or problem-solving (i.e. take a problem statement, figure out a plan or solution) is the key thing we’re talking about here. Possibly a small number of such subroutines for a few different problem-classes, but not a large number of such subroutines, because compactness. My guess would be basically just one.
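To gesture at the shape of the thing, here’s a toy sketch (Python, with a made-up task representation; nothing here is a claim about the internals of actual nets) of “one general recursive subroutine” as opposed to a lookup table of case-specific answers:

```python
# Toy sketch only: a single recursive solve() covering a whole (hypothetical)
# task domain. The compactness point: this one subroutine replaces a lookup
# table with an entry per task, because solving any task reduces to solving
# its subtasks.

def solve(task):
    """Return a flat plan of primitive actions for an arbitrary task."""
    if isinstance(task, str):          # base case: a primitive action
        return [task]
    plan = []
    for subtask in task:               # recursion: a task is just its subtasks
        plan.extend(solve(subtask))
    return plan

# Hypothetical compound task, nested arbitrarily deep:
make_tea = ("boil-water", ("get-cup", "get-teabag"), "pour", "steep")
print(solve(make_tea))
# ['boil-water', 'get-cup', 'get-teabag', 'pour', 'steep']
```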
That probably will not look like babble and prune. It may look like a general-purpose heuristic-generator (e.g. relaxation-based heuristic generation). Or it may look like general-purpose efficiency tricks, like caching solutions to common subproblems. Or it may look like hardcoded heuristics which are environment-specific but reasonably goal-agnostic (e.g. the sort of thing in Mazes and Duality, which yields a maze-specific heuristic that nonetheless applies to a wide variety of path-finding problems within that maze). Or it may look like hardcoded strategies for achieving instrumentally convergent goals in the training environment (really this is another frame on caching solutions to common subproblems). Or it may look like learning instrumentally convergent concepts and heuristics from the training environment (i.e. natural abstractions; really this is another frame on environment-specific but goal-agnostic heuristics). Probably it’s a combination of all of those, and others too.
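To make one of those frames concrete: “caching solutions to common subproblems” in its most familiar toy form is just memoization. The subproblem choice below (edit distance) is arbitrary, picked only because its subproblems overlap heavily:

```python
from functools import lru_cache

# "Caching solutions to common subproblems", toy version: memoization.
# Edit distance decomposes recursively into overlapping subproblems;
# the cache turns exponential recomputation into polynomial work.

@lru_cache(maxsize=None)
def edit_distance(a: str, b: str) -> int:
    if not a:
        return len(b)
    if not b:
        return len(a)
    if a[0] == b[0]:
        return edit_distance(a[1:], b[1:])
    return 1 + min(edit_distance(a[1:], b),      # delete
                   edit_distance(a, b[1:]),      # insert
                   edit_distance(a[1:], b[1:]))  # substitute

print(edit_distance("kitten", "sitting"))  # 3
```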
The important point is that it’s a problem-solving subroutine which is goal-agnostic (though possibly environment-specific). Pass in a goal, it figures out how to achieve that goal. And we do see this with humans: you can give humans pretty arbitrary goals, pretty arbitrary jobs to do, pretty arbitrary problems to solve, and they’ll go figure out how to do it.
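And here’s a toy sketch of the relaxation trick and goal-agnosticity together: a planner for a hypothetical 3×3 maze, where the heuristic is generated by solving a relaxed maze (walls deleted, which gives Manhattan distance), and where the goal is an explicit input:

```python
import heapq

# Toy sketch: relaxation-based heuristic generation plus a goal-agnostic
# planner. The heuristic is the exact cost of a *relaxed* maze (walls
# deleted), i.e. Manhattan distance; the maze itself is hypothetical.

def relaxed_cost(a, b):
    # Optimal cost in the relaxed problem; a lower bound on the true cost,
    # hence an admissible A* heuristic.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def plan(walls, start, goal, width=3, height=3):
    # Goal-agnostic: the goal is an argument, not baked into the code.
    frontier = [(relaxed_cost(start, goal), 0, start, [start])]
    seen = set()
    while frontier:
        _, cost, pos, path = heapq.heappop(frontier)
        if pos == goal:
            return path
        if pos in seen:
            continue
        seen.add(pos)
        x, y = pos
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nxt[0] < width and 0 <= nxt[1] < height
                    and nxt not in walls and nxt not in seen):
                heapq.heappush(frontier, (cost + 1 + relaxed_cost(nxt, goal),
                                          cost + 1, nxt, path + [nxt]))
    return None

walls = {(1, 0), (1, 1)}                    # hypothetical wall segment
print(plan(walls, (0, 0), (2, 0)))          # one goal...
print(plan(walls, (0, 0), (2, 2)))          # ...retargeted to another
```

Note that nothing about the machinery changes when the goal does; that’s the entire content of “goal-agnostic”.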
I agree that AGI will need general-purpose problem-solving routines (by definition). I also agree that this requires something like recursive decomposition of problems into subproblems. I’m just very skeptical that the kinds of neural nets we’re training right now can learn to do anything remotely like that; I think it’s much more likely that people will hard-code this type of reasoning into the compute graph with stuff like MCTS. This has already been pretty useful for e.g. MuZero. Once we’re hard-coding search, it’s less scary, because it’s more interpretable and we can see exactly where the mesa-objective is.
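To illustrate what I mean by the objective being visible when search is hardcoded, here’s a bare-bones UCT sketch (a toy with a made-up game, nothing like MuZero’s learned model): the quantity being optimized sits right there in rollout(), and the search loop sits right there in simulate(), with nothing to excavate from weights:

```python
import math, random

# Minimal UCT sketch for a made-up game: start at 0, add 1 or 3 per turn,
# five turns, reward 1 iff the total is exactly 11. The objective being
# searched over is explicit in rollout(); the search is explicit in
# simulate().

ACTIONS, HORIZON, TARGET = (1, 3), 5, 11

class Node:
    def __init__(self, total, depth):
        self.total, self.depth = total, depth
        self.children = {}                     # action -> child Node
        self.visits, self.value = 0, 0.0

def rollout(total, depth):
    # Random playout to the horizon; this reward *is* the search objective.
    while depth < HORIZON:
        total += random.choice(ACTIONS)
        depth += 1
    return 1.0 if total == TARGET else 0.0

def uct(parent, child):
    # Standard UCB1 tradeoff between exploitation and exploration.
    return (child.value / (child.visits + 1e-9)
            + math.sqrt(2 * math.log(parent.visits + 1) / (child.visits + 1e-9)))

def simulate(node):
    if node.depth == HORIZON:                          # terminal: score it
        reward = 1.0 if node.total == TARGET else 0.0
    elif len(node.children) < len(ACTIONS):            # expand an untried action
        a = next(a for a in ACTIONS if a not in node.children)
        child = node.children[a] = Node(node.total + a, node.depth + 1)
        reward = rollout(child.total, child.depth)
        child.visits += 1
        child.value += reward
    else:                                              # descend via UCT
        reward = simulate(max(node.children.values(), key=lambda c: uct(node, c)))
    node.visits += 1
    node.value += reward
    return reward

random.seed(0)
root = Node(0, 0)
for _ in range(2000):
    simulate(root)
best = max(root.children, key=lambda a: root.children[a].visits)
print("preferred first action:", best)   # likely 3: more completions reach 11
```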
I also don’t really buy the compactness argument at all. I think neural nets are biased toward flat minima / broad basins, but these don’t generally correspond to “simple” functions in the Kolmogorov sense; they’re more like equivalence classes of diverse bundles of heuristics that all get about the same train and validation loss. I’m interpreting this paper as providing some evidence in that direction.
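As a toy picture of the distinction (obviously artificial: a 1-D loss, not a real net), basin volume and description length just aren’t the same quantity:

```python
import numpy as np

# Toy illustration: two minima of a 1-D loss, equally "simple" as functions
# (both one-parameter quadratics), but occupying basins of very different
# widths. Flatness is measured the crude way, by average loss increase under
# random weight perturbation. Entirely hypothetical numbers.

def loss(w):
    sharp = (w + 2.0) ** 2 * 50.0      # narrow basin around w = -2
    flat = (w - 2.0) ** 2 * 0.5        # broad basin around w = +2
    return np.minimum(sharp, flat)

rng = np.random.default_rng(0)
for minimum in (-2.0, 2.0):
    bumps = loss(minimum + rng.normal(0.0, 0.3, size=10_000)) - loss(minimum)
    print(f"minimum at {minimum:+.0f}: mean loss increase {bumps.mean():.3f}")
# The broad basin at +2 wins on this measure, yet neither minimum is
# "simpler" in the Kolmogorov sense.
```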
> I’m just very skeptical that the kinds of neural nets we’re training right now can learn to do anything remotely like that; I think it’s much more likely that people will hard-code this type of reasoning into the compute graph with stuff like MCTS. This has already been pretty useful for e.g. MuZero. Once we’re hard-coding search, it’s less scary, because it’s more interpretable and we can see exactly where the mesa-objective is.
I hope that you’re right; that would make Retargeting The Search very easy and would basically eliminate the inner alignment problem. Assuming, of course, that we can somehow confidently rule out the rest of the net doing any search in more subtle ways.
> Probably it’s a combination of all of those, and others too.
This seems like roughly what I had in mind by “contextually activated computations” (probably with a few differences about when/how the subroutines will be goal-agnostic). I was imagining computations like “contextually activated cached death-avoidance policy influences” and “contextually activated steering of plans towards paperclip production, in generalizations of the historical reinforcement contexts for paperclip-reward.”