an architectural change → Turing machines and their neural equivalents
This, yes. I think I see where the disconnect is, but I’m not sure how to bridge it. Let’s try...
To become universally capable, a system needs two things:
“Turing-completeness”: A mechanism by which it can construct arbitrary mathematical objects to describe new environments (including abstract environments).
“General intelligence”: an algorithm that can take in any arbitrary mathematical object produced by (1), and employ it for planning.
General intelligence isn’t Turing-completeness itself. Rather, it’s a planning algorithm that has Turing-completeness as a prerequisite. Its binariness is inherited from the binariness of Turing-completeness.
Consider a system that has (1) but not (2), such as your “memory + finite state control” example. While, yes, this system meets the requirements for Turing-complete world-modeling, this capability can’t be leveraged. Suppose it assembles a completely new region of its world-model. What would it do with it? It needs to leverage that knowledge for constructing practically-implementable plans, but its policy function/heuristics is a separate piece of cognition. So it either needs:
To get some practical experience, via trial-and-error experiments or a policy gradient, to arrive at good heuristics to employ in this new environment.
A policy function that can gracefully expand to this new region — which can plan given only pure knowledge of the environment structure. A policy function that scales in lockstep with the world-model.
The second, in my framework, is general intelligence.
A practical example: Imagine that all your memory of tic-tac-toe has been erased. Then you’re given the rules for that game again, and told that in an hour, you’ll play a few rounds against a machine that makes random moves. Within that hour, you’re free to think and figure out good strategies for winning. I would expect that once the hour is up, you’ll be able to win handily against the random-move-maker.
How is that possible?
The knee-jerk reaction may be to suggest that in that hour of thinking, you’ll be playing simulated games in your mind, and refining your heuristics this way. That’s part of it, but I don’t think it’s the main trick. Even in these simulated games, you’ll likely not start out by making completely random moves and iteratively converging towards better-than-random strategies by trial-and-error. Rather, you’ll look over the rules, analyse the game abstractly, and instantly back out a few good heuristics this way — e. g., that taking the center square is a pretty good move. Only then will you engage in simulated babble-and-prune. (It’s the same point John was making here.)
General intelligence is the capability that makes this possible, the algorithm you employ for this “abstract analysis”. As I’d stated, its main appeal is that it doesn’t require practical experience with the problem domain (simulated or otherwise) — only knowledge of its structure.
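(To make the tic-tac-toe point concrete, here is a minimal sketch, my own toy illustration rather than anything canonical, of how much falls out of the rules alone with zero practice games: a line-counting “abstract analysis” that recovers the take-the-center heuristic, plus a negamax search that plays well given nothing but the move/win rules.)

```python
# Minimal sketch (my own toy, not anything canonical). Given only the rules of
# tic-tac-toe -- which squares exist, which triples win -- two things fall out
# with zero practice games:
#   (a) "take the center" via pure structural analysis (count lines per square);
#   (b) strong play via game-tree search over those same rules.
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

# (a) Abstract analysis: the center lies on 4 winning lines, corners on 3, edges on 2.
print([sum(sq in line for line in LINES) for sq in range(9)])
# -> [3, 2, 3, 2, 4, 2, 3, 2, 3]

# (b) Planning from pure structure: negamax over the rules.
def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def value(board, to_move):
    """Value of `board` (a 9-char string) for the side to move: +1 win, 0 draw, -1 loss."""
    if winner(board) is not None:
        return -1                        # the previous mover just completed a line
    if "." not in board:
        return 0                         # draw
    nxt = "O" if to_move == "X" else "X"
    return max(-value(board[:i] + to_move + board[i + 1:], nxt)
               for i in range(9) if board[i] == ".")

def best_move(board, to_move):
    nxt = "O" if to_move == "X" else "X"
    return max((i for i in range(9) if board[i] == "."),
               key=lambda i: -value(board[:i] + to_move + board[i + 1:], nxt))

print(best_move("XX.OO....", "X"))   # -> 2: takes the winning square rather than merely blocking
print(value("." * 9, "X"))           # -> 0: tic-tac-toe is a draw under perfect play
```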
This is compatible with an alternative theory, that many other animals do have “the algorithm for general intelligence” you refer to, but that they’re running it with less impressive content (world models & heuristics).
Eh, I can grant that. See the point about “no fire alarm”, how “weak” AGIs are very difficult to tell apart from very advanced crystallized-intelligence structures (especially if these structures are being trained on-line, as animals are).
To become universally capable, a system needs two things:
“Turing-completeness”: A mechanism by which it can construct arbitrary mathematical objects to describe new environments (including abstract environments).
“General intelligence”: an algorithm that can take in any arbitrary mathematical object produced by (1), and employ it for planning.
General intelligence isn’t Turing-completeness itself. Rather, it’s a planning algorithm that has Turing-completeness as a prerequisite. Its binariness is inherited from the binariness of Turing-completeness.
Ok I think this at least clears things up a bit.

Based on the above, I don’t understand why you expect what you say you’re expecting. We blew past the Turing-completeness threshold decades ago with general purpose computers, and we’ve combined them with planning algorithms in lots of ways.
Take AIXI, which uses the full power of Turing-completeness to do model-based planning with every possible abstraction/model. To my knowledge, switching over to that kind of fully-general planning (or any of its bounded approximations) hasn’t actually produced corresponding improvements in quality of outputs, especially compared to the quality gains we get from other changes. I think our default expectation should be that the real action is in accumulating those “other changes”. On the theory that the gap between human and nonhuman-animal cognition is from us accumulating better “content” (world model concepts, heuristics, abstractions, etc.) over time, it’s no surprise that there’s no big phase change from combining Turing machines with planning!
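(For concreteness, and as a check that we mean the same thing by AIXI: Hutter’s agent selects its actions by expectimax over every environment program $q$ a universal machine $U$ could be running, weighted by a simplicity prior. Roughly:

$$a_k \;=\; \arg\max_{a_k}\sum_{o_k r_k}\cdots\max_{a_m}\sum_{o_m r_m}\bigl[r_k+\cdots+r_m\bigr]\sum_{q\,:\,U(q,\,a_1\ldots a_m)\,=\,o_1 r_1\ldots o_m r_m}2^{-\ell(q)}$$

where $\ell(q)$ is the length of program $q$ and $m$ is the horizon. That is “Turing-complete world-modeling already wired into a planner”, which is why it looks like the natural test case for the claim above.)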
General intelligence is the capability that makes this possible, the algorithm you employ for this “abstract analysis”. As I’d stated, its main appeal is that it doesn’t require practical experience with the problem domain (simulated or otherwise) — only knowledge of its structure.
I think what you describe here and in the content prior is more or less “model-based reinforcement learning with state/action abstraction”, which is the class of algorithms that answer the question “What if we did planning towards goals but with learned/latent abstractions?” As far as I can tell, other animals do this as well. Yes, it takes a more impressive form in humans because language (and the culture + science it enabled) has allowed us to develop more/better abstractions to plan with, but I see no need to posit some novel general capability in addition.
it takes a more impressive form in humans because language (and the culture + science it enabled) has allowed us to develop more/better abstractions to plan with, but I see no need to posit some novel general capability in addition
I think what I’m trying to get at, here, is that the ability to use these better, self-derived abstractions for planning is nontrivial, and requires a specific universal-planning algorithm to work. Animals et al. learn new concepts and their applications simultaneously: they see e. g. a new fruit, try eating it, their taste receptors approve/disapprove of it, and they simultaneously learn a concept for this fruit and a heuristic “this fruit is good/bad”. They also only learn new concepts downstream of actual interactions with the thing; all learning is implemented by hard-coded reward circuitry.
Humans can do more than that. As in my example, you can just describe to them e. g. a new game, and they can spin up an abstract representation of it and derive heuristics for it autonomously, without engaging hard-coded reward circuitry at all, without doing trial-and-error even in simulations. They can also learn new concepts in an autonomous manner, by just thinking about some problem domain, finding a connection between some concepts in it, and creating a new abstraction/chunking them together.
The general-intelligence algorithm is what allows all of this to be useful. A non-GI mind can’t make use of a newly-constructed concept, because its planning machinery has no idea what to do with it: its policy function doesn’t accept objects of this type, hasn’t been adapted for them. This makes them unable to learn autonomously, unable to construct heuristics autonomously, and therefore unable to construct new concepts autonomously. General intelligence, by contrast, is a planning algorithm that “scales as fast as the world-model”: a planning algorithm that can take in any concept that’s been created this way.
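(A toy way to see the distinction I have in mind; this is my own sketch, and the Model interface, plan routine, and jug puzzle below are all made up for illustration. A policy function maps a fixed state space to actions, so a freshly constructed kind of state is simply outside its domain; a planner written against an abstract interface accepts anything that exposes its structure, with no retraining and no reward signal.)

```python
# Toy contrast (my own sketch; `Model`, `plan`, and the jug puzzle are all made up
# for illustration). A policy function maps a *fixed* state space to actions, so a
# freshly constructed kind of state is simply outside its domain. A planner written
# against an abstract interface accepts anything that exposes its structure --
# it "scales with the world-model" by construction.
from collections import deque
from typing import Hashable, Iterable, Protocol

class Model(Protocol):
    """Whatever the world-model machinery just built, so long as it exposes structure."""
    def initial(self) -> Hashable: ...
    def actions(self, state) -> Iterable: ...
    def result(self, state, action) -> Hashable: ...
    def is_goal(self, state) -> bool: ...

def plan(model: Model, max_depth: int = 30):
    """Breadth-first search: derives a plan from the model's structure alone."""
    start = model.initial()
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if model.is_goal(state):
            return path
        if len(path) >= max_depth:
            continue
        for action in model.actions(state):
            nxt = model.result(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    return None

class JugPuzzle:
    """A 'newly described' environment: 3 L and 5 L jugs, goal = measure exactly 4 L."""
    caps = (3, 5)
    def initial(self):
        return (0, 0)
    def is_goal(self, state):
        return 4 in state
    def actions(self, state):
        return ([("fill", i) for i in (0, 1)] +
                [("empty", i) for i in (0, 1)] +
                [("pour", 0, 1), ("pour", 1, 0)])
    def result(self, state, action):
        jugs = list(state)
        if action[0] == "fill":
            jugs[action[1]] = self.caps[action[1]]
        elif action[0] == "empty":
            jugs[action[1]] = 0
        else:
            _, i, j = action
            moved = min(jugs[i], self.caps[j] - jugs[j])
            jugs[i] -= moved
            jugs[j] += moved
        return tuple(jugs)

# The planner has never "seen" jugs before; a plan falls out of the structure alone:
print(plan(JugPuzzle()))
# -> a shortest plan, e.g. fill the 5 L jug, pour into the 3 L jug, empty it,
#    pour again, refill the 5 L jug, pour -- leaving exactly 4 L.
```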
Or, an alternative framing...
I think our default expectation should be that the real action is in accumulating those “other changes”.
General intelligence is an algorithm for systematic derivation of such “other changes”.
I think what I’m trying to get at, here, is that the ability to use these better, self-derived abstractions for planning is nontrivial, and requires a specific universal-planning algorithm to work. Animals et al. learn new concepts and their applications simultaneously: they see e. g. a new fruit, try eating it, their taste receptors approve/disapprove of it, and they simultaneously learn a concept for this fruit and a heuristic “this fruit is good/bad”. They also only learn new concepts downstream of actual interactions with the thing; all learning is implemented by hard-coded reward circuitry.
Humans can do more than that. As in my example, you can just describe to them e. g. a new game, and they can spin up an abstract representation of it and derive heuristics for it autonomously, without engaging hard-coded reward circuitry at all, without doing trial-and-error even in simulations. They can also learn new concepts in an autonomous manner, by just thinking about some problem domain, finding a connection between some concepts in it, and creating a new abstraction/chunking them together.
Hmm I feel like you’re underestimating animal cognition / overestimating how much of what humans can do comes from unique algorithms vs. accumulated “mental content”. Non-human animals don’t have language, culture, and other forms of externalized representation, including the particular human representations behind “learning the rules of a game”. Without these in place, even if one was using the “universal planning algorithm”, they’d be precluded from learning through abstract description and from learning through manipulation of abstract game-structure concepts. All they’ve got is observation, experiment, and extrapolation from their existing concepts. But lacking the ability to receive abstract concepts via communication doesn’t mean that they cannot synthesize new abstractions as situations require. I think there’s good evidence that other animals can indeed do that.
General intelligence is an algorithm for systematic derivation of such “other changes”.
Does any of that make sense to you?
I get what you’re saying but disbelieve the broader theory. I think the “other changes” (innovations/useful context-specific improvements) we see in reality aren’t mostly attributable to the application of some simple algorithm, unless we abstract away all of the details that did the actual work. There are general purpose strategies (for ex. the “scientific method” strategy, which is an elaboration of the “model-based RL” strategy, which is an elaboration of the “trial and error” strategy) that are widely applicable for deriving useful improvements. But those strategies are at a very high level of abstraction, whereas the bulk of improvement comes from using strategies to accumulate lower-level concrete “content” over time, rather than merely from adopting a particular strategy.

(Would again recommend Hanson’s blog on “The Betterness Explosion” as expressing my side of the discussion here.)
Non-human animals don’t have language, culture, and other forms of externalized representation, including the particular human representations behind “learning the rules of a game”. Without these in place, even if one was using the “universal planning algorithm”, they’d be precluded from learning through abstract description and from learning through manipulation of abstract game-structure concepts
Agreed, I think. I’m claiming that those abilities are mutually dependent. Turing-completeness lets you construct novel abstractions like language/culture/etc., but it’s only useful if there’s a GI algorithm that can actually take these novelties in as inputs. Otherwise, there’s no reason to waste compute deriving ahead of time abstractions you haven’t encountered yet and won’t know how to use; you may as well wait until you run into them “in the wild”.
In turn, the GI algorithm (as you point out) only shines if there’s extant machinery that’s generating novel abstractions for it to plan over. Otherwise, it can do no better than trial-and-error learning.
I guess I don’t see much support for such mutual dependence. Other animals have working memory + finite state control, and learn from experience in flexible ways. It appears pretty useful to them despite the fact they don’t have language/culture. The vast majority of our useful computing is done by systems that have Turing-completeness but not language/cultural competence. Language models sure look like they have language ability without Turing-completeness and without having picked up some “universal planning algorithm” that would render our previous work w/ NNs ~useless.
Why choose a theory like “the capability gap between humans and other animals is because the latter is missing language/culture and also some binary GI property” over one like “the capability gap between humans and other animals is just because the latter is missing language/culture”? IMO the latter is simpler and better fits the evidence.
Hmm, we may have reached the point from which we’re not going to move on without building mathematical frameworks and empirically testing them, or something.
Other animals have working memory + finite state control, and learn from experience in flexible ways
“Learn from experience” is the key point. Abstract thinking lets you learn without experience — from others’ experience that they communicate to you, or from just figuring out how something works abstractly and anticipating the consequences in advance of them occurring. This sort of learning, I claim, is only possible when you have the machinery for generating entirely novel abstractions (language, math, etc.), which in turn is only useful if you have a planning algorithm capable of handling any arbitrary abstraction you may spin up.
“The capability gap between humans and other animals is because the latter is missing language/culture and also some binary GI property” and “the capability gap between humans and other animals is just because the latter is missing language/culture” are synonymous, in my view, because you can’t have language/culture without the binary GI property.
Language models sure look like they have language ability
As per the original post, I disagree that they have the language ability in the relevant sense. I think they’re situated firmly at Simulacrum Level 4; they appear to communicate, but it’s all just reflexes.
I didn’t mean “learning from experience” to be restrictive in that way. Animals learn by observing others & from building abstract mental models too. But unless one acquires abstracted knowledge via communication, learning requires some form of experience: even abstracted knowledge is derived from experience, whether actual or imagined. Moreover, I don’t think that some extra/different planning machinery was required for language itself, beyond the existing abstraction and model-based RL capabilities that many other animals share. But ultimately that’s an empirical question.
Hmm, we may have reached the point from which we’re not going to move on without building mathematical frameworks and empirically testing them, or something.
Yeah I am probably going to end my part of the discussion tree here.
My overall take remains:
There may be general purpose problem-solving strategies that humans and non-human animals alike share, which explain our relative capability gains when combined with the unlocks that came from language/culture.
We don’t need any human-distinctive “general intelligence” property to explain the capability differences among human-, non-human animal-, and artificial systems, so we shouldn’t assume that there’s any major threshold ahead of us corresponding to it.
Moreover, I don’t think that some extra/different planning machinery was required for language itself, beyond the existing abstraction and model-based RL capabilities that many other animals share.
I would expect to see sophisticated ape/early-hominid-level culture in many more species if that were the case. For some reason humans went down the culture-RSI trajectory whereas other animals didn’t. Plausibly there was some seed cognitive ability (plus some other contextual enablers) that allowed a gene-culture “coevolution” cycle to start.