GPT-3 does unsupervised learning on text data. Our brains do predictive processing on sensory inputs. My guess (which I’d love to hear arguments against!) is that there’s a true and deep analogy between the two, and that they lead to impressive abilities for fundamentally the same reason.
Agree that self-supervised learning powers both GPT-3 updates and human brain world-model updates (details & caveats). (Which isn’t to say that GPT-3 is exactly the same as the human brain world-model—there are infinitely many different possible ML algorithms that all update via self-supervised learning).
However…
If so, it seems to me that that’s where all the juice is. That’s where the intelligence comes from … if agency is not a fundamental part of intelligence, and rather something that can just be added in on top, or not, and if we’re at a loss for how to either align a superintelligent agent with CEV or else make it corrigible, then why not try to avoid creating the agent part of superintelligent agent?
I disagree; I think the agency is necessary to build a really good world-model, one that includes new useful concepts that humans have never thought of.
Without the agency, some of the things that you lose are (and these overlap): Intelligently choosing what to attend to; intelligently choosing what to think about; intelligently choosing what book to re-read and ponder; intelligently choosing what question to ask; ability to learn and use better and better brainstorming strategies and other such metacognitive heuristics.
See my discussion here (Section 7.2) for why I think these things are important if we want the AGI to be able to do things like invent new technology or come up with new good ideas in AI alignment.
You can say: “We’ll (1) make an agent that helps build a really good world-model, then (2) turn off the agent and use / query the world-model by itself”. But then step (1) is the dangerous part.
I disagree; I think the agency is necessary to build a really good world-model, one that includes new useful concepts that humans have never thought of.
Without the agency, some of the things that you lose are (and these overlap): Intelligently choosing what to attend to; intelligently choosing what to think about; intelligently choosing what book to re-read and ponder; intelligently choosing what question to ask; ability to learn and use better and better brainstorming strategies and other such metacognitive heuristics.
Why is agency necessary for these things?
If we follow Ought’s advice and build “process-based systems [that] are built on human-understandable task decompositions, with direct supervision of reasoning steps”, do you expect us to hit a hard wall somewhere that prevents these systems from creatively choosing things to think about, books to read, or better brainstorming strategies?
(Copying from here:)
Let’s compare two things: “trying to get a good understanding of some domain by building up a vocabulary of concepts and their relations” versus “trying to win a video game”. At a high level, I claim they have a lot in common!
In both cases, there are a bunch of possible “moves” you can make (you could think the thought “what if there’s some analogy between this and that?”, or you could think the thought “that’s a bit of a pattern; does it generalize?”, etc. etc.), and each move affects subsequent moves, in an exponentially-growing tree of possibilities.
In both cases, you’ll often get some early hints about whether moves were wise, but you won’t really know that you’re on the right track except in hindsight.
And in both cases, I think the only reliable way to succeed is to have the capability to repeatedly try different things, and learn from experience what paths and strategies are fruitful.
Therefore (I would argue), a human-level concept-inventing AI needs “RL-on-thoughts”—i.e., a reinforcement learning system, in which “thoughts” (edits to the hypothesis space / priors / world-model) are the thing that gets rewarded. The human brain certainly has that. You can be lying in bed motionless, and have rewarding thoughts, and aversive thoughts, and new ideas that make you rethink something you thought you knew.
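As a toy illustration of "try different things, and learn from experience what paths and strategies are fruitful", here is a minimal epsilon-greedy bandit sketch. Everything in it is an assumption made for illustration: the move names, the payoff probabilities, and the function `rl_on_thoughts` are hypothetical. It is far simpler than what the passage above is describing; it only shows the basic loop of rewarding "thoughts" and shifting toward the ones that pay off.

```python
import random

def rl_on_thoughts(move_payoffs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit over hypothetical 'thought moves'.

    move_payoffs maps a move name to a made-up probability that trying
    the move yields a useful insight. The learner never sees these
    probabilities; it only observes a 0/1 reward after each try.
    """
    rng = random.Random(seed)
    moves = list(move_payoffs)
    value = {m: 0.0 for m in moves}  # running estimate of each move's payoff
    count = {m: 0 for m in moves}    # how often each move has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            move = rng.choice(moves)          # explore: try something at random
        else:
            move = max(moves, key=value.get)  # exploit: use the best-looking move
        reward = 1.0 if rng.random() < move_payoffs[move] else 0.0
        count[move] += 1
        # Incremental mean update of the estimated payoff for this move.
        value[move] += (reward - value[move]) / count[move]
    return value, count

# Hypothetical moves and payoffs, chosen only for illustration.
payoffs = {
    "look for an analogy": 0.3,
    "test whether a pattern generalizes": 0.7,
    "re-read the textbook": 0.1,
}
value, count = rl_on_thoughts(payoffs)
best = max(value, key=value.get)
print(best, count)
```

The learner is never told which move is good. It finds out only from the rewards that follow its own choices, which is the sense in which the "thoughts" are the thing being rewarded.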
Unfortunately, I also believe that RL-on-thoughts is really dangerous by default. Here’s why.
Again suppose that we want an AI that gets a good understanding of some domain by building up a vocabulary of concepts and their relations. As discussed above, we do this via an RL-on-thoughts AI. Consider some of the features that we plausibly need to put into this RL-on-thoughts system, for it to succeed at a superhuman level:
Developing and pursuing instrumental subgoals—for example, suppose the AI is “trying” to develop concepts that will make it superhumanly competent at assisting a human microscope inventor. We want it to be able to “notice” that there might be a relation between lenses and symplectic transformations, and then go spend some compute cycles developing a better understanding of symplectic transformations. For this to happen, we need “understand symplectic transformations” to be flagged as a temporary sub-goal, and to be pursued, and we want it to be able to spawn further sub-sub-goals and so on.
Consequentialist planning—Relatedly, we want the AI to be able to summon and re-read a textbook on linear algebra, or mentally work through an example problem, because it anticipates that these activities will lead to better understanding of the target domain.
Meta-cognition—We want the AI to be able to learn patterns in which of its own “thoughts” lead to better understanding and which don’t, and to apply that knowledge towards having more productive thoughts.
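The sub-goal spawning described in the first item can be sketched as a goal stack. This is only a sketch under loud assumptions: `pursue` and the `prerequisites` table are hypothetical, and the table hands the system the relations it would really have to notice for itself, which is the hard part.

```python
def pursue(goal, prerequisites):
    """Depth-first pursuit of a goal, spawning temporary sub-goals.

    prerequisites is a hypothetical lookup table mapping a goal to the
    sub-goals that should be handled first. Returns the order in which
    goals are completed.
    """
    done = []
    stack = [goal]
    while stack:
        current = stack[-1]
        pending = [
            g for g in prerequisites.get(current, [])
            if g not in done and g not in stack  # skip finished or in-progress goals
        ]
        if pending:
            stack.append(pending[0])  # flag a temporary sub-goal and pursue it
        else:
            done.append(stack.pop())  # finished; resume the parent goal
    return done

# Hypothetical prerequisite table, loosely following the microscope example.
prerequisites = {
    "assist microscope inventor": ["understand lenses"],
    "understand lenses": ["understand symplectic transformations"],
    "understand symplectic transformations": ["review linear algebra"],
}
order = pursue("assist microscope inventor", prerequisites)
print(order)
```

Each sub-goal here exists only because it serves the goal above it on the stack, which is what makes it instrumental.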
Putting all these things together, it seems to me that the default for this kind of AI would be to figure out that “seizing control of its off-switch” would be instrumentally useful for it to do what it’s trying to do (i.e. develop a better understanding of the target domain, presumably), and then to come up with a clever scheme to do so, and then to do it. So like I said, RL-on-thoughts seems to me to be both necessary and dangerous.
(Does that count as “agency”? I don’t know, it depends on what you mean by “agency”.)
In terms of the “task decomposition” strategy, this might be tricky to discuss because you probably have a more detailed picture in your mind than I do. I’ll try anyway.
It seems to me that the options are:
(1) the subprocess only knows its narrow task (“solve this symplectic geometry homework problem”), and is oblivious to the overall system goal (“design a better microscope”), or
(2) the subprocess is aware of the overall system goal and chooses actions in part to advance it.
In Case (2), I’m not sure this really counts as “task decomposition” in the first place, or how this would help with safety.
In Case (1), yes I expect systems to hit a hard wall—I’m skeptical that tasks we care about decompose cleanly.
For example, at my last job, I would often be part of a team inventing a new gizmo, and it was not at all unusual for me to find myself sketching out the algorithms and sketching out the link budget and scrutinizing laser spec sheets and scrutinizing FPGA spec sheets and nailing down end-user requirements, etc. etc. Not because I’m individually the best person at each of those tasks—or even very good!—but because sometimes a laser-related problem is best solved by switching to a different algorithm, or an FPGA-related problem is best solved by recognizing that the real end-user requirements are not quite what we thought, etc. etc. And that kind of design work is awfully hard unless a giant heap of relevant information and knowledge is all together in a single brain / world-model.
In the case of my current job doing AI alignment research, I sometimes come across small self-contained tasks that could be delegated, but I would have no idea how to decompose most of what I do. (E.g. writing this comment!)
Here’s John Wentworth making a similar point more eloquently:
So why do bureaucracies (and large organizations more generally) fail so badly?
My main model for this is that interfaces are a scarce resource. Or, to phrase it in a way more obviously relevant to factorization: it is empirically hard for humans to find good factorizations of problems which have not already been found. Interfaces which neatly split problems are not an abundant resource (at least relative to humans’ abilities to find/build such interfaces). If you can solve that problem well, robustly and at scale, then there’s an awful lot of money to be made.
Also, one major sub-bottleneck (though not the only sub-bottleneck) of interface scarcity is that it’s hard to tell who has done a good job on a domain-specific problem/question without already having some domain-specific background knowledge. This also applies at a more “micro” level: it’s hard to tell whose answers are best without knowing lots of context oneself.
A possible example of a seemingly-hard-to-decompose task would be: Until 1948, no human had ever thought of the concept of “information entropy”. Then Claude Shannon sat down and invented this new useful concept. Make an AI that can do things like that.
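For readers who have not met the concept, the standard definition of information entropy for a discrete distribution is H = -Σ p·log2(p), measured in bits. A minimal sketch follows; the function name `shannon_entropy` and the example distributions are my own choices for illustration.

```python
import math

def shannon_entropy(probabilities):
    """Entropy in bits: H = -sum(p * log2(p)), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

fair_coin = shannon_entropy([0.5, 0.5])    # maximally uncertain two-outcome source
biased_coin = shannon_entropy([0.9, 0.1])  # more predictable, so lower entropy
print(fair_coin, biased_coin)
```

The formula is short once you have it. The hard-to-decompose part is realizing that this is the quantity worth defining.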
(Even if I’m correct that process-based task-decomposition hits a wall, that’s not to say that it doesn’t have room for improvement over today’s AI. The issue is (1) outcome-based systems are dangerous; (2) given enough time, people will presumably build them anyway. And the goal is to solve that problem, either by a GPU-melting-nanobot type of plan, or some other better plan. Is there such a plan that we can enact using a process-based task-decomposition AI? Eliezer believes (see point 7) that the answer is “no”. I would say the answer is: “I guess maybe, but I can’t think of any”. I don’t know what type of plan you have in mind. Sorry if you already talked about that and I missed it. :) )
FWIW self-supervised learning can be surprisingly capable of doing things that we previously only knew how to do with “agentic” designs. From that link: classification is usually done with an objective + an optimization procedure, but GPT-3 just does it.