I’ve noticed a sort of tradeoff in how I use planning/todo systems (having experimented with several such systems recently). This mainly applies to planning things with no immediate deadline, where it’s more about how to split a large amount of available time between a large number of tasks, rather than about remembering which things to do when. For instance, think of a personal reading list—there is no hurry to read any particular things on it, but you do want to be spending your reading time effectively.
On one extreme, I make a commitment to myself to do all the things on the list eventually. At first, this has the desired effect of making me get things done. But eventually, things that I don’t want to do start to accumulate. I procrastinate on these things by working on more attractive items on the list. This makes the list much less useful from a planning perspective, since it’s cluttered with a bunch of old things I no longer want to spend time on (which make me feel bad about not doing them whenever I’m looking at the list).
On the other extreme, I make no commitment like that, and remove things from the list whenever I feel like it. This avoids the problem of accumulating things I don’t want to do, but makes the list completely useless as a tool for getting me to do boring tasks.
I have a hard time balancing these issues. I’m currently trying an approach to my academic reading list where I keep a mostly unsorted list, and whenever I look at it to find something to read, I have to work on the top item, or remove it from the list. This is hardly ideal, but it mitigates the “stale items” problem, and still manages to provide some motivation, since it feels bad to take items off the list.
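The "work on the top item, or remove it" rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the list is a plain Python list, the names `pick_next` and `willing` are hypothetical, and repeating the rule on the next item after a removal is my reading of the rule rather than something stated above.

```python
from typing import Callable, List, Optional


def pick_next(items: List[str], willing: Callable[[str], bool]) -> Optional[str]:
    """Apply the rule: work on the top item, or remove it from the list.

    Returns the item to work on (it stays at the top of the list),
    or None if every item was removed. `willing` is a hypothetical
    callback standing in for "do I actually want to read this now?".
    """
    while items:
        top = items[0]
        if willing(top):
            return top
        # Not willing to work on it: the item leaves the list for good.
        items.pop(0)
    return None


# Example data is made up for illustration.
reading_list = ["dense survey paper", "stale workshop note", "new preprint"]
chosen = pick_next(
    reading_list,
    willing=lambda title: "stale" not in title and "dense" not in title,
)
print(chosen)
print(reading_list)
```

The point of the design is that skipping is not an available move: every look at the list either produces work or shrinks the list, which is what keeps stale items from piling up.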
I’ve addressed this sort of problem with a fairly ruthless rule: “when updating my todo-lists, they always start empty rather than full of previous stuff. I have to make a constant choice to keep old things around if they still feel ‘alive.’”
I used to be big on todo lists, and I always had the exact same problem. I mostly hung out on the “keeping old tasks around for too long” end of the spectrum.
Now, I no longer struggle with this nearly as much. The solution turned out to be a paradigm shift that occurred when I read Nate Soares’s Replacing Guilt series. If you aren’t already familiar with it, I highly recommend reading it sometime.
Belief: There is no amount of computing power which would make AlphaGo Zero (AGZ) turn the world into computronium in order to make the best possible Go moves (even if we assume there is some strategy which would let the system achieve this, like manipulating humans with cleverly chosen Go moves).
My reasoning is that AGZ is trained by recursively approximating a Monte Carlo Tree Search guided by its current model (very rough explanation which is probably missing something important). And it seems the “attractor” in this system is “perfect Go play”, not “whatever Go play leads to better Go play in the future”. There is no way for a system like this to learn that humans exist, or that it’s running on a computer of a certain type, or even to conceptualize that certain moves may alter certain parameters of the system, because these things aren’t captured in the MCTS, only the rules of Go.
This isn’t an argument against dangerous AGI in general; I’m trying to clarify my thinking about the whole “Tool AI vs Agent AI” thing before I read Reframing Superintelligence.
Am I right? And is this a sound argument?
Sounds correct to me. As long as the AI has no model of the outside world and no model of itself (and perhaps a few extra assumptions), it should keep playing within the given constraints. It may produce results that are incomprehensible to us, but it would not do so on purpose.
It’s when the “tool AI” has a model of the world (including itself, the humans, how the rewards are generated, how it could generate better results by obtaining more resources, and how humans could interfere with its goals) that agent-ness emerges as a side effect of trying to solve the problem.
“Find the best Go move in this tree” is safe. “Find the best Go move, given the fact that the guy in the next room hates computers and will try to turn you off, which would be considered a failure at finding the best move” is dangerous. “Find the best Go move, given the fact that more computing power would likely allow you to make better moves, but humans would try to prevent you from getting too many resources” is an x-risk.
I recommend Two Senses of “Optimizer”. I agree with you, roughly. I think that it will be relatively easy to build a tool AI that has very strong capabilities, and much harder to build something with world optimization capabilities. This implies that powerful tool AI (or AI services) will come first, and for the most part they will be safe in the way that agentic AI isn’t safe.
However, two things may trouble this analysis:
1. Tool AIs might become agentic AIs because of some emergent mesa-optimization program which encourages world optimization. It’s hard to tell how likely this would be.
2. Gwern wrote about how Tool AIs have an instrumental reason to become agentic. At one point I believed this argument, but I no longer think that it is predictive of real AI systems. Practically speaking, even if there were an instrumental advantage to becoming agentic, AGZ just isn’t optimizing a utility function (or at least, it’s not useful to describe it as such), and therefore arguments about instrumental convergence aren’t predictive of AGZ scaled up.
Compare two views of “the universal prior”:
AIXI: The external world is a Turing machine that receives our actions as input and produces our sensory impressions as output. Our prior belief about this Turing machine should be that it’s simple, i.e. the Solomonoff prior.
“The embedded prior”: The “entire” world is a sort of Turing machine, which we happen to be one component of in some sense. Our prior for this Turing machine should be that it’s simple (again, the Solomonoff prior), but we have to condition on the observation that it’s complicated enough to contain observers (“Descartes’ update”). (This is essentially Naturalized induction.)
I think of the difference between these as “solipsism”—AIXI gives its own existence a distinguished role in reality.
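One way to write the two priors side by side, as a rough sketch rather than a definitive formalization. The notation is my own assumption, not from any source: $\ell(\cdot)$ is program length in bits on a prefix-free universal machine, $q$ ranges over environment programs that map actions to percepts, $w$ ranges over whole-world programs, and $\mathrm{Obs}(w)$ is a hypothetical predicate meaning “$w$ contains an observer”.

```latex
% AIXI-style prior: weight each environment program q by its length.
P_{\mathrm{AIXI}}(q) \;\propto\; 2^{-\ell(q)}

% Embedded prior: the same length-based weight over whole-world programs w,
% conditioned on the world containing an observer ("Descartes' update").
P_{\mathrm{emb}}(w) \;=\;
  \frac{2^{-\ell(w)} \, \mathbf{1}[\mathrm{Obs}(w)]}
       {\sum_{w'} 2^{-\ell(w')} \, \mathbf{1}[\mathrm{Obs}(w')]}
```

On this reading, the first prior puts most of its mass on very short environment programs, while the second only compares worlds that already pay the description-length cost of containing an observer.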
Importantly, the laws of physics seem fairly complicated in an absolute sense—clearly they require tens[1] or hundreds of bits to specify. This is evidence against solipsism, because on the solipsistic prior, we expect to interact with a largely empty universe. But they don’t seem much more complicated than necessary for a universe that contains at least one observer, since the minimal source code for an observer is probably also fairly long.
More evidence against solipsism:
The laws of physics don’t seem to privilege my frame of reference. This is a pretty astounding coincidence on the solipsistic viewpoint—it means we randomly picked a universe which simulates some observer-independent laws of physics, then picks out a specific point inside it, depending on some fairly complex parameters, to show me.
When I look out into the universe external to my mind, one of the things I find there is my brain, which really seems to contain a copy of my mind. This is another pretty startling coincidence on the solipsistic prior, that the external universe being run happens to contain this kind of representation of the Cartesian observer.
[1] This is obviously a very small number, but I’m trying to be maximally conservative here.
Why wouldn’t they be the same? Are you saying AIXI doesn’t ask ‘where did I come from?’
Yes, that’s right. It’s the same basic issue that leads to the Anvil Problem.
A thought about productivity systems/workflow optimization:
One principle of good design is “make the thing you want people to do, the easy thing to do”. However, this idea is susceptible to the following form of Goodhart: often a lot of the value in some desirable action comes from the things that make it difficult.
For instance, sometimes I decide to migrate some notes from one note-taking system to another. This is usually extremely useful, because it forces me to review the notes and think about how they relate to each other and to the new system. If I make this easier for myself by writing a script to do the work (as I have sometimes done), this important value is lost.
Or think about spaced repetition cards: You can save a ton of time by reusing cards made by other people covering the same material—but the mental work of breaking the material down into chunks that can go into the spaced-repetition system, which is usually very important, is lost.