Can iterated amplification recreate a human’s ability for creative insight? By that I mean the phenomenon where after thinking about a problem for an extended period of time, from hours to years, a novel solution suddenly pops into your head seemingly out of nowhere. I guess under the hood what’s probably happening is that you’re building up and testing various conceptual frameworks for thinking about the problem, and using those frameworks and other heuristics to do a guided search of the solution space. The problem for iterated amplification is that we typically don’t have introspective access to the conceptual framework building algorithms or the search heuristics that our brains learned or came up with over our lifetimes, so it’s unclear how to break down these tasks when faced with a problem that requires creative insight to solve.
If iterated amplification needs to exhibit creative insight in order to succeed (not sure if you can sidestep the problem or find a workaround for it), I suggest that it be included in the set of tasks that Ought will evaluate for their factored cognition project.
EDIT: Maybe this is essentially the same as the translation example, and I’m just not understanding how you’re proposing to handle that class of problems?
EDIT: Maybe this is essentially the same as the translation example, and I’m just not understanding how you’re proposing to handle that class of problems
Yes, I think these are the same case. The discussion in this thread applies to both. The relevant quote from the OP:
I think our long-term goal should be to find, for each powerful AI technique, an analog of that technique that is aligned and works nearly as well. My current work is trying to find analogs of model-free RL or AlphaZero-style model-based RL.
I think “copy human expertise by imitation learning,” or even “delegate to a human,” raise different kinds of problems than RL. I don’t think those problems all have clean answers.
Going back to the translation example, I can understand your motivation to restrict attention to some subset of all AI techniques. But I think it’s reasonable for people to expect that if you’re aiming to be competitive with a certain kind of AI, you’ll also aim to avoid being uncompetitive with minor variations of your own design (in this case, forms of iterated amplification that don’t break down tasks into such small pieces). Otherwise, aren’t you “cheating” by letting aligned AIs use AI techniques that their competitors aren’t allowed to use?
To put it another way, people clearly get the impression from you that there’s hope that IDA can simultaneously be aligned and achieve state of the art performance at runtime. See this post where Ajeya Cotra says exactly this:
The hope is that if we use IDA to train each learned component of an AI then the overall AI will remain aligned with the user’s interests while achieving state of the art performance at runtime — provided that any non-learned components such as search or logic are also built to preserve alignment and maintain runtime performance.
But the actual situation seems to be that at best IDA can either be aligned (if you break down tasks enough) or achieve state of the art performance (if you don’t), but not both at the same time.
In general, if you have some useful but potentially malign data source (humans, in the translation example) then that’s a possible problem—whether you learn from the data source or merely consult it.
You have to solve each instance of that problem in a way that depends on the details of the data source. In the translation example, you need to actually reason about human psychology. In the case of SETI, we need to coordinate to not use malign alien messages (or else opt to let the aliens take over).
Otherwise, aren’t you “cheating” by letting aligned AIs use AI techniques that their competitors aren’t allowed to use?
I’m just trying to compete with a particular set of AI techniques. Then every time you would have used those (potentially dangerous) techniques, you can instead use the safe alternative we’ve developed.
If there are other ways to make your AI more powerful, you have to deal with those on your own. That may be learning from human abilities that are entangled with malign behavior in complex ways, or using an AI design that you found in an alien message, or using an unsafe physical process in order to generate large amounts of power, or whatever.
I grant that my definition of the alignment problem would count “learn from malign data source” as an alignment problem, since you ultimately end up with a malign AI, but that problem occurs with or without AI and I don’t think it is deceptive to factor that problem out (but I agree that I should be more careful about the statement / switch to a more refined statement).
I also don’t think it’s a particularly important problem. And it’s not what people usually have in mind as a failure mode—I’ve discussed this problem with a few people, to try to explain some subtleties of the alignment problem, and most people hadn’t thought about it and were pretty skeptical. So in those respects I think it’s basically fine.
When Ajeya says:
provided that any non-learned components such as search or logic are also built to preserve alignment and maintain runtime performance.
This is meant to include things like “You don’t have a malign data source that you are learning from.” I agree that it’s slightly misleading if we think that humans are such a data source.
I think “copy human expertise by imitation learning,” or even “delegate to a human,” raise different kinds of problems than RL. I don’t think those problems all have clean answers.
I think I can restate the problem as being about competing with RL: presumably RL will eventually be as capable as a human (on its own, without copying from or delegating to a human), including on problems that humans need “creative insight” to solve. For an Amplification-based AI to compete with such an RL-based AI, it seems that H needs to be able to introspectively access their conceptual-framework-building algorithms and search heuristics in order to use them to help break down tasks; but H doesn’t have such introspective access, so how does the Amplification-based AI compete?
If an RL agent can learn to behave creatively, then that implies that amplification from a small core can learn to behave creatively.
This is pretty clear if you don’t care about alignment—you can just perform the exponential search within the amplification step, and then amplification is structurally identical to RL. The difficult problem is how to do that without introducing malign optimization. But that’s not really about H’s abilities.
This is pretty clear if you don’t care about alignment—you can just perform the exponential search within the amplification step, and then amplification is structurally identical to RL.
I don’t follow. I think if you perform the exponential search within the amplification step, amplification would be exponentially slow whereas RL presumably wouldn’t be? How would they be structurally identical? (If someone else understands this, please feel free to jump in and explain.)
The difficult problem is how to do that without introducing malign optimization.
Do you consider this problem to be inside your problem scope? I’m guessing yes but I’m not sure and I’m generally still very confused about this. I think it would help a lot if you could give a precise definition of what the scope is.
As another example of my confusion, an RL agent will presumably learn to do symbolic reasoning and perform arbitrary computations either inside its neural network or via an attached general purpose computer, so it could self-modify into or emulate an arbitrary AI. So under one natural definition of “compete”, to compete with RL is to compete with every type of AI. You must not be using this definition but I’m not sure what definition you are using. The trouble I’m having is that there seems to be no clear dividing line between “internal cognition the RL agent has learned to do” and “AI technique the RL agent is emulating” but presumably you want to include the former and exclude the latter from your problem definition?
Another example is that you said that you exclude “all failures of competence” and I still only have a vague sense of what that means.
How would they be structurally identical? (If someone else understands this, please feel free to jump in and explain.)
AlphaZero is exactly the same as this: you want to explore an exponentially large search tree. You can’t do that. Instead you explore a small part of the search tree. Then you train a model to quickly (lossily) imitate that search. Then you repeat the process, using the learned model in the leaves to effectively search a deeper tree. (Also see Will’s comment.)
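To make the structural parallel concrete, here is a minimal, purely illustrative sketch of that loop (my own caricature, not any particular library’s API; `Model`, `children`, and `result` are hypothetical stand-ins): a shallow search evaluated with the current model produces training targets, the model is trained to imitate them, and the next round’s shallow search is effectively deeper because the improved model sits at the leaves.

```python
# Illustrative sketch only: an AlphaZero-style loop in which a cheap learned
# model is trained to imitate a shallow search, and the next round's shallow
# search uses the improved model at its leaves, effectively searching deeper.
# `Model`, `children`, and `result` are hypothetical stand-ins.

class Model:
    """Toy value model: maps a state to an estimated value."""
    def __init__(self):
        self.table = {}

    def value(self, state):
        return self.table.get(state, 0.0)

    def train(self, examples):
        # Nudge stored values toward the (stronger) search results.
        for state, target in examples:
            old = self.table.get(state, 0.0)
            self.table[state] = old + 0.5 * (target - old)


def shallow_search(state, model, children, result, depth):
    """Explore only a small part of the tree, scoring the leaves with the model."""
    terminal = result(state)   # e.g. +1 / -1 / 0 if the game is over, else None
    if terminal is not None:
        return terminal
    if depth == 0 or not children(state):
        return model.value(state)
    # Negamax over the successors: the opponent's best outcome is our worst.
    return max(-shallow_search(child, model, children, result, depth - 1)
               for child in children(state))


def amplify_and_distill(model, states, children, result, rounds=10, depth=3):
    """Repeat: search guided by the current model (the 'amplification'), then
    train the model to imitate the search's answers (the 'distillation')."""
    for _ in range(rounds):
        examples = [(s, shallow_search(s, model, children, result, depth))
                    for s in states]
        model.train(examples)
    return model
```

The structural point is just that the exponential tree is never expanded in one step: each round’s model compresses the previous round’s search, and the next round’s search builds on it.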
Do you consider this problem to be inside your problem scope? I’m guessing yes but I’m not sure and I’m generally still very confused about this.
For now let’s restrict attention to the particular RL algorithms mentioned in the post, to make definitions clearer.
By default these techniques yield an unaligned AI.
I want a version of those techniques that produces aligned AI, which is trying to help us get what we want.
That aligned AI may still need to do dangerous things, e.g. “build a new AI” or “form an organization with a precise and immutable mission statement” or whatever. Alignment doesn’t imply “never has to deal with a difficult situation again,” and I’m not (now) trying to solve alignment for all possible future AI techniques.
We would have encountered those problems even if we replaced the aligned AI with a human. If the AI is aligned, it will at least be trying to solve those problems. But even as such, it may fail. And separately from whether we solve the alignment problem, we may build an incompetent AI (e.g. it may be worse at solving the next round of the alignment problem).
The goal is to get out an AI that is trying to do the right thing. A good litmus test is whether the same problem would occur with a secure human. (Or with a human who happened to be very smart, or with a large group of humans...). If so, then that’s out of scope for me.
To address the example you gave: doing some optimization without introducing misalignment is necessary to perform as well as the RL techniques we are discussing. Avoiding that optimization is in scope.
There may be other optimization or heuristics that an RL agent (or an aligned human) would eventually use in order to perform well, e.g. using a certain kind of external aid. That’s out of scope, because we aren’t trying to compete with all of the things that an RL agent will eventually do (as you say, a powerful RL agent will eventually learn to do everything...); we are trying to compete with the RL algorithm itself.
We need an aligned version of the optimization done by the RL algorithm, not all optimization that the RL agent will eventually decide to do.
I think the way to do exponential search in amplification without being exponentially slow is not to try to do the search in one amplification step, but to start with smaller problems, learn how to solve those efficiently, and then use that knowledge to speed up the search in later iteration-amplification rounds. (A rough code sketch follows the steps below.)
Suppose we have some problem with branching factor 2 (i.e. searching for binary strings that fit some criteria).
Start with agent A0.
Amplify agent A0 to solve problems which require searching a tree of depth d0 at cost 2^d0.
Distill agent A1, which uses the output of the amplification process to learn how to solve problems of depth d0 faster than the amplified A0, ideally as fast as any other ML approach. One way would be to learn heuristics for which parts of the tree don’t contain useful information and can be pruned.
Amplify agent A1, which can use the heuristics it has learned to prune the tree much earlier and solve problems of depth d1 > d0 at cost < 2^d1.
Distill agent A2, which can now efficiently solve problems of depth d1.
If this process is efficient enough, the training cost can be less than 2^d1 to get an agent that solves problems of depth d1 (and the runtime cost is as good as the runtime cost of the ML algorithm that implements the distilled agent).
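Here is the promised rough sketch of those steps for the binary-string example (my own illustration, not anything from the post): the distilled agent is caricatured as a learned per-position bit preference, and the amplified agent is a depth-first search that consults it. `amplified_solve`, `distill`, and the toy criterion are all hypothetical names.

```python
# Rough sketch of the amplify/distill rounds above for the binary-string search.
# The distilled agent is caricatured as a per-position bit-preference heuristic
# learned from solutions found in earlier (shallower) rounds.

from collections import Counter

def amplified_solve(criterion, depth, preferred_bit):
    """Amplification step: DFS over bit strings of length `depth`, trying the
    bit the distilled heuristic prefers at each position first. This only helps
    the average case; the worst case is still 2^depth."""
    def recurse(prefix):
        if len(prefix) == depth:
            return prefix if criterion(prefix) else None
        first = preferred_bit(prefix)
        for bit in (first, "1" if first == "0" else "0"):
            found = recurse(prefix + bit)
            if found is not None:
                return found
        return None
    return recurse("")

def distill(solutions):
    """Distillation step (stand-in for training the next agent): record which
    bit was more common at each position among solutions found so far."""
    counts = [Counter() for _ in range(max(len(s) for s in solutions))]
    for s in solutions:
        for i, bit in enumerate(s):
            counts[i][bit] += 1
    def preferred_bit(prefix):
        i = len(prefix)
        return counts[i].most_common(1)[0][0] if i < len(counts) else "0"
    return preferred_bit

# One round: solutions found at depth d0 = 3 are distilled into a heuristic
# that then guides the depth d1 = 5 search.
criterion = lambda s: s.count("1") % 2 == 1            # toy F(s, t): odd parity
depth0_solutions = [s for s in ("001", "010", "100", "111") if criterion(s)]
heuristic = distill(depth0_solutions)
print(amplified_solve(criterion, 5, heuristic))
```

This sketch only captures heuristics that transfer directly from depth d0 to depth d1; whether enough useful heuristics have that form is exactly the question raised in the replies below.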
Thanks for the explanation, but I’m not seeing how this would work in general. Let’s use Paul’s notation where A_{i+1} = Distill(B_i) and B_i = Amplify(A_i). And say we’re searching for binary strings s such that F(s, t) = 1 for fixed F and variable t. So we start with B0 (a human) and distill+amplify it into B1, which searches strings up to length d0 (which requires searching a tree of depth d0 at cost 2^d0). Then we distill that into A2, which learns how to solve problems of depth d0 faster than B1, and suppose it does that by learning the heuristic that the first bit of s is almost always the parity of t.
Now suppose I’m an instance of A2 running at the top level of B2. I have access to other instances of A2 which can solve this problem up to length d0, but I need to solve a problem of length d1 (which let’s say is d0+1). So I ask another instance of A2 “Find a string s of length d1 such that s starts with 0 and F(s, t)=1”, followed by a query to another A2 “Find a string s of length d1 such that s starts with 1 and F(s, t)=1”. Well, the heuristic that A2 learned doesn’t help to speed up those queries, so each of them is still going to take time 2^d0.
The problem here, as I see it, is that it’s not clear how I, as A2, can make use of the previously learned heuristics to help solve larger problems more efficiently, since I have no introspective access to them. If there’s a way to do that and I’m missing it, please let me know.
(I posted this from greaterwrong.com and it seems the LaTeX isn’t working. Someone please PM me if you know how to fix this.)
[Habryka edit: Fixed your LaTeX for you. GreaterWrong doesn’t currently support LaTeX, I think. We would have to either improve our API, or greaterwrong would need to do some more fancy client-side processing to make it work.]
In the LesserWrong comment editor, select the text you want to be LaTeX, then press Ctrl+4 (or Cmd+4 on Mac). You can delete the dollar signs.
(Commenting rather than PM’ing so that others will benefit as well.)
For this example, I think you can do this if you implement the additional query “How likely is the search on [partial solution] to return a complete solution?”. This is asked of all potential branches before recursing into them. A2 learns to answer the solution probability query efficiently.
Then in amplification of A2 in the top level of B2, looking for a solution to a problem of length d1, the root agent first asks “How likely is the search on [string starting with 0] to return a complete solution?” and “How likely is the search on [string starting with 1] to return a complete solution?”. Then the root agent first queries whichever subtree is most likely to contain a solution. (This doesn’t improve worst-case running time, but does improve average-case running time.)
This is analogous to running a value estimation network in tree search, and then picking the most promising node to query first.
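A minimal sketch of that recursion (my own illustration; `solution_prob` is a hypothetical stand-in for the subquery the distilled A2 answers):

```python
# Sketch of best-first recursion driven by the "how likely is the search on
# [partial solution] to return a complete solution?" subquery.

def guided_search(prefix, depth, criterion, solution_prob):
    """Ask the probability subquery for each child prefix and recurse into the
    more promising one first. Worst-case cost is unchanged (2^depth), but the
    average case improves whenever the estimates carry information."""
    if len(prefix) == depth:
        return prefix if criterion(prefix) else None
    children = sorted((prefix + "0", prefix + "1"),
                      key=solution_prob, reverse=True)   # most promising first
    for child in children:
        found = guided_search(child, depth, criterion, solution_prob)
        if found is not None:
            return found
    return None
```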
This seems to require that the heuristic be of a certain form and you know what that form is. What if it’s more general, like run algorithm G on t to produce a list of guesses for s, then check the guesses in that order?
1. I don’t think that every heuristic A2 could use to solve problems of depth d0 needs to be applicable to performing the search of depth d1; we only need enough usable heuristics to keep increasing the search depth efficiently at each amplification round. It’s possible that some of the value of heuristics like “solution is likely to be an output of algorithm G” could be (imperfectly) captured through some small universal set of heuristics that we can specify how to learn and exploit. (I think that variations on “How likely is the search on [partial solution] to produce an answer?” might get us pretty far).
The AlphaGo analogy is that the original supervised move prediction algorithm didn’t necessarily learn every heuristic that the experts used, but just learned enough to be able to efficiently guide the MCTS to better performance.
(Though I do think that imperfectly learning heuristics might cause alignment problems without a solution to the aligned search problem).
2. This isn’t a problem if, once the agent A2 can run algorithm G on t for problems of depth d0, it can directly generalize to applying G to problems of depth d1. Simple deep RL methods aren’t good at this kind of task, but things like the Neural Turing Machine are trying to do better on it. So the ability to learn efficient exponential search could be limited by the underlying agent’s capability; for some capability range, a problem could be directly solved by an unaligned agent but couldn’t be solved by an aligned agent. This isn’t a problem if we can surpass that level of capability.
I’m not sure that these considerations fix the problem entirely, or whether Paul would take a different approach.
It also might be worth coming up with a concrete example where some heuristics are not straightforward to generalize from smaller to larger problems, and where it seems like this will prevent efficiently learning to solve large problems. The problem, however, would need to be something that humans can solve (i.e. finding a string that hashes to a particular value under a cryptographic hash function would be hard to generalize any heuristics from, but I don’t think humans could do it either, so it’s out of scope).
If an RL agent can’t solve a task, then I’m fine with amplification being unable to solve it.
I guess by “RL agent” you mean RL agents of certain specific designs, such as the one you just blogged about, and not RL agents in general, since as far as we know there aren’t any tasks that RL agents in general can’t solve?
BTW, I find it hard to understand your overall optimism (only 10-20% expected value loss from AI risk), since there are so many disjunctive risks to just being able to design an aligned AI that’s competitive with certain kinds of RL agents (such as not solving one of the obstacles you list in the OP), and even if we succeed in doing that we’d have to come up with more capable aligned designs that would be competitive with more advanced RL (or other kinds of) agents. Have you explained this optimism somewhere?
Why would the runtime cost be on par with the distillation cost?
Sorry, that was a bit confusing, edited to clarify. What I mean is, you have some algorithm you’re using to implement new agents, and that algorithm has a training cost (that you pay during distillation) and a runtime cost (that you pay when you apply the agent). The runtime cost of the distilled agent can be as good as the runtime cost of an unaligned agent implemented by the same algorithm (part of Paul’s claim about being competitive with unaligned agents).
If I understand you correctly, it sounds like the problem that “creative insight” is solving is “searching through a large space of possible solutions and finding a good one”. It seems like Amplification could, given enough time, systematically search through all possible solutions (i.e. generate all bit sequences, turn them into strings, evaluate whether they are a solution). But the problem with that is that it will likely yield a misaligned solution (assuming the evaluation of solutions is imperfect). Humans, when performing “creative insight”, have their search process 1) guided by a bunch of these hard-to-access heuristics/conceptual frameworks, which steer the search towards useful and benign parts of the search space, and 2) limited in how large a solution space they can search. These combine so that the human creative search typically either yields aligned solutions or terminates without finding a solution after a bounded amount of computation. Does this fit with what you are thinking of as “creative insight”?
My understanding of what I’ve read about Paul’s approach suggests the solution to both the translation problem and creativity would be to extract any search heuristics/conceptual framework algorithms that humans do have access to, and then still limit the search, sacrificing solution quality but maintaining corrigibility. Is your concern then that this amplification-based search would not perform well enough in practice to be useful (i.e. yielding a good solution and terminating before coming across a malign solution)?
It seems like Amplification could, given enough time, systematically search through all possible solutions (i.e. generate all bit sequences, turn them into strings, evaluate whether they are a solution). But the problem with that is that it will likely yield a misaligned solution (assuming the evaluation of solutions is imperfect).
Well, I was thinking that before this alignment problem could even come up, a brute-force search would be exponentially expensive, so Amplification wouldn’t work at all in practice on a question that requires “creative insight”.
My understanding of what I’ve read about Paul’s approach suggests the solution to both the translation problem and creativity would be to extract any search heuristics/conceptual framework algorithms that humans do have access to, and then still limit the search, sacrificing solution quality but maintaining corrigibility.
My concern is that this won’t be competitive with other AGI approaches that don’t try to maintain alignment/corrigibility, for example using reinforcement learning to “raise” an AGI through a series of increasingly complex virtual environments, and letting the AGI incrementally build its own search heuristics and conceptual framework algorithms.
BTW, thanks for trying to understand Paul’s ideas and engaging in these discussions. It would be nice to get a critical mass of people to understand these ideas well enough to sustain discussions and make progress without Paul having to be present all the time.