Thanks for the explanation, but I’m not seeing how this would work in general. Let’s use Paul’s notation where A_{i+1} = Distill(B_i) and B_i = Amplify(A_i). And say we’re searching for binary strings s such that F(s, t) = 1 for fixed F and variable t. So we start with B_0 (a human) and distill+amplify it into B_1, which searches strings up to length d_0 (which requires searching a tree of depth d_0 at cost 2^{d_0}). Then we distill that into A_2, which learns how to solve problems of depth d_0 faster than B_1, and suppose it does that by learning the heuristic that the first bit of s is almost always the parity of t.
Now suppose I’m an instance of A_2 running at the top level of B_2. I have access to other instances of A_2 which can solve this problem up to length d_0, but I need to solve a problem of length d_1 (which let’s say is d_0 + 1). So I ask one instance of A_2 “Find a string s of length d_1 such that s starts with 0 and F(s, t) = 1”, followed by a query to another instance of A_2 “Find a string s of length d_1 such that s starts with 1 and F(s, t) = 1”. Well, the heuristic that A_2 learned doesn’t help to speed up those queries, so each of them is still going to take time 2^{d_0}.
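To make the cost concrete, here is a minimal sketch of what the delegation in this example amounts to (hypothetical Python, with F treated as a black-box predicate and the function names made up for illustration): the root A_2 hands off the two prefix-constrained subqueries, and since the learned heuristic doesn’t apply to them, each sub-instance falls back to exhaustive search over the remaining 2^{d_0} completions.

```python
from itertools import product

def brute_force_search(prefix, total_len, t, F):
    """What a sub-instance of A_2 effectively does when its learned
    heuristic doesn't apply: try every completion of `prefix`."""
    for bits in product("01", repeat=total_len - len(prefix)):
        s = prefix + "".join(bits)
        if F(s, t):
            return s
    return None

def root_agent_search(total_len, t, F):
    """The root A_2 inside B_2: delegate the two prefix-constrained
    subqueries. Each delegated call checks up to 2^(total_len - 1)
    = 2^{d_0} completions, so there is no overall speedup."""
    for prefix in ("0", "1"):
        s = brute_force_search(prefix, total_len, t, F)
        if s is not None:
            return s
    return None
```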
The problem here, as I see it, is that it’s not clear how I, as A_2, can make use of the previously learned heuristics to solve larger problems more efficiently, since I have no introspective access to them. If there’s a way to do that and I’m missing it, please let me know.
(I posted this from greaterwrong.com and it seems the LaTeX isn’t working. Someone please PM me if you know how to fix this.)
[Habryka edit: Fixed your LaTeX for you. GreaterWrong doesn’t currently support LaTeX I think. We would have to either improve our API, or greaterwrong would need to do some more fancy client-side processing to make it work]
For this example, I think you can do this if you implement the additional query “How likely is the search on [partial solution] to return a complete solution?”. This is asked of all potential branches before recursing into them. A_2 learns to answer the solution-probability query efficiently.
Then, in the amplification of A_2 at the top level of B_2 looking for a solution to a problem of length d_1, the root agent first asks “How likely is the search on [string starting with 0] to return a complete solution?” and “How likely is the search on [string starting with 1] to return a complete solution?”. It then recurses first into whichever subtree is most likely to contain a solution. (This doesn’t improve worst-case running time, but does improve average-case running time.)
This is analogous to running a value estimation network in tree search, and then picking the most promising node to query first.
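For concreteness, here is a minimal sketch of that ordering scheme (hypothetical Python; estimate_solution_prob stands in for the distilled A_2 answering the solution-probability query): the searcher asks the query for both branches before recursing and explores the more promising branch first, which improves average-case but not worst-case running time.

```python
def guided_search(prefix, total_len, t, F, estimate_solution_prob):
    """Depth-first search over bit strings that asks the solution-probability
    query for each branch before recursing, then explores the more
    promising branch first."""
    if len(prefix) == total_len:
        return prefix if F(prefix, t) else None
    # "How likely is the search on [partial solution] to return a complete
    # solution?" -- asked for both branches before recursing into either.
    branches = sorted(
        (prefix + b for b in "01"),
        key=lambda p: estimate_solution_prob(p, t),
        reverse=True,
    )
    for branch in branches:
        result = guided_search(branch, total_len, t, F, estimate_solution_prob)
        if result is not None:
            return result
    return None
```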
This seems to require that the heuristic be of a certain form, and that you know what that form is. What if it’s more general, like “run algorithm G on t to produce a list of guesses for s, then check the guesses in that order”?
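As a sketch of the kind of heuristic I mean (hypothetical Python; G is whatever guess-generating algorithm was learned), the heuristic is a procedure rather than a branch-ordering rule:

```python
def guess_then_check(total_len, t, F, G):
    """A heuristic of the more general form: run G on t to produce an
    ordered list of guesses for s, and check them in that order.
    Returns None if no guess works (the caller could then fall back
    to brute-force search)."""
    for s in G(t, total_len):
        if len(s) == total_len and F(s, t):
            return s
    return None
```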
1. I don’t think that every heuristic A_2 could use to solve problems of depth d_0 needs to be applicable to performing the search of depth d_1 - we only need enough of the heuristics to be usable to keep increasing the search depth at each amplification round in an efficient manner. It’s possible that some of the value of heuristics like “the solution is likely to be an output of algorithm G” could be (imperfectly) captured through some small universal set of heuristics that we can specify how to learn and exploit. (I think that variations on “How likely is the search on [partial solution] to produce an answer?” might get us pretty far.)
The AlphaGo analogy is that the original supervised move prediction algorithm didn’t necessarily learn every heuristic that the experts used, but just learned enough to be able to efficiently guide the MCTS to better performance.
(Though I do think that imperfectly learning heuristics might cause alignment problems without a solution to the aligned search problem).
2. This isn’t a problem if, once the agent A_2 can run algorithm G on t for problems of depth d_0, it can directly generalize to applying G to problems of depth d_1. Simple deep RL methods aren’t good at this kind of task, but things like the Neural Turing Machine are trying to do better on it. So the ability to learn efficient exponential search could be limited by the underlying agent’s capability; for some capability range, a problem could be directly solved by an unaligned agent but couldn’t be solved by an aligned agent. This isn’t a problem if we can surpass that level of capability.
I’m not sure that these considerations fix the problem entirely, or whether Paul would take a different approach.
It also might be worth coming up with a concrete example where some heuristics are not straightforward to generalize from smaller to larger problems, and where it seems like this would prevent efficiently learning to solve large problems. The problem, however, would need to be something that humans can solve (i.e. finding a string that hashes to a particular value under a cryptographic hash function would be hard to generalize any heuristics from, but I don’t think humans could do it either, so it’s out of scope).
If an RL agent can’t solve a task, then I’m fine with amplification being unable to solve it.
I guess by “RL agent” you mean RL agents of certain specific designs, such as the one you just blogged about, and not RL agents in general, since as far as we know there aren’t any tasks that RL agents in general can’t solve?
BTW, I find it hard to understand your overall optimism (only 10-20% expected value loss from AI risk), since there are so many disjunctive risks to just being able to design an aligned AI that’s competitive with certain kinds of RL agents (such as not solving one of the obstacles you list in the OP), and even if we succeed in doing that we’d have to come up with more capable aligned designs that would be competitive with more advanced RL (or other kinds of) agents. Have you explained this optimism somewhere?
In the LesserWrong comment editor, select the text you want to be LaTeX, then press Ctrl+4 (or Cmd+4 on Mac). You can delete the dollar signs.
(Commenting rather than PM’ing so that others will benefit as well.)