(For the record—I do think there are other reasons to think that the evolution example is not informative about the probability of AGI risk, namely the obvious point that different specific optimization algorithms may have different properties, see my brief discussion here.)
In general, I’m very strongly opposed to the activity that I call “analogy-target policing”, where somebody points out some differences between X and Y and says “therefore it’s dubious to analogize X and Y”, independent of how the analogy is being used. Context matters. There are always differences / disanalogies! That’s the whole point of an analogy—X≠Y! Nobody analogizes something to itself! So there have to be differences!!
And sometimes a difference between X and Y is critically important, as it undermines the point that someone is trying to make by bringing up the analogy between X and Y. And also, sometimes a difference between X and Y is totally irrelevant to the point that someone was making with their analogy, so the analogy is perfectly great. See my discussion here for various examples, including Shakespeare’s comparing a woman to a summer’s day. :)
…Granted, you don’t say that you have proven all analogies between evolution and AGI x-risk to be invalid. Rather, you say “in the category of things that are referred to as optimisation, evolution has numerous properties that it does not share with ML optimisation, so be careful about invoking it as an analogy”. But that’s not news at all! If you just want to list any property that Evolution does not share with ML optimisation, here are a bunch: (1) The first instantiation of Evolution on Earth was billions of years earlier than the first instantiation of ML optimisation. (2) Evolution was centrally involved in the fact that I have toenails, whereas ML optimisation was not. (3) ML optimisation can be GPU-accelerated using PyTorch, whereas Evolution cannot … I could go on all day! Of course, none of these differences are relevant to anything. That’s my point. I don’t think the three differences you list are relevant to anything either.
Evolution itself isn’t a separate system that is optimising for something. (Micro)evolution is the change in allele frequency over generations. There is no separate entity you can point to and call “evolution”.
Evolution is an optimization process. For some optimization processes, you can point to some “system” that is orchestrating that process, and for other optimization processes, you can’t. I don’t see why this matters for anything, right? Did Eliezer or Nate or whoever make some point about evolution which is undermined by the observation that evolution is not a separate system?
Evolution does not have an explicitly represented objective function
OK, but if it did, it would change nothing, right? I don’t even know why the RFLO paper put that criterion in. Like, let’s compare (1) an algorithm training an LLM with the “explicit” objective function of minimizing perplexity, written in Python, (2) some super-accelerated clever hardware implementation that manages to perform the exact same weight updates but in a way that doesn’t involve the objective function ever being “explicitly represented” or calculated. The difference between these is irrelevant, right? Why would anyone care? The same process will unfold, with the same results, for the same reason.
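To make the comparison concrete, here is a toy sketch of my own (nothing to do with the RFLO paper’s actual examples): two loops that perform essentially the same weight updates on a linear model, one that explicitly represents and evaluates its objective, and one that only hard-codes the update step and never computes the objective at all.

```python
import numpy as np

# Toy data for a linear model y ≈ X @ w (illustrative values only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=100)

lr = 0.01

# (1) "Explicit objective" version: the loss is represented and evaluated.
def loss(w):
    return np.mean((X @ w - y) ** 2)  # the objective, explicitly computed

def grad_from_loss(w, eps=1e-6):
    # numerical gradient obtained by actually evaluating the explicit loss
    g = np.zeros_like(w)
    for i in range(len(w)):
        dw = np.zeros_like(w)
        dw[i] = eps
        g[i] = (loss(w + dw) - loss(w - dw)) / (2 * eps)
    return g

# (2) "Update-rule only" version: the same weight updates, but the objective
#     never appears anywhere -- only its hard-coded gradient.
def grad_direct(w):
    return 2.0 * X.T @ (X @ w - y) / len(y)

w1 = np.zeros(3)
w2 = np.zeros(3)
for _ in range(500):
    w1 -= lr * grad_from_loss(w1)
    w2 -= lr * grad_direct(w2)

print(np.allclose(w1, w2, atol=1e-3))  # essentially identical trajectories
```

Both runs land on essentially the same weights, by the same mechanism, which is why I don’t see what work “explicitly represented” is doing here.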
Again, I don’t think anyone has made any argument about evolution versus AGI risk that relies on the objective function being explicitly rather than implicitly represented.
Evolution isn’t conservative
Yet again, I don’t see why this matters for anything. When people make these kinds of arguments, they might bring up particular aspects of human or animal behavior—things like “humans don’t generally care about their inclusive genetic fitness” and “humans do care about love and friendship and beauty”. Both those properties are unrelated to situations where evolution produces cycles. E.g. love- and friendship- and beauty-related innate drives were stable local optima in early human evolution.
Separately, I wonder whether you would say that AlphaZero self-play training is an “optimizer” or not. Some of your points seem to apply to it—in particular, since it’s self-play, the opponent is different each step, and thus so is the “optimal” behavior. You could say “the objective is always checkmate”, but the behavior leading to that objective may keep changing; by the same token you could say “the objective is always inclusive genetic fitness”, but the behavior leading to that objective may keep changing. (Granted, AlphaZero self-play training in fact converges to very impressive play rather than looping around in circles, but I think that’s an interesting observation rather than an a priori theoretical requirement of the setup—given that the model obviously doesn’t have the information capacity to learn actual perfect play.)
This was a great reply. Responding to it substantially lowered my confidence in my own arguments.
I’m going to make what I think is a very cruxy high level clarification and then address individual points.
High Level Clarification
My original post has clearly done a poor job at explaining why I think the mismatch between the optimisation definition given in RFLO and evolution matters. I think clarifying my position will address the bulk of your concerns.
“I don’t see why this matters for anything, right? Did Eliezer or Nate or whoever make some point about evolution which is undermined by the observation that evolution is not a separate system? [...] Yet again, I don’t see why this matters for anything.”
I believe you have interpreted the high level motivation behind my post to be something along the lines of “evolution doesn’t fit this definition of optimisation, and therefore this should be a reason to doubt the conclusions of Nate, Eliezer or anyone else invoking evolution.”
This is a completely fair reading of my original post, but it wasn’t my intended message.
I’m concerned that AI Safety research lacks a sufficiently robust framework for reasoning about the development, deployment and behaviour of AGIs. I am very interested in the broad category of “deconfusion”. It is under that lens that I comment on evolution not fitting the definition in RFLO. It indicates that the optimiser framework in RFLO may not be cutting reality at the joints, and that a more careful treatment is needed.
I’m going to immediately edit my original post to make this more clear thanks to your feedback!
Detailed Responses to Individual Points
“And also, sometimes a difference between X and Y is totally irrelevant to the point that someone was making with their analogy”
I agree. I mentioned Nate’s evolution analogy because I think it wasn’t needed to make the point and led to confusion. I don’t think the properties of evolution I’ve mentioned can be used to argue against the Sharp Left Turn.
“If you just want to list any property that Evolution does not share with ML optimisation, here are a bunch: (1) The first instantiation of Evolution on Earth was billions of years earlier than the first instantiation of ML optimisation. (2) Evolution was centrally involved in the fact that I have toenails, whereas ML optimisation was not. (3) ML optimisation can be GPU-accelerated using PyTorch, whereas Evolution cannot … I could go on all day! Of course, none of these differences are relevant to anything. That’s my point. I don’t think the three differences you list are relevant to anything either.”
Keeping in mind the “deconfusion” lens that motivated my original post, I don’t think these distinctions point to any flaws in the definition of optimisation given in RFLO, in the same way that evolution failing to satisfy the criterion of having an internally represented objective does.
“I don’t even know why the RFLO paper put that criterion in. Like, let’s compare (1) an algorithm training an LLM with the “explicit” objective function of minimizing perplexity, written in Python, (2) some super-accelerated clever hardware implementation that manages to perform the exact same weight updates but in a way that doesn’t involve the objective function ever being “explicitly represented” or calculated.”
I don’t have any great insight here, but that’s very interesting to think about. I would guess that a “clever hardware implementation that performs the exact same weight updates” without an explicitly represented objective function ends up being wildly inefficient. This seems broadly similar to the relationship between a search algorithm and an implementation of the same algorithm that is simply a gigantic pre-computed lookup table.
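To spell out that analogy (a toy illustration of my own): the two implementations below have identical input-to-output behaviour, but only the first contains the search criterion anywhere in its code, and the second pays for that absence with memory that scales with the size of the whole input domain.

```python
# Purely illustrative: the same behaviour, once as a search guided by an
# explicit criterion, once as a pre-computed lookup table with no
# criterion in sight.

def isqrt_search(n: int) -> int:
    """Integer square root via binary search against an explicit criterion."""
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid * mid <= n:      # the "objective" is checked explicitly here
            lo = mid
        else:
            hi = mid - 1
    return lo

LIMIT = 10_000
# Lookup-table version: identical answers on [0, LIMIT), but the criterion
# "mid*mid <= n" appears nowhere, and memory grows with the input domain.
ISQRT_TABLE = [isqrt_search(n) for n in range(LIMIT)]

def isqrt_lookup(n: int) -> int:
    return ISQRT_TABLE[n]

assert all(isqrt_search(n) == isqrt_lookup(n) for n in range(LIMIT))
```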
“Separately, I wonder whether you would say that AlphaZero self-play training is an “optimizer” or not. Some of your points seem to apply to it—in particular, since it’s self-play, the opponent is different each step, and thus so is the “optimal” behavior. You could say “the objective is always checkmate”, but the behavior leading to that objective may keep changing; by the same token you could say “the objective is always inclusive genetic fitness”, but the behavior leading to that objective may keep changing. (Granted, AlphaZero self-play training in fact converges to very impressive play rather than looping around in circles, but I think that’s an interesting observation rather than an a priori theoretical requirement of the setup—given that the model obviously doesn’t have the information capacity to learn actual perfect play.)”
Honestly, I think this example has caused me to lose substantial confidence in my original argument.
Clearly, the AlphaZero training process should fit under any reasonable definition of optimisation, and as you point out there is no fundamental reason a similar training process on a variant game couldn’t get stuck in a loop.
The only distinction I can think of is that the definition of “checkmate” is essentially a function of board state and that function is internally represented in the system as a set of conditions. This means you can point to an internal representation and alter it by explicitly changing certain bits.
In contrast, evolution is stuck optimising for genes which are good at (directly or indirectly) getting passed on.
I guess the equivalent of changing the checkmate rules would be changing the environment to tweak which organisms tend to evolve. But the environment doesn’t provide an explicit representation.
To conclude
I’m fairly confident the “explicit internal representation” part of the optimisation definition in RFLO needs tweaking.
I had previously been tossing around the idea that evolution was sort of its own thing, meaningfully distinct from other things called optimisers, but the AlphaZero example has scuttled that idea.
Thanks for your reply! A couple quick things:

> I don’t even know why the RFLO paper put that criterion in …
I don’t have any great insight here, but that’s very interesting to think about.
I thought about it a bit more and I think I know what they were doing. I bet they were trying to preempt the pedantic point (related) that everything is an optimization process if you allow the objective function to be arbitrarily convoluted and post hoc. E.g. any trained model M is the global maximum of the objective function “F where F(x)=1 if x is the exact model M, and F(x)=0 in all other cases”. So if you’re not careful, you can define “optimization process” in a way that also includes rocks.
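Spelled out in code (my own toy rendering of that move):

```python
# Take any object whatsoever and define an "objective function" that it
# happens to globally maximize, post hoc.
the_rock = "a rock"

def post_hoc_objective(x):
    # F(x) = 1 exactly when x is the thing we already had, else 0.
    return 1 if x == the_rock else 0

# Trivially, the_rock is the global maximum of post_hoc_objective.
# Without some restriction on what counts as an objective (e.g. that it be
# explicitly represented in the process), "is the optimum of *some*
# objective" is true of rocks too, so it can't distinguish optimizers.
assert post_hoc_objective(the_rock) == 1
assert all(post_hoc_objective(x) == 0 for x in ["a twig", "a trained model", 42])
```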
I think they used “explicitly represented objective function” as a straightforward case that would be adequate to most applications, but if they had wanted to they could have replaced it with the slightly-more-general notion of “objective function that can be deduced relatively straightforwardly by inspecting the nuts-and-bolts of the optimization process, and in particular it shouldn’t be a post hoc thing where you have to simulate the entire process of running the (so-called) optimization algorithm in order to answer the question of what the objective is.”
I would guess that “clever hardware implementation that performs the exact same weight updates” without an explicitly represented objective function ends up being wildly inefficient.
Oh sorry, that’s not what I meant. For example (see here) the Python code y *= 1.5 - x * y * y / 2 happens to be one iteration of Newton’s method to make y a better approximation to 1/√x. So if you keep running this line of code over and over, you’re straightforwardly running an optimization algorithm that finds the y that minimizes the objective function |x – 1/y²|. But I don’t see “|x – 1/y²|” written or calculated anywhere in that one line of source code. The source code skipped the objective and went straight to the update step.
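Here it is as a runnable loop, just to make the claim checkable (a quick sketch of my own):

```python
# Iterating that single update line converges y to 1/sqrt(x), i.e. it
# drives |x - 1/y**2| toward zero -- but that objective appears nowhere below.
x = 7.0
y = 0.3                          # rough initial guess for 1/sqrt(7)
for _ in range(10):
    y *= 1.5 - x * y * y / 2     # the update step is the whole program
print(y, x ** -0.5)              # both print ~0.377964473...
```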
I have a vague notion that I’ve seen a more direct example kinda like this in the RL literature. Umm, maybe it was the policy gradient formula used in some version of policy-gradient RL? I recall that (this version of) the policy gradient formula was something involving logarithms, and I was confused for quite a while where this formula came from, until eventually I found an explanation online where someone started with a straightforward intuitive objective function and did some math magic and wound up deriving that policy gradient formula with the logarithms. But the policy gradient formula (= update step) is all you need to actually run that RL process in practice. The actual objective function need not be written or calculated anywhere in the RL source code. (I could be misremembering. This was years ago.)
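For what it’s worth, here’s a minimal sketch of that pattern in RL, written from memory rather than from whatever paper I’m half-recalling: a REINFORCE-style update for a softmax policy on a toy bandit, where the running code only ever contains the log-probability gradient update, and the objective being ascended (expected reward) is never written down or evaluated anywhere.

```python
import numpy as np

# Hypothetical 3-armed bandit; the arm means are made-up illustrative values.
rng = np.random.default_rng(0)
true_means = np.array([0.1, 0.5, 0.9])
theta = np.zeros(3)               # policy logits
lr = 0.1

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(2000):
    probs = softmax(theta)
    a = rng.choice(3, p=probs)            # sample an action from the policy
    r = rng.normal(true_means[a], 0.1)    # sample a reward
    # Gradient of log pi(a | theta) for a softmax policy: one_hot(a) - probs.
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    theta += lr * r * grad_log_pi         # the update step is all we ever run

# The expected-reward objective was never computed above; the policy should
# still end up putting most of its probability on the best (third) arm.
print(softmax(theta))
```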