There are meaningful distinctions between evolution and other processes referred to as “optimisers”
People should be substantially more careful about invoking evolution as an analogy for the development of AGI, as tempting as this comparison is to make.
“Risks From Learned Optimisation” is one of the most influential AI Safety papers ever written, so I’m going to use its framework for defining optimisation.
“We will say that a system is an optimiser if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system” ~Hubinger et al (2019)
It’s worth noting that the authors of this paper do consider evolution to be an example of optimisation (something stated explicitly in the paper). Despite this, I’m going to argue the definition shouldn’t apply to evolution.
2 strong (and 1 weak) Arguments That Evolution Doesn’t Fit This Definition:
Weak Argument 0: Evolution itself isn’t a separate system that is optimising for something. (Micro)evolution is the change in allele frequency over generations. There is no separate entity you can point to and call “evolution”.
Consider how different this is from a human engaged in optimisation to design a bottle cap. We have the system that optimises, and the system that is optimised.
It is tempting to say “the system optimises itself”, but try to actually pin down which system you would say is engaged in the optimisation. That system isn’t “evolution”, but is instead something like “the environment”, “all carbon-based structures and complexes on earth”, or “all matter on the surface of earth”, etc.
Strong Argument 1: Evolution does not have an explicitly represented objective function.
This is a major issue. When I’m training a model against a loss function, I can explicitly represent that loss function: it is physically implemented somewhere in the system, as code or data that I can point to, inspect, and modify.
There is no single explicit representation of what “fitness” is within our environment.
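To make the contrast concrete, here is a minimal, illustrative sketch (not from any particular codebase) of what “explicitly represented” means in ML training: the objective exists as an object in the program that you can point to, inspect, or swap out.

```python
import torch

model = torch.nn.Linear(10, 1)
loss_fn = torch.nn.MSELoss()          # the objective, explicitly represented as an object
optimiser = torch.optim.SGD(model.parameters(), lr=0.01)

x, y = torch.randn(32, 10), torch.randn(32, 1)
for _ in range(100):
    optimiser.zero_grad()
    loss = loss_fn(model(x), y)       # evaluate the explicit objective
    loss.backward()                   # differentiate through it
    optimiser.step()

# Nothing analogous exists for evolution: there is no loss_fn object sitting
# anywhere in the biosphere that you could point to, read off, or edit.
```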
Strong Argument 2: Evolution isn’t a “conservative” process. The thing that it is optimising “toward” is dependent on the current state of the environment, and changes over time. It is possible for evolution to get caught in “loops” or “cycles”.
- A refresher on conservative fields.
In physics, a conservative vector field is a vector field that can be understood as the gradient of some other function. By associating any point in that vector field with the corresponding point on that other function, you can meaningfully order the points in your field.
To be less abstract, imagine your field is “slope”, which describes the gradient of a mountain range. You can meaningfully order the points in the slope field by the height of the point they correspond to on the mountain range.
In a conservative vector field, the curl is zero everywhere. Let a ball roll down the mountain range (with a very high amount of friction) and it will find its way to a local minimum and stop.
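In symbols, for reference: a field $\vec{F}$ is conservative exactly when it is the gradient of some scalar potential $\phi$, and any such gradient field is curl-free,

$$\vec{F} = \nabla \phi \;\;\Longrightarrow\;\; \nabla \times \vec{F} = 0.$$

The potential $\phi$ is what lets you assign every point a well-defined “height”, and hence an ordering.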
In a non-conservative vector field it is possible to create paths that loop forever.
My local theme-park has a ride called the “Lazy River” which is an artificial river which has been formed into a loop. There is no change in elevation, and the water is kept flowing clockwise by a series of underwater fans which continuously put energy into the system. Families hire floating platforms and can drift endlessly in a circle until their children get bored.
If you throw a ball into the Lazy River it will circle endlessly. If we write down a vector field that describes the force on the ball at any point in the river, it isn’t possible to describe this field as the gradient of another field. There is no absolute ordering of points in this field.
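A quick numeric illustration of that claim (a toy stand-in for the Lazy River, not a literal model of it): the circulating field F(x, y) = (y, −x) pushes everything around in a circle, has nonzero curl everywhere, and therefore cannot be written as the gradient of any potential.

```python
import numpy as np

def F(x, y):
    """Circulating force field, a toy stand-in for the Lazy River."""
    return np.array([y, -x])

def curl_z(field, x, y, h=1e-5):
    """Numerically estimate the z-component of curl: dFy/dx - dFx/dy."""
    dFy_dx = (field(x + h, y)[1] - field(x - h, y)[1]) / (2 * h)
    dFx_dy = (field(x, y + h)[0] - field(x, y - h)[0]) / (2 * h)
    return dFy_dx - dFx_dy

print(curl_z(F, 1.0, 2.0))  # ~ -2.0: nonzero, so F is not the gradient of any potential
```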
- Evolution isn’t conservative
For the ball rolling over the hills, we can say that as time passes it is getting “lower”. By appealing to the function that the field is the gradient of, we can meaningfully say whether two points are higher, lower, or at the same height.
In the Lazy River, this is no longer possible. Locally, you could describe the motion of the ball as rolling down a hill, but continuing this description around the entire loop tells you that you are describing an impossible M.C. Escher waterfall.
If evolution is not conservative (and hence has no underlying goal it is optimising toward) then it would be possible to observe creatures evolving in circles, stuck in “loops”. Evolving, losing then re-evolving the same structures.
This is not only possible, but it has been observed. Side-blotched lizards appear to shift throat colours in a cyclic, repeating pattern. For more details, see this talk by John Baez.
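For intuition about how such cycling falls out of frequency-dependent selection, here is a toy sketch (my own construction, not taken from the talk) of replicator dynamics on a rock-paper-scissors payoff matrix, the standard stylised model of the three throat-colour morphs. The morph frequencies chase each other around indefinitely instead of settling at a fixed optimum.

```python
import numpy as np

# Rock-paper-scissors payoffs: each morph beats one rival and loses to the other.
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

x = np.array([0.5, 0.3, 0.2])   # initial morph frequencies
dt = 0.01
for step in range(3001):
    fit = A @ x                               # frequency-dependent fitness of each morph
    x = x + dt * x * (fit - x @ fit)          # replicator update
    x = np.clip(x, 1e-9, None); x /= x.sum()  # keep it a valid frequency vector
    if step % 500 == 0:
        print(step, np.round(x, 3))

# The frequencies oscillate in a cycle: the "target" keeps moving because it
# depends on the current composition of the population itself.
```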
To summarise, the “direction” or “thing evolution is optimising toward” cannot be some internally represented thing, because the thing it optimises toward is a function of not just the environment but also of the things evolving in that environment.
Who cares?
Using evolution as an example of “optimisation” is incredibly common among AI safety researchers, and can be found in Yudkowsky’s writing on Evolution in The Sequences.
I think the notion of evolution as an optimiser can do more harm than good.
As a concrete example, Nate’s “Sharp Left Turn” post was weakened substantially by invoking an evolution analogy, which spawned a lengthy debate (see Pope, 2023 and the response from Zvi). This issue could have been skipped entirely simply by arguing in favour of the Sharp Left Turn without any reference to evolution (see my upcoming post on this topic).
Clarification Edit: Further, I’m concerned that AI Safety research lacks a sufficiently robust framework for reasoning about the development, deployment, and behaviour of AGIs. I am very interested in the broad category of “deconfusion”. It is under that lens that I comment on evolution not fitting the definition in RFLO. It indicates that the optimiser framework in RFLO may not be cutting reality at the joints, and a more careful treatment is needed.
To conclude
An intentionally provocative and attention-grabbing summary of this post might be “evolution is not an optimiser”, but that is essentially just a semantic argument and isn’t quite what I’m trying to say.
A better summary is “in the category of things that are referred to as optimisation, evolution has numerous properties that it does not share with ML optimisation, so be careful about invoking it as an analogy”.
On similarities between RLHF-like ML and evolution: You might notice that any form of ML that relies on human feedback also fails to have an “internal representation” of what it’s optimising toward, instead getting feedback from humans assessing its performance.
Like evolution, it is also possible to set up this optimisation process so that it is also not “conservative”.
A contrived example of this: Consider training a language model to complete text, where the humans giving feedback exhibit a preference that is a function of what they’ve just read. If the model outputs dense scientific jargon, the humans prefer lighter prose; if the model outputs light prose, the humans prefer more formal writing, etc.
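A runnable toy version of that setup (purely illustrative; the one-parameter “model” and the update rule are invented for the sketch): the reward flips depending on what the rater has just read, so the update direction keeps reversing and the process chases its own tail rather than descending a fixed objective.

```python
import random

def human_preference(previous_style):
    """Raters prefer whichever style they have NOT just been reading."""
    return "light" if previous_style == "jargon" else "jargon"

p_jargon = 0.9      # the "model": probability of producing jargon
lr = 0.05
last_style = "jargon"

for step in range(50):
    output = "jargon" if random.random() < p_jargon else "light"
    reward = 1.0 if output == human_preference(last_style) else -1.0
    # Nudge the policy toward whatever was just rewarded.
    p_jargon += lr * reward * (1 if output == "jargon" else -1)
    p_jargon = min(max(p_jargon, 0.01), 0.99)
    last_style = output
    if step % 10 == 0:
        print(step, round(p_jargon, 2))

# p_jargon oscillates rather than converging: the objective moves as a function
# of the system's own recent behaviour, much like the Lazy River.
```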
(This is a draft of a post, very keen for feedback and disagreement)
(For the record—I do think there are other reasons to think that the evolution example is not informative about the probability of AGI risk, namely the obvious point that different specific optimization algorithms may have different properties, see my brief discussion here.)
In general, I’m very strongly opposed to the activity that I call “analogy-target policing”, where somebody points out some differences between X and Y and says “therefore it’s dubious to analogize X and Y”, independent of how the analogy is being used. Context matters. There are always differences / disanalogies! That’s the whole point of an analogy—X≠Y! Nobody analogizes something to itself! So there have to be differences!!
And sometimes a difference between X and Y is critically important, as it undermines the point that someone is trying to make by bringing up the analogy between X and Y. And also, sometimes a difference between X and Y is totally irrelevant to the point that someone was making with their analogy, so the analogy is perfectly great. See my discussion here with various examples and discussion, including Shakespeare’s comparing a woman to a summer’s day. :)
…Granted, you don’t say that you have proven all analogies between evolution and AGI x-risk to be invalid. Rather, you say “in the category of things that are referred to as optimisation, evolution has numerous properties that it does not share with ML optimisation, so be careful about invoking it as an analogy”. But that’s not news at all! If you just want to list any property that Evolution does not share with ML optimisation, here are a bunch: (1) The first instantiation of Evolution on Earth was billions of years earlier than the first instantiation of ML optimisation. (2) Evolution was centrally involved in the fact that I have toenails, whereas ML optimisation was not. (3) ML optimisation can be GPU-accelerated using PyTorch, whereas Evolution cannot … I could go on all day! Of course, none of these differences are relevant to anything. That’s my point. I don’t think the three differences you list are relevant to anything either.
> Evolution itself isn’t a separate system that is optimising for something. (Micro)evolution is the change in allele frequency over generations. There is no separate entity you can point to and call “evolution”.
Evolution is an optimization process. For some optimization processes, you can point to some “system” that is orchestrating that process, and for other optimization processes, you can’t. I don’t see why this matters for anything, right? Did Eliezer or Nate or whoever make some point about evolution which is undermined by the observation that evolution is not a separate system?
> Evolution does not have an explicitly represented objective function
OK, but if it did, it would change nothing, right? I don’t even know why the RFLO paper put that criterion in. Like, let’s compare (1) an algorithm training an LLM with the “explicit” objective function of minimizing perplexity, written in Python, (2) some super-accelerated clever hardware implementation that manages to perform the exact same weight updates but in a way that doesn’t involve the objective function ever being “explicitly represented” or calculated. The difference between these is irrelevant, right? Why would anyone care? The same process will unfold, with the same results, for the same reason.
Again, I don’t think anyone has made any argument about evolution versus AGI risk that relies on the objective function being explicitly rather than implicitly represented.
> Evolution isn’t conservative
Yet again, I don’t see why this matters for anything. When people make these kinds of arguments, they might bring up particular aspects of human or animal behavior—things like “humans don’t generally care about their inclusive genetic fitness” and “humans do care about love and friendship and beauty”. Both those properties are unrelated to situations where evolution produces cycles. E.g. love- and friendship- and beauty-related innate drives were stable local optima in early human evolution.
Separately, I wonder whether you would say that AlphaZero self-play training is an “optimizer” or not. Some of your points seem to apply to it—in particular, since it’s self-play, the opponent is different each step, and thus so is the “optimal” behavior. You could say “the objective is always checkmate”, but the behavior leading to that objective may keep changing; by the same token you could say “the objective is always inclusive genetic fitness”, but the behavior leading to that objective may keep changing. (Granted, AlphaZero self-play training in fact converges to very impressive play rather than looping around in circles, but I think that’s an interesting observation rather than an a priori theoretical requirement of the setup—given that the model obviously doesn’t have the information capacity to learn actual perfect play.)
This was a great reply. In responding to it my confidence in my arguments declined substantially.
I’m going to make what I think is a very cruxy high level clarification and then address individual points.
High Level Clarification
My original post has clearly done a poor job at explaining why I think the mismatch between the optimisation definition given in RFLO and evolution matters. I think clarifying my position will address the bulk of your concerns.
“I don’t see why this matters for anything, right? Did Eliezer or Nate or whoever make some point about evolution which is undermined by the observation that evolution is not a separate system? [...] Yet again, I don’t see why this matters for anything.”
I believe you have interpreted the high level motivation behind my post to be something along the lines of “evolution doesn’t fit this definition of optimisation, and therefore this should be a reason to doubt the conclusions of Nate, Eliezer or anyone else invoking evolution.”
This is a completely fair reading of my original post, but it wasn’t my intended message.
I’m concerned that AI Safety research lacks a sufficiently robust framework for reasoning about the development, deployment and behaviour of AGI’s. I am very interested in the broad category of “deconfusion”. It is under that lens that I comment on evolution not fitting the definition in RFLO. It indicates that the optimiser framework in RFLO may not be cutting reality at the joints, and a more careful treatment is needed.
I’m going to immediately edit my original post to make this more clear thanks to your feedback!
Detailed Responses to Individual Points
“And also, sometimes a difference between X and Y is totally irrelevant to the point that someone was making with their analogy”
I agree. I mentioned Nate’s evolution analogy because I think it wasn’t needed to make the point and led to confusion. I don’t think the properties of evolution I’ve mentioned can be used to argue against the Sharp Left Turn.
“If you just want to list any property that Evolution does not share with ML optimisation, here are a bunch: (1) The first instantiation of Evolution on Earth was billions of years earlier than the first instantiation of ML optimisation. (2) Evolution was centrally involved in the fact that I have toenails, whereas ML optimisation was not. (3) ML optimisation can be GPU-accelerated using PyTorch, whereas Evolution cannot … I could go on all day! Of course, none of these differences are relevant to anything. That’s my point. I don’t think the three differences you list are relevant to anything either.”
Keeping in mind the “deconfusion” lens that motivated my original post, I don’t think these distinctions point to any flaws in the definition of optimisation given in RFLO, in the same way that evolution failing to satisfy the criterion of having an internally represented objective does.
“I don’t even know why the RFLO paper put that criterion in. Like, let’s compare (1) an algorithm training an LLM with the “explicit” objective function of minimizing perplexity, written in Python, (2) some super-accelerated clever hardware implementation that manages to perform the exact same weight updates but in a way that doesn’t involve the objective function ever being “explicitly represented” or calculated.”
I don’t have any great insight here, but that’s very interesting to think about. I would guess that a “clever hardware implementation that performs the exact same weight updates” without an explicitly represented objective function ends up being wildly inefficient. This seems broadly similar to the relationship between a search algorithm and an implementation of the same algorithm that is simply a gigantic pre-computed lookup table.
“Separately, I wonder whether you would say that AlphaZero self-play training is an “optimizer” or not. Some of your points seem to apply to it—in particular, since it’s self-play, the opponent is different each step, and thus so is the “optimal” behavior. You could say “the objective is always checkmate”, but the behavior leading to that objective may keep changing; by the same token you could say “the objective is always inclusive genetic fitness”, but the behavior leading to that objective may keep changing. (Granted, AlphaZero self-play training in fact converges to very impressive play rather than looping around in circles, but I think that’s an interesting observation rather than an a priori theoretical requirement of the setup—given that the model obviously doesn’t have the information capacity to learn actual perfect play.)”
Honestly, I think this example has caused me to lose substantial confidence in my original argument.
Clearly, the AlphaZero training process should fit under any reasonable definition of optimisation, and as you point out there is no fundamental reason a similar training process on a variant game couldn’t get stuck in a loop.
The only distinction I can think of is that the definition of “checkmate” is essentially a function of board state and that function is internally represented in the system as a set of conditions. This means you can point to an internal representation and alter it by explicitly changing certain bits.
In contrast, evolution is stuck optimising for genes which are good at (directly or indirectly) getting passed on.
I guess the equivalent of changing the checkmate rules would be changing the environment to tweak which organisms tend to evolve. But the environment doesn’t provide an explicit representation.
To conclude
I’m fairly confident the “explicit internal representation” part of the optimisation definition in RFLO needs tweaking.
I had previously been tossing around the idea that evolution was sort of its own thing that was meaningfully distinct from other things called optimisers, but the AlphaZero example has scuttled that idea.
Thanks for your reply! A couple quick things:
> I don’t even know why the RFLO paper put that criterion in …
> I don’t have any great insight here, but that’s very interesting to think about.
I thought about it a bit more and I think I know what they were doing. I bet they were trying to preempt the pedantic point (related) that everything is an optimization process if you allow the objective function to be arbitrarily convoluted and post hoc. E.g. any trained model M is the global maximum of the objective function “F where F(x)=1 if x is the exact model M, and F(x)=0 in all other cases”. So if you’re not careful, you can define “optimization process” in a way that also includes rocks.
I think they used “explicitly represented objective function” as a straightforward case that would be adequate to most applications, but if they had wanted to they could have replaced it with the slightly-more-general notion of “objective function that can be deduced relatively straightforwardly by inspecting the nuts-and-bolts of the optimization process, and in particular it shouldn’t be a post hoc thing where you have to simulate the entire process of running the (so-called) optimization algorithm in order to answer the question of what the objective is.”
> I would guess that “clever hardware implementation that performs the exact same weight updates” without an explicitly represented objective function ends up being wildly inefficient.
Oh sorry, that’s not what I meant. For example (see here) the Python code y *= 1.5 - x * y * y / 2 happens to be one iteration of Newton’s method to make y a better approximation to 1/√x. So if you keep running this line of code over and over, you’re straightforwardly running an optimization algorithm that finds the y that minimizes the objective function |x – 1/y²|. But I don’t see “|x – 1/y²|” written or calculated anywhere in that one line of source code. The source code skipped the objective and went straight to the update step.
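For instance, a minimal check (a sketch, not part of the original example) that the bare update really does optimise |x – 1/y²| even though that expression never appears:

```python
x, y = 2.0, 0.5                    # crude initial guess for 1/sqrt(2)
for i in range(6):
    y *= 1.5 - x * y * y / 2       # the update step; no objective function in sight
    print(i, y)
print("target:", 2.0 ** -0.5)      # 0.7071..., which the iterates converge to
```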
I have a vague notion that I’ve seen a more direct example kinda like this in the RL literature. Umm, maybe it was the policy gradient formula used in some version of policy-gradient RL? I recall that (this version of) the policy gradient formula was something involving logarithms, and I was confused for quite a while where this formula came from, until eventually I found an explanation online where someone started with a straightforward intuitive objective function and did some math magic and wound up deriving that policy gradient formula with the logarithms. But the policy gradient formula (= update step) is all you need to actually run that RL process in practice. The actual objective function need not be written or calculated anywhere in the RL source code. (I could be misremembering. This was years ago.)
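(Presumably the formula in question was something like the standard REINFORCE / policy-gradient estimator,

$$\nabla_\theta J(\theta) \;=\; \mathbb{E}_{\tau \sim \pi_\theta}\!\Big[\textstyle\sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\tau)\Big],$$

where only the log-probability gradients on the right-hand side ever need to appear in the code; the objective $J(\theta)$, the expected return, never has to be written down or calculated.)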
Another example in ML of a “non-conservative” optimization process: a common failure mode of GANs is mode collapse, wherein the generator and discriminator get stuck in a loop. The generator produces just one output that fools the discriminator, the discriminator memorizes it, the generator switches to another, until eventually they get back to the same output again.
In the rolling ball analogy, we could say that the ball rolls down into a divot, but the landscape flexes against the ball to raise it up again, and then the ball rolls into another divot, and so on.
Evolution may not act as an optimizer globally, since selective pressure is different for different populations of organisms in different niches. However, it does act as an optimizer locally.
For a given population in a given environment that happens to be changing slowly enough, the set of all variations in each generation acts as a sort of numerical gradient estimate of the local fitness landscape. This allows the population as a whole to perform stochastic gradient descent. Those with greater fitness for the environment could be said to be lower on the local fitness landscape, so there is an ordering for that population.
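A minimal sketch of that idea (an evolution-strategies-style estimate, written as an illustration rather than as a biological model): treat each generation's variation as a cloud of perturbations around the current genotype, and average their fitnesses to get a direction of improvement.

```python
import numpy as np

def fitness(genome):
    """Toy fitness landscape: higher is better, with a single peak at the origin."""
    return -np.sum(genome ** 2)

rng = np.random.default_rng(0)
genome = rng.normal(size=5)            # the population's current "mean" genotype
sigma, lr, pop_size = 0.1, 0.01, 200

for generation in range(100):
    noise = rng.normal(size=(pop_size, genome.size))        # variation within this generation
    scores = np.array([fitness(genome + sigma * n) for n in noise])
    advantage = (scores - scores.mean()) / (scores.std() + 1e-8)
    grad_estimate = (advantage[:, None] * noise).mean(axis=0) / sigma  # numerical gradient estimate
    genome = genome + lr * grad_estimate                     # the population climbs the local landscape

print(fitness(genome))   # far closer to the optimum (0) than the random starting genome
```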
In a sufficiently constant environment, evolution very much does act as an optimization process. Sure, the fitness landscape can change, even by organisms undergoing evolution (e.g. the Great Oxygenation Event of yester-eon, or the Anthropogenic Mass Extinction of today), which can lead to cycling. But many organisms do find very stable local minima of the fitness landscape for their species, like the coelacanth, horseshoe crab, cockroach, and many other “living fossils”. Humans are certainly nowhere near our global optimum, especially with the rapid changes to the fitness function wrought by civilization, but that doesn’t mean that there isn’t a gradient that we’re following.
Also, consider a more traditional optimization process, such as a neural network undergoing gradient descent. If, in the process of training, you kept changing the training dataset, shifting the distribution, you would in effect be changing the optimization target.
Each minibatch generates a different gradient estimate, and a poorly randomized ordering of the data could even lead to training in circles.
Changing environments are like changing the training set for evolution. Differential reproductive success (mean squared error) is the fixed cost function, but the gradient that the population (network backpropagation) computes at any generation (training step) depends on the particular set of environmental factors (training data in the minibatch).
This is somewhat along the lines of the point I was trying to make with the Lazy River analogy.
I think the crux is that I’m arguing that because the “target” that evolution appears to be evolving towards is dependent on the state and differs as the state changes, it doesn’t seem right to refer to it as “internally represented”.