This was a great reply. In responding to it my confidence in my arguments declined substantially.
I’m going to make what I think is a very cruxy high level clarification and then address individual points.
High Level Clarification
My original post has clearly done a poor job at explaining why I think the mismatch between the optimisation definition given in RFLO and evolution matters. I think clarifying my position will address the bulk of your concerns.
“I don’t see why this matters for anything, right? Did Eliezer or Nate or whoever make some point about evolution which is undermined by the observation that evolution is not a separate system? [...] Yet again, I don’t see why this matters for anything.”
I believe you have interpreted the high level motivation behind my post to be something along the lines of “evolution doesn’t fit this definition of optimisation, and therefore this should be a reason to doubt the conclusions of Nate, Eliezer or anyone else invoking evolution.”
This is a completely fair reading of my original post, but it wasn’t my intended message.
I’m concerned that AI Safety research lacks a sufficiently robust framework for reasoning about the development, deployment and behaviour of AGIs. I am very interested in the broad category of “deconfusion”. It is under that lens that I comment on evolution not fitting the definition in RFLO. It indicates that the optimiser framework in RFLO may not be cutting reality at the joints, and a more careful treatment is needed.
I’m going to immediately edit my original post to make this clearer, thanks to your feedback!
Detailed Responses to Individual Points
“And also, sometimes a difference between X and Y is totally irrelevant to the point that someone was making with their analogy”
I agree. I mentioned Nate’s evolution analogy because I think it wasn’t needed to make the point and led to confusion. I don’t think the properties of evolution I’ve mentioned can be used to argue against the Sharp Left Turn.
“If you just want to list any property that Evolution does not share with ML optimisation, here are a bunch: (1) The first instantiation of Evolution on Earth was billions of years earlier than the first instantiation of ML optimisation. (2) Evolution was centrally involved in the fact that I have toenails, whereas ML optimisation was not. (3) ML optimisation can be GPU-accelerated using PyTorch, whereas Evolution cannot … I could go on all day! Of course, none of these differences are relevant to anything. That’s my point. I don’t think the three differences you list are relevant to anything either.”
Keeping in mind the “deconfusion” lens that motivated my original post, I don’t think these distinctions point to any flaws in the definition of optimisation given in RFLO, in the same way that evolution failing to satisfy the criterion of having an internally represented objective does.
“I don’t even know why the RFLO paper put that criterion in. Like, let’s compare (1) an algorithm training an LLM with the “explicit” objective function of minimizing perplexity, written in Python, (2) some super-accelerated clever hardware implementation that manages to perform the exact same weight updates but in a way that doesn’t involve the objective function ever being “explicitly represented” or calculated.”
I don’t have any great insight here, but that’s very interesting to think about. I would guess that “clever hardware implementation that performs the exact same weight updates” without an explicitly represented objective function ends up being wildly inefficient. This seems broadly similar to the relationship between a search algorithm and an implementation of the same algorithm that is simply a gigantic pre-computed lookup table.
“Separately, I wonder whether you would say that AlphaZero self-play training is an “optimizer” or not. Some of your points seem to apply to it—in particular, since it’s self-play, the opponent is different each step, and thus so is the “optimal” behavior. You could say “the objective is always checkmate”, but the behavior leading to that objective may keep changing; by the same token you could say “the objective is always inclusive genetic fitness”, but the behavior leading to that objective may keep changing. (Granted, AlphaZero self-play training in fact converges to very impressive play rather than looping around in circles, but I think that’s an interesting observation rather than an a priori theoretical requirement of the setup—given that the model obviously doesn’t have the information capacity to learn actual perfect play.)”
Honestly, I think this example has caused me to lose substantial confidence in my original argument.
Clearly, the AlphaZero training process should fit under any reasonable definition of optimisation, and as you point out there is no fundamental reason a similar training process on a variant game couldn’t get stuck in a loop.
The only distinction I can think of is that the definition of “checkmate” is essentially a function of board state and that function is internally represented in the system as a set of conditions. This means you can point to an internal representation and alter it by explicitly changing certain bits.
In contrast, evolution is stuck optimising for genes which are good at (directly or indirectly) getting passed on.
I guess the equivalent of changing the checkmate rules would be changing the environment to tweak which organisms tend to evolve. But the environment doesn’t provide an explicit representation.
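To make that distinction concrete, here is a toy sketch (my own illustration, not AlphaZero’s actual code, and using tic-tac-toe win lines in place of chess checkmate) of what an explicitly represented, editable objective looks like:

```python
# Toy illustration: the objective is an ordinary function of game state,
# stored as explicit, inspectable structure (WIN_LINES) that you could edit.

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def objective(board, player):
    """Did `player` complete a line? A concrete predicate over board state."""
    return any(all(board[i] == player for i in line) for line in WIN_LINES)

# Flipping bits here (negating the return value, or editing WIN_LINES) changes
# what the whole training process optimises for. Evolution has no analogous
# object that you can point to and rewrite.
```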
To conclude
I’m fairly confident the “explicit internal representation” part of the optimisation definition in RFLO needs tweaking.
I had previously been tossing around the idea that evolution was sort of its own thing that was meaningfully distinct from other things called optimisers, but the AlphaZero example has scuttled that idea.
Thanks for your reply! A couple quick things:

> I don’t even know why the RFLO paper put that criterion in …

> I don’t have any great insight here, but that’s very interesting to think about.
I thought about it a bit more and I think I know what they were doing. I bet they were trying to preempt the pedantic point (related) that everything is an optimization process if you allow the objective function to be arbitrarily convoluted and post hoc. E.g. any trained model M is the global maximum of the objective function “F where F(x)=1 if x is the exact model M, and F(x)=0 in all other cases”. So if you’re not careful, you can define “optimization process” in a way that also includes rocks.
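To make that degenerate case concrete, here it is as a few lines of Python (my own sketch, not anything from the paper):

```python
def make_post_hoc_objective(outcome):
    # Whatever outcome you already observed is, by construction, the global
    # maximum, so under an "objective" like this even a rock counts as optimising.
    return lambda candidate: 1.0 if candidate == outcome else 0.0
```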
I think they used “explicitly represented objective function” as a straightforward case that would be adequate to most applications, but if they had wanted to they could have replaced it with the slightly-more-general notion of “objective function that can be deduced relatively straightforwardly by inspecting the nuts-and-bolts of the optimization process, and in particular it shouldn’t be a post hoc thing where you have to simulate the entire process of running the (so-called) optimization algorithm in order to answer the question of what the objective is.”
> I would guess that “clever hardware implementation that performs the exact same weight updates” without an explicitly represented objective function ends up being wildly inefficient.
Oh sorry, that’s not what I meant. For example (see here) the Python code `y *= 1.5 - x * y * y / 2` happens to be one iteration of Newton’s method to make y a better approximation to 1/√x. So if you keep running this line of code over and over, you’re straightforwardly running an optimization algorithm that finds the y that minimizes the objective function |x – 1/y²|. But I don’t see “|x – 1/y²|” written or calculated anywhere in that one line of source code. The source code skipped the objective and went straight to the update step.
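A minimal runnable sketch of that point, expanding the one-liner above (the wrapper function, starting guess and step count are my own additions): the loop converges to 1/√x, yet the objective |x – 1/y²| appears nowhere in the code doing the optimising.

```python
def inv_sqrt(x, y0=1.0, steps=30):
    """Approximate 1/sqrt(x) by repeating the Newton update step.

    Only the update rule is represented here; the objective |x - 1/y**2| is
    never written down or evaluated. (The starting guess y0 needs to lie in
    roughly (0, sqrt(3/x)) for the iteration to converge to the positive root.)
    """
    y = y0
    for _ in range(steps):
        y *= 1.5 - x * y * y / 2   # one Newton step on f(y) = 1/y**2 - x
    return y

print(inv_sqrt(2.0))                      # ~0.7071..., i.e. 1/sqrt(2)
print(abs(2.0 - 1 / inv_sqrt(2.0) ** 2))  # the implicit objective, computed only as a check
```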
I have a vague notion that I’ve seen a more direct example kinda like this in the RL literature. Umm, maybe it was the policy gradient formula used in some version of policy-gradient RL? I recall that (this version of) the policy gradient formula was something involving logarithms, and I was confused for quite a while where this formula came from, until eventually I found an explanation online where someone started with a straightforward intuitive objective function and did some math magic and wound up deriving that policy gradient formula with the logarithms. But the policy gradient formula (= update step) is all you need to actually run that RL process in practice. The actual objective function need not be written or calculated anywhere in the RL source code. (I could be misremembering. This was years ago.)
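In case it helps pin down the memory, here is a hedged sketch of the standard REINFORCE update on a toy two-armed bandit (my own illustration; the bandit, parameters and seed are made up): the update step uses the analytic gradient of log π, and the objective itself, expected reward, is never written or evaluated anywhere in the code.

```python
import numpy as np

# Toy two-armed bandit with a softmax policy over logits theta.
rng = np.random.default_rng(0)
theta = np.zeros(2)
arm_means = np.array([0.0, 1.0])   # arm 1 pays more on average
alpha = 0.1

for _ in range(5000):
    probs = np.exp(theta - theta.max())
    probs /= probs.sum()                 # softmax policy pi_theta
    a = rng.choice(2, p=probs)
    r = arm_means[a] + rng.normal()      # noisy reward
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0                # analytic gradient of log pi_theta(a)
    theta += alpha * r * grad_log_pi     # REINFORCE update step
    # The objective (expected reward under pi_theta) never appears above.

print(probs)   # ends up putting most of its probability on arm 1
```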