Ah, OK. I think I get you now. I agree that whether we know the mechanism doesn’t matter.
Personally, I’m comfortable saying that gravity is an optimization process. The interesting question is what gravity optimizes for. I might conclude, for example, after watching how gravitational fields affect matter, that gravity optimizes for minimizing the distance between sources of mass.
From that it follows that gravity is not a particularly powerful optimization process: I observe many situations where the distance between masses is not being minimized, because gravity has only a very limited way of arranging its environment to achieve that goal. And I suspect that one of the things you’re looking for, in attempting to arrive at a definition that distinguishes gravity from Clippy, is a notion of optimization power similar to this. In other words, it’s possible that your question can be rephrased as “how do we measure optimization power?”
Another possible distinction among optimization processes that we often implicitly talk about here is value-independence. That is, when we talk about AGI, what’s being evoked is a powerful optimization process that can optimize for paperclips, or shoes, or smileyfaces, or satisfied humans, or whatever it happens to value. It’s just as powerful an optimization process either way. Gravity doesn’t seem to have this property. Clippy might or might not.
The general assumption around here is that something as effective as Clippy is using algorithms which are generalizable; I’m not sure I’ve ever seen the idea of a non-generalizable powerful optimization process even discussed here. I suspect this derives in large part from the site’s focus on Bayes’ Theorem, which is entirely domain-independent, as the core of intelligence/optimization.
This focus is in principle separable from the site’s focus on optimizing systems, but in practice the two are not explicitly separated during discussion.
But it seems like then every process can be an optimisation process, and when you measure the optimisation power that’s really telling you more about whether the ‘optimisation target’ you selected as your measure is a good fit for the process you’re looking at. It tells you more about your interpretation of the optimisation target than it does about the process itself.
Gravity isn’t very powerful for minimising distance between sources of mass, but it is very powerful for “making mass move in straight lines through curved spacetime”[1]. For any process at all, you just look at “whatever it actually ends up doing”, and then say that was its optimisation target all along, and hey presto, it turns out there are superpowerful optimisation processes everywhere you look, all being hugely successful at making things turn out how (you think) they wanted, provided you think they wanted things the way they actually turned out. If you get to choose your own interpretation of what the optimisation target is, ‘optimisation process’ doesn’t seem like a very useful notion at all.
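One way to make this worry concrete is a toy calculation. The sketch below assumes a particular measure of optimisation power that is not defined anywhere in this thread: the negative base-2 log of the fraction of possible outcomes that score at least as well as the one actually reached. The state space (a ball that can come to rest at any of 1001 positions), the resting position 372, and both scoring functions are hypothetical, chosen only for illustration.

```python
import math

def optimization_power_bits(outcome, score, states):
    """Assumed measure: -log2 of the fraction of states scoring >= the outcome."""
    threshold = score(outcome)
    at_least_as_good = sum(1 for s in states if score(s) >= threshold)
    return -math.log2(at_least_as_good / len(states))

# Hypothetical state space: positions 0..1000 where a ball could come to rest.
STATES = range(1001)
ACTUAL = 372  # hypothetical position where the ball actually ended up

def closeness_to_origin(x):
    """Target fixed in advance: get as close to position 0 as possible."""
    return -x

def closeness_to_actual(x):
    """Target fitted after the fact: end up wherever it actually ended up."""
    return -abs(x - ACTUAL)

bits_fixed = optimization_power_bits(ACTUAL, closeness_to_origin, STATES)
bits_fitted = optimization_power_bits(ACTUAL, closeness_to_actual, STATES)

print(f"target fixed in advance: {bits_fixed:.2f} bits")
print(f"target fitted afterward: {bits_fitted:.2f} bits")
```

Under the target fixed in advance the process earns roughly 1.4 bits; under the target fitted to the outcome it earns roughly 10 bits, the maximum this state space allows. Nothing about the process changed, only the choice of target.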
Also, re: value independence: Evolution seems like a pretty definite candidate for what we want ‘optimisation process’ to mean, but its values seem to be pretty inextricably baked into the algorithm. You can’t reprogram evolution to start optimising for paperclips, for example. It only optimises for whatever genes are selectively favoured by the environment.
[1] insert a more accurate description of what gravity does here if required.
Yes, I agree that on this account every process is, technically speaking, an optimization process for some target, and I agree that optimization power can only be measured relative to a particular target or class of targets.
That said, when we say that evolution, or a human-level intelligence, is an optimization process, we mean something rather more than this: roughly, that it’s an optimization process with a lot of optimization power across a wide range of targets. (And, sure, I agree that if we hold the environment constant, including other evolved lifeforms, then evolution has a fixed target. If we lived in a universe where such constant environments were common, I suspect we would not be inclined to describe evolution as an optimization process.)
I don’t see how that makes “optimization process” a useless notion. What do you want to use it for?
(Apologies for the delayed reply.) I really just want to know what Eliezer means by it. It seems to me like I have some notion of an optimisation process that says “yep, that’s definitely an optimisation process” when I think about evolution and human minds and Clippy, and says “nope, that’s not really an optimisation process, or at least not one worth the name” about water rolling down a hill and thermodynamics. And I think this notion is sufficiently similar to the one that Eliezer is using. But my attempts to formalise this definition, to reduce it, have failed: I can’t find any combination of words that seems to capture the boundary that my intuition is drawing between Clippy and gravity.
Treating the boundary between Clippy and gravity as a quantitative one (optimization power over a wide range of targets, as above) rather than a qualitative one does violence to some of my intuitions as well, but on balance I don’t endorse those intuitions. Gravity is, technically speaking, an optimization process… though, as you say, it’s not worth the name in a colloquial sense.
I find this very unsatisfying, not least because “optimisation power over a wide range of targets” is easily gamed just by dividing any given ‘target’ of a process into a whole lot of smaller targets and then saying “look at all these different targets that the process optimised for!”
Defining optimisation power simply by a process’s ability to hit some target from a wide range of starting states, or by the range of targets it can hit, seems in both cases to be easily gameable by clever sophistry in how you choose the targets by which you measure it. There must be some part of it that separates processes we feel genuinely are good at optimising (like Clippy) from processes that only come out as good at optimising if we select clever targets to measure them by.
We seem to have very different understandings of what constitutes a wide range. A narrow target does not suddenly become a wide range of targets because I choose to subdivide it, any more than I can achieve a diversified stock portfolio by separately investing each dollar into the same company’s stock.
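The subdivision point can be sketched in code. This is only an illustration under an assumption made here, not a definition from the thread: breadth is measured by how much of the outcome space the targets jointly cover, rather than by how many target labels there are. The outcome space of 1000 integers and all three target lists are hypothetical.

```python
def target_count(targets):
    """Number of target labels; this grows when a target is subdivided."""
    return len(targets)

def coverage(targets, n_outcomes):
    """Fraction of the outcome space covered by the union of all targets."""
    covered = set().union(*targets)
    return len(covered) / n_outcomes

N_OUTCOMES = 1000  # hypothetical outcome space: the integers 0..999

# One narrow target: ten adjacent outcomes.
narrow = [set(range(370, 380))]

# The same narrow target, relabelled as ten single-outcome "targets".
subdivided = [{x} for x in range(370, 380)]

# Ten targets of the same size, placed in different parts of the space.
wide = [set(range(start, start + 10)) for start in range(0, 1000, 100)]

for name, targets in [("narrow", narrow), ("subdivided", subdivided), ("wide", wide)]:
    print(name, target_count(targets), coverage(targets, N_OUTCOMES))
```

Subdividing raises the count from 1 to 10 but leaves the coverage unchanged, while the spread-out targets cover ten times as much of the space.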
So I’m still pretty comfortable with my original stance here: optimization is as optimization does.
That said, I certainly agree that clever sophistry can blur the meaning of our definitions. This seems like a good reason to eschew clever sophistry when analyzing systems I want to interact effectively with.
And I can appreciate finding it unsatisfying. Sometimes the result of careful thinking about a system is that we discover our initial intuitions were incorrect, rather than discovering a more precise or compelling way to express our initial intuitions.
“There must be some part of it that separates processes we feel genuinely are good at optimising (like Clippy)”
I’m not really sure what you mean by “part” here. But general-purpose optimizers are more interesting than narrow optimizers, and powerful optimizers are more interesting than less powerful optimizers, and if we want to get at what’s interesting about them we need more and better tools than just the definition of an “optimization process”.