Capabilities are instrumentally convergent, values and goals are not.
So how dangerous is capability convergence without fixed values and goals? If an AI’s values and goals are corrigible by us, then we just have a very capable servant, for instance.
That’s just not a fact. Note that you can’t say what it is humans are maximising. Note that ideal utility maximisation is computationally intractable. Note that the neurological evidence is ambiguous at best. https://www.lesswrong.com/posts/fa5o2tg9EfJE77jEQ/the-human-s-hidden-utility-function-maybe
First of all, I didn’t say anything about utility maximization. I partially agree with Scott Garrabrant’s take that VNM rationality and expected utility maximization are wrong, or at least conceptually missing a piece. Personally, I don’t think utility maximization is totally off-base as a model of agent behavior; my view is that utility maximization is an incomplete approximation, analogous to the way that Newtonian mechanics is an incomplete understanding of physics, for which general relativity is a more accurate and complete model. The analogue to general relativity for utility theory may be Geometric rationality, or something else yet-undiscovered.
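To make the contrast concrete, here is a rough sketch (my own toy illustration; the lottery numbers are made up) of the arithmetic expectation that standard expected-utility maximization ranks lotteries by, next to the probability-weighted geometric mean that Geometric Rationality is built around:

```python
import math

# A toy lottery: outcome utilities and their probabilities (made-up numbers).
utilities = [1.0, 4.0, 9.0]
probs = [0.5, 0.3, 0.2]

# Arithmetic expectation: what standard VNM-style expected utility ranks by.
arithmetic = sum(p * u for p, u in zip(probs, utilities))  # 3.5

# Geometric expectation: the probability-weighted geometric mean,
# exp(sum_i p_i * log(u_i)), the quantity Geometric Rationality centers on.
geometric = math.exp(sum(p * math.log(u) for p, u in zip(probs, utilities)))  # ~2.35

print(f"arithmetic expectation: {arithmetic:.2f}")
print(f"geometric expectation:  {geometric:.2f}")
```

The two can rank the same pair of lotteries differently (the geometric mean is far more averse to putting probability on low-utility outcomes), which is the sense in which the arithmetic version looks like a useful but incomplete approximation.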
By “humans are maximizers of something”, I just meant that some humans (including myself) want to fill galaxies with stuff (e.g. happy sentient life), and there’s not any number of galaxies already filled at which I expect that to stop being true. In other words, I’d rather fill all available galaxies with things I care about than leave any fraction, even a small one, untouched, or used for some other purpose (like fulfilling the values of a squiggle maximizer).
Note that ideal utility maximisation is computationally intractable.
I’m not sure what this means precisely. In general, I think claims about computational intractability could benefit from more precision and formality (see the second half of this comment here for more), and I don’t see what relevance they have to what I want, and to what I may be able to (approximately) get.
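As an illustration of the kind of precision I have in mind (my own sketch, with arbitrary numbers): an intractability claim has to be about how cost grows across a family of instances, not about any one fixed decision problem. For brute-force expected-utility maximization over deterministic, time-dependent policies in a finite MDP, for example, the search space has size |A|^(|S|·H):

```python
# Rough count of deterministic, time-dependent policies a brute-force
# expected-utility maximizer would have to enumerate in a finite MDP:
# one action choice per (state, timestep) pair, so |A| ** (|S| * H).
# The numbers below are arbitrary; the point is the growth rate.
def policy_count(num_states: int, num_actions: int, horizon: int) -> int:
    return num_actions ** (num_states * horizon)

for n in (2, 4, 8, 16):
    print(n, policy_count(num_states=n, num_actions=3, horizon=n))
```

A precise claim would then say something about that growth (and about whether approximation is also hard), and its relevance to any particular, fixed situation still has to be argued separately.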
By “humans are maximizers of something”, I just meant that some humans (including myself) want to fill galaxies with stuff (e.g. happy sentient life), and there’s not any number of galaxies already filled at which I expect that to stop being true.
“humans are maximizers of something” would imply that most or all humans are maximisers. Lots of people don’t think the way you do. E.g. https://royalsocietypublishing.org/doi/10.1098/rstb.2018.0138
I see. This is exactly the kind of result for which I think the relevance breaks down when the formal theorems are actually applied correctly and precisely to situations we care about. The authors even mention, in section 4, the instance / limiting distinction that I draw in the comment I linked.
As a toy example of what I mean by irrelevance, suppose it is mathematically proved that strongly solving Chess requires space or time which is exponential as a function of board size. (To actually make this precise, you would first need to generalize Chess to n x n Chess, since for a fixed board size the size of the game tree is necessarily a fixed constant.)
Maybe you can prove that there is no way of strongly solving 8x8 Chess within our universe, and furthermore that it is not even possible to approximate well. Stockfish 15 does not suddenly poof out of existence as a result of your proofs, and you still lose the game when you play against it.
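To spell out what “approximately maximizing anyway” looks like in code, here is a deliberately crude sketch (my own toy example, not Stockfish, using the third-party python-chess library): a depth-limited negamax search over a material-count heuristic. Exactly solving chess is hopeless, but this bounded, heuristic approximation of “pick the move that maximizes position value” still plays legal, purposeful moves.

```python
import chess  # third-party: pip install python-chess

PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def material(board: chess.Board) -> int:
    """Crude heuristic: material balance from the side to move's perspective."""
    score = 0
    for piece in board.piece_map().values():
        value = PIECE_VALUES[piece.piece_type]
        score += value if piece.color == board.turn else -value
    return score

def negamax(board: chess.Board, depth: int) -> int:
    """Depth-limited search; falls back to the heuristic at the cutoff."""
    if depth == 0 or board.is_game_over():
        return material(board)
    best = -10_000
    for move in board.legal_moves:
        board.push(move)
        best = max(best, -negamax(board, depth - 1))
        board.pop()
    return best

def pick_move(board: chess.Board, depth: int = 2) -> chess.Move:
    """Approximate maximization: best move according to the bounded search."""
    best_move, best_score = None, -10_000
    for move in board.legal_moves:
        board.push(move)
        score = -negamax(board, depth - 1)
        board.pop()
        if score > best_score:
            best_move, best_score = move, score
    return best_move

board = chess.Board()
print(pick_move(board))  # a legal (if weak) move, chosen by approximate maximization
```

Scaled-up versions of the same move (deeper search, better evaluation, pruning) are roughly what real engines do, and no impossibility result about exact solutions touches that.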
Yes, you can still sort of do utility maximisation approximately with heuristics … and you can only do sort-of utility, sort-of maximisation, approximately, with heuristics.
The point isn’t to make a string of words come out as true by diluting the meanings of the terms … the point is that the claim needs to be true in the relevant sense. If this half-baked sort-of utility, sort-of maximisation isn’t the scary kind of fanatical utility maximisation, nothing has been achieved.