As I have said elsewhere, there is an argument that goes like this:
Evolution optimized humans to be reproductively successful, but despite that humans do not optimize for their own inclusive genetic fitness.
This argument sounds insightful but is actually just wordplay: specifically, it is using “optimize for” to mean two different things.
Let’s name the types of optimization:
Deliberative Maximization: We can make reasonable predictions about what this system will do by assuming that it contains an internal model of the value of world states and of the effect of its own actions on those states, and that it will choose whichever action its internal model says will maximize the value of the resulting world state. Concretely, a chess engine built from a value network plus MCTS would fit this definition.
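To make “deliberative maximization” concrete, here is a minimal one-ply sketch of that prediction rule in Python (a real engine searches many plies with MCTS, but the shape is the same; `legal_actions`, `predict_next_state`, and `value_of` are hypothetical stand-ins, not any particular engine’s API):

```python
# Minimal sketch of deliberative maximization: score the predicted outcome of
# each available action with an internal value model and pick the best one.
# All three callables are hypothetical, supplied by the system itself.

def choose_action(state, legal_actions, predict_next_state, value_of):
    """Return the action whose predicted resulting state the internal model values most."""
    return max(
        legal_actions(state),
        key=lambda action: value_of(predict_next_state(state, action)),
    )
```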
Selective Shaping: This system is the result of an iterative selection process in which certain behaviors made the system more likely to be selected. As such, we expect the system to exhibit behaviors similar to those that got it selected in the past. An example would be Magicicada septendecula, the 17-year cicada: enough cicadas emerge on the same 17-year cycle that predators get too full to eat them all, so a cicada on that cycle is likely to survive and breed, while one that emerges after 16 or 18 years is likely to be eaten. Evolution is “optimizing for” cicadas that emerge every 17 years, but the individual cicadas aren’t “optimizing for” much of anything.
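And here is a toy selection loop in the same spirit (made-up numbers, not real cicada population dynamics): cicadas in a large cohort are more likely to survive because predators are satiated, so the population drifts toward the majority emergence period without any individual cicada modeling or pursuing that outcome.

```python
import random
from collections import Counter

def one_generation(population, base_survival=0.6):
    """population: list of emergence periods (years). Survival chance scales with
    cohort size, because large cohorts satiate predators."""
    counts = Counter(population)
    biggest = max(counts.values())
    next_gen = []
    for period in population:
        if random.random() < base_survival * counts[period] / biggest:
            next_gen += [period, period]  # each survivor leaves two offspring
    return next_gen

population = [16] * 30 + [17] * 40 + [18] * 30
for _ in range(20):
    if not population:
        break
    population = one_generation(population)
print(Counter(population))  # in most runs the 17-year cohort takes over; 16 and 18 dwindle or die out
```

Nothing in that loop gives an individual cicada a value model of anything; the “optimization” lives entirely in the differential survival.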
Substitute the more specific terms for “optimize,” and we get
Evolution selectively shaped humans to be reproductively successful, but despite that humans do not deliberatively maximize their own inclusive genetic fitness.
Phrased that way, it seems a lot less surprising.
Nonetheless, I think the analogy is still suggestive that an AI selectively shaped for one thing might end up deliberately maximizing something else.
Maybe I’m just overestimating the extent to which it’s obvious that “deliberately try to maximize the value of a nebulous metric based on imperfect sensory data and very limited world modeling ability in an adversarial setting” would not be something humans were selected for in the ancestral environment.
Also, it sounds like you think that the behavior “deliberately try to maximize some particular value as a terminal goal” is likely to be a strategy that emerges from a selectively shaped AI. Can you expand on the mechanism by which you expect that to happen (in particular, the mechanism by which “install this as a terminal goal” gets reinforced by the training process / selected for by the selection process)?
I think there are kernels of truth in what you wrote here, but I also think the original statement at the top can be fleshed out / tweaked / corrected into something that’s defensible.
You opened with “this sentence is wrong in three ways”. But then in part III, you don’t explain why that part of the sentence is incorrect as written; on the contrary, you seem to agree with the authors that it is not only true that humans don’t explicitly pursue IGF, but really obviously true. So, the original sentence is not in fact wrong in three ways, but only two ways, right? (Maybe you’ll say that you were objecting to the word “surprisingly”, in which case I think you’re misunderstanding how the authors were using that word in context. More likely, I’m guessing that your belief is “this part is obviously true but it doesn’t imply what the authors think it implies”, in which case, I think you should have said that.)
Anyway, again, this is an interesting topic of discussion, but given the level of snark and sloppiness on display here, I sure don’t want to have that discussion with you-in-particular. This comment thread is kinda related.