Maybe I’m just overestimating the extent to which it’s obvious that “deliberately try to maximize the value of a nebulous metric based on imperfect sensory data and very limited world modeling ability in an adversarial setting” would not be something humans were selected for in the ancestral environment.
Also, it sounds like you think that the behavior “deliberately try to maximize some particular value as a terminal goal” is likely to be a strategy that emerges from a selectively shaped AI. Can you expand on the mechanism by which you expect that to happen (in particular, the mechanism by which “install this as a terminal goal” would be reinforced by the training process / selected for by the selection process)?
Nonetheless, I think the analogy is still suggestive that an AI selectively shaped for one thing might end up deliberately maximizing something else.