Hm, thanks for the additional comment, but I mostly think we are using words and frames differently, and I disagree with what I understand to be your picture of what values are.
We have an AI-CEO money maximiser, rewarded by the stock price ticker as a reward function. As long as the AI is constrained and weak, it continues to increase the value of the company; when it becomes powerful, it wireheads and takes over the stock price ticker.
Reward is not the optimization target.
Their previous “values” seemed to be a bundle of “have children, enjoy sex” and this has now been wireheaded into “enjoy sex”.
I think this is not what happened. Those desires are likely downstream of past reinforcement of different kinds; I do not think there is a “wireheading” mechanism here. Wireheading is a very specific kind of antecedent-computation-reinforcement chasing behavior, on my ontology.
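To pin down what I mean by reinforcement of antecedent computations, here is a minimal toy sketch (the environment, the ticker reward, and the tabular policy are all illustrative assumptions, not a model of either of our positions): in a policy-gradient learner, the reward only scales the update applied to whatever computations produced the sampled action; nothing inside the learned policy represents or points at the reward channel itself.

```python
# Toy REINFORCE sketch. The reward from the "stock ticker" never appears inside
# the policy's computation; it only reinforces the computations that preceded it.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 4, 3
theta = np.zeros((N_STATES, N_ACTIONS))  # policy parameters: whatever "values" the agent ends up with are encoded here


def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()


def ticker_reward(state, action):
    """Stand-in for the stock-price-ticker reward channel (toy assumption)."""
    return 1.0 if action == state % N_ACTIONS else 0.0


def run_episode(theta, horizon=10):
    """Roll out the policy and record (state, action, reward) triples."""
    trajectory = []
    for _ in range(horizon):
        state = rng.integers(N_STATES)
        probs = softmax(theta[state])
        action = rng.choice(N_ACTIONS, p=probs)
        trajectory.append((state, action, ticker_reward(state, action)))
    return trajectory


def reinforce_update(theta, trajectory, lr=0.1):
    """REINFORCE: reward upweights the antecedent computations (the logits that
    produced the sampled action). It is a training signal, not an object the
    policy itself represents or pursues."""
    for state, action, reward in trajectory:
        probs = softmax(theta[state])
        grad_log_pi = -probs
        grad_log_pi[action] += 1.0  # d log pi(a|s) / d logits
        theta[state] += lr * reward * grad_log_pi
    return theta


for _ in range(200):
    theta = reinforce_update(theta, run_episode(theta))

# What got learned is a state -> action mapping shaped by past reinforcement.
# Nothing in `theta` is a pointer to "the ticker"; whether a scaled-up system
# would come to value the ticker itself is exactly the point under dispute.
print(np.round(softmax(theta[0]), 2))
```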
I’m sceptical that the purely human version of it can extrapolate all the way up to superintelligence.
Not at all what I’m angling at. There’s a mechanistic generator for why humans navigate ontology shifts well (on my view). Learn about the generators, don’t copy the algorithm.
I agree that humans navigate “model splinterings” quite well. But I actually think the algorithm might be more important than the generators. The generators come from evolution and human experience in our actual world; this doesn’t seem like it would generalise. The algorithm itself, though, may be very generalisable (potential analogy: humans have an instinctive grasp of all numbers under five, due to various evolutionary pressures, but we produced the addition algorithm, which is far more generalisable).
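To make the analogy concrete, here is a toy sketch (the cap at five and both functions are illustrative assumptions, not a cognitive model): the “instinct” only covers a fixed range, while the explicit algorithm generalises to inputs of any size.

```python
# Toy contrast between a fixed-range "instinct" and a procedure that generalises.
SUBITIZING_LIMIT = 5  # assumed cap on the instinctive number sense


def instinctive_count(items):
    """Instantly 'see' how many items there are, but only up to the cap."""
    if len(items) > SUBITIZING_LIMIT:
        raise ValueError("beyond the instinct's range")
    return len(items)


def add_by_algorithm(a_digits, b_digits):
    """Schoolbook addition on little-endian digit lists: works for numbers of
    any size, because the procedure generalises even though the instinct that
    motivated it does not."""
    result, carry = [], 0
    for i in range(max(len(a_digits), len(b_digits))):
        a = a_digits[i] if i < len(a_digits) else 0
        b = b_digits[i] if i < len(b_digits) else 0
        carry, digit = divmod(a + b + carry, 10)
        result.append(digit)
    if carry:
        result.append(carry)
    return result


print(instinctive_count(["x"] * 4))      # 4 -- within the instinct's range
print(add_by_algorithm([9, 9, 9], [2]))  # [1, 0, 0, 1], i.e. 999 + 2 = 1001
```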
I’m not sure that we disagree much. We may just have different emphases and slightly different takes on the same question?
Yes and no. I think most of our disagreements are probably like “what is instinctual?” and “what is the type signature of human values?” etc. And not on “should we understand what people are doing?”.
The generators come from evolution and human experience in our actual world
By “generators”, I mean “the principles by which the algorithm operates”, which means the generators are found by studying the within-lifetime human learning process.
potential analogy: humans have an instinctive grasp of all numbers under five, due to various evolutionary pressures
Dubious to me due to information inaccessibility & random initialization of neocortex (which is a thing I am reasonably confident in). I think it’s more likely that our architecture & compute & learning process makes it convergent to learn this quick ≤ 5 number-sense.