In reality, the problem faced by evolution and by SGD is much easier than this: producing systems that behave the right way in all scenarios they are likely to encounter. In virtue of their aligned behavior, these systems will be “aimed at the right things” in every sense that matters in practice.
I find this passage remarkable, given that so many people are choosing to have few or no children that fertility has fallen to 0.78 in Korea and 1.0 in China. Presumably you’re aware of these (or similar) facts and intended the meaning of this passage to be compatible with them, but I’m having trouble figuring out how...
By contrast, goal realism leads only to unfalsifiable speculation about an “inner actress” with utterly alien motivations.
In order for such speculation to be unfalsifiable, it seemingly has to be the case that we’re unable to ever develop good enough interpretability tools to definitively say whether the AI in question has such internal motivations. This could well turn out to be true, but I don’t understand how you’re able to predict this now. (Or maybe you mean something else by “unfalsifiable” but I can’t see what it could be. ETA: Maybe you mean “unfalsifiable with existing methods”?)
On the other hand, with your own proposed alignment method, we have to speculate about what scenarios an AI is likely to encounter. You could say that this is falsifiable (we just have to wait for the future to unfold), but is this actually an advantage?
The point of that section is that “goals” are not ontologically fundamental entities with precise contents; in fact, they could not possibly be, given a naturalistic worldview. So you don’t need to “target the inner search,” you just need to get the system to act the way you want in all the relevant scenarios.
The modern world is not a relevant scenario for evolution. “Evolution” did not need to, was not “intending to,” and could not have designed human brains so that they would do high inclusive genetic fitness stuff even when the environment changes wildly and culture becomes completely different from the ancestral environment.
So you don’t need to “target the inner search,” you just need to get the system to act the way you want in all the relevant scenarios.
Your original phrase was “all scenarios they are likely to encounter”, but now you’ve switched to “relevant scenarios”. Do you not acknowledge that these two phrases are semantically very different (or likely to be interpreted very differently by many readers), since the modern world is arguably a scenario that “they are likely to encounter” (given that they actually did encounter it) but you say “the modern world is not a relevant scenario for evolution”?
Going forward, do you prefer to talk about “all scenarios they are likely to encounter”, or “relevant scenarios”, or both? If you mean to keep using “relevant scenarios” (alone or alongside the other phrase), please clarify what you mean by “relevant”. (And please answer with respect to both evolution and AI alignment, in case the answer is different in the two cases. I’ll probably have more substantive things to say once we’ve cleared up the linguistic issues.)
No, I don’t think they are semantically very different. This seems like nitpicking. Obviously “they are likely to encounter” has to have some sort of time horizon attached to it; otherwise it would include times well past the heat death of the universe, or something.
It was not at all clear to me that you intended “they are likely to encounter” to have some sort of time horizon attached to it (as opposed to some other kind of restriction, or your meaning something quite different from the literal phrasing, or your argument/idea itself being wrong), and it’s still not clear to me what sort of time horizon you have in mind.
The AI system builders’ time horizon seems to be a reasonable starting point.