The point of that section is that “goals” are not ontologically fundamental entities with precise contents; indeed, they could not possibly be, given a naturalistic worldview. So you don’t need to “target the inner search,” you just need to get the system to act the way you want in all the relevant scenarios.
The modern world is not a relevant scenario for evolution. “Evolution” did not need to, was not “intending to,” and could not have designed human brains so that they would do high-inclusive-genetic-fitness stuff even when the environment changes dramatically and culture becomes completely different from the ancestral environment.
Your original phrase was “all scenarios they are likely to encounter”, but now you’ve switched to “relevant scenarios”. Do you not acknowledge that these two phrases are semantically very different (or likely to be interpreted very differently by many readers), since the modern world is arguably a scenario that “they are likely to encounter” (given that they actually did encounter it) but you say “the modern world is not a relevant scenario for evolution”?
Going forward, do you prefer to talk about “all scenarios they are likely to encounter”, or “relevant scenarios”, or both? If the latter, could you please clarify what you mean by “relevant”? (And please answer with respect to both evolution and AI alignment, in case the answer is different in the two cases. I’ll probably have more substantive things to say once we’ve cleared up the linguistic issues.)
No, I don’t think they are semantically very different. This seems like nitpicking. Obviously “they are likely to encounter” has to have some sort of time horizon attached to it, otherwise it would include times well past the heat death of the universe, or something.
It was not at all clear to me that you intended “they are likely to encounter” to have some sort of time horizon attached to it (as opposed to intending some other kind of restriction, meaning something quite different from the literal phrasing, or the argument/idea itself simply being wrong), and it’s still not clear to me what sort of time horizon you have in mind.
The AI system builders’ time horizon seems to be a reasonable starting point.