I think there are various “tricks” the genome can use to approximately reference abstractions that are close to sensory ground truth. However, I think these tricks quickly start to fail once you try to use them on deeper abstractions. E.g., I don’t think there’s any way the genome can specify a hard-coded neural algorithm that activates in response to an ontological shift and ensures that values still bind to the new ontology.
Rather, evolution configured learning processes (humans) that are robust to ontological shifts, that consistently acquire values in a way that was adaptive in the ancestral environment, and that have various other alignment properties. This matters because we can study examples of these learning processes to try to figure out how they work, how their values form, and how their alignment properties emerge.