It seems like a solid solution to symbol grounding would let us delegate some of the work of “translating vague intuitions about alignment into actual policies”. In particular, there seems to be a correspondence between CIRL and symbol grounding: a system that is aware it does not know the goal it should optimize resembles a symbol-grounding machine that is aware of the difference between the literal content of its instructions and the desired behavior those instructions represent (although the instructions might be even more abstract symbols than words).
Is there any literature you’re aware of that proposes a seemingly robust alignment solution for a world where symbol grounding is solved? For example, Yudkowsky suggests Coherent Extrapolated Volition and proposes a sentence or so of English for it, but because machines cannot execute English, it’s not clear whether this was meant literally or more as a vague gesture at important properties solutions might have.
The similarity between value extrapolation and symbol grounding (much as you describe it) is why I suspect that solving one may solve the other.