I wouldn’t interpret this as necessarily limiting the space of AI values, but rather (somewhat conservatively) as pointing to shared (linguistic) features between humans and AIs.
I fail to see how the latter could arise without the former. Would you mind connecting these dots?
AIs could have representations of human values without being motivated to pursue them; also, their representations could be a superset of human representations.
(In practice, I do think having overlapping representations with human values likely helps, for reasons related to e.g. “Predicting Inductive Biases of Pre-Trained Models” and “Alignment with human representations supports robust few-shot learning”.)
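To gesture at why overlap might help, here is a toy sketch of my own (not taken from either paper; the geometry and numbers are invented): a nearest-centroid few-shot classifier does better when the representation keeps the label-relevant axis intact than when a random rotation entangles that axis with nuisance directions before a lossy readout.

```python
import numpy as np

rng = np.random.default_rng(0)
D, KEEP = 20, 5          # ambient dimension, representation dimension (arbitrary choices)
SIGNAL = 2.0             # class separation along the "human-relevant" axis


def sample_raw(n_per_class):
    """Two classes that differ only along axis 0 (the label-relevant feature)."""
    X0 = rng.normal(0.0, 1.0, size=(n_per_class, D)); X0[:, 0] -= SIGNAL
    X1 = rng.normal(0.0, 1.0, size=(n_per_class, D)); X1[:, 0] += SIGNAL
    return np.vstack([X0, X1]), np.repeat([0, 1], n_per_class)


def make_encoder(aligned):
    """Aligned: keep axes that include the label-relevant one.
    Misaligned: keep the same number of axes after a random rotation,
    so the relevant direction is smeared across nuisance directions."""
    if aligned:
        return lambda X: X[:, :KEEP]
    Q, _ = np.linalg.qr(rng.normal(size=(D, D)))  # random orthogonal rotation
    return lambda X: (X @ Q)[:, :KEEP]


def few_shot_accuracy(aligned, shots=3, trials=1000):
    accs = []
    for _ in range(trials):
        enc = make_encoder(aligned)
        Xtr, ytr = sample_raw(shots); Ztr = enc(Xtr)
        Xte, yte = sample_raw(100);   Zte = enc(Xte)
        protos = np.stack([Ztr[ytr == c].mean(0) for c in (0, 1)])  # class centroids
        pred = np.argmin(((Zte[:, None, :] - protos[None]) ** 2).sum(-1), axis=1)
        accs.append((pred == yte).mean())
    return float(np.mean(accs))


print("aligned representation   :", round(few_shot_accuracy(True), 3))
print("misaligned representation:", round(few_shot_accuracy(False), 3))
```

With only a few shots per class, the aligned encoder reliably scores higher, which is the flavour of effect those papers point at, though of course in a much more trivial setting.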
Indeed, their representations could form a superset of human representations, and that’s why it’s not random. Or, equivalently, it’s random, but not under a uniform prior.
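To make the “random, but not under a uniform prior” point concrete, here is a toy Monte Carlo sketch (dimensions, distances, and distributions are all made up for illustration): under a uniform prior over a large representation space, essentially no mass lands near the human point, whereas a prior shaped by training on human-generated data puts appreciable mass there, even though every draw is still “random”.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10                   # dimensionality of the toy representation space
human = np.zeros(D)      # stand-in for "human values/representations"
RADIUS = 2.0             # what we count as "overlapping with human values"
N = 200_000

# Uniform prior over [-5, 5]^D: almost no mass near the human point.
uniform = rng.uniform(-5, 5, size=(N, D))

# Non-uniform prior: training on human-generated data concentrates mass
# around human-like structure (modelled here as a Gaussian centred on `human`).
shaped = rng.normal(loc=human, scale=1.0, size=(N, D))

def hit_rate(samples):
    """Fraction of random draws that land within RADIUS of the human point."""
    return float((np.linalg.norm(samples - human, axis=1) < RADIUS).mean())

print("P(near human | uniform prior):", hit_rate(uniform))  # effectively 0
print("P(near human | shaped prior) :", hit_rate(shaped))   # a few percent
```

Both samplers are random; only the second one is random under a prior that makes overlap with the human region likely.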
(Yes, these further works are more evidence for « it’s not random at all »: it is as if LLMs were discovering (some of) the same set of principles that allow our brains to construct and use language, rather than creating completely new cognitive structures. That’s actually reminiscent of AlphaZero converging toward human style without training on human input.)