Big achievement, even if nobody should be surprised (similar correspondences have been known for vision for a decade or so).
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003963
@anyone To those who believe a future AGI might pick its values at random: don’t you think this result suggests it should restrict its pick to something that human language and visuospatial cognition push for?
Yes, there are similar results in a bunch of other domains, including vision; for a review see e.g. The neuroconnectionist research programme.
I wouldn’t interpret this as necessarily limiting the space of AI values, but rather (somewhat conservatively) as evidence of shared (linguistic) features between humans and AIs, some/many of which are probably relevant for alignment.
I fail to see how the latter could arise without the former. Would you mind connecting these dots?
AIs could have representations of human values without being motivated to pursue them; also, their representations could be a superset of human representations.
(In practice, I do think having representations that overlap with human values likely helps, for reasons related to e.g. Predicting Inductive Biases of Pre-Trained Models and Alignment with human representations supports robust few-shot learning.)
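To make “overlapping representations” a bit more concrete, here is a minimal, purely illustrative sketch of the kind of comparison I have in mind (representational similarity analysis; not the exact method of the papers above, and all names and data below are made up): compute the pairwise similarity structure of a model’s embeddings, compute the analogous structure from human judgments (or brain data), and correlate the two.

```python
# Illustrative sketch of representational similarity analysis (RSA).
# Assumptions (not from the cited papers): `model_embeddings` is an
# (n_items, d) array of model activations for n_items stimuli, and
# `human_similarity` is an (n_items, n_items) matrix of human similarity judgments.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

def rsa_score(model_embeddings: np.ndarray, human_similarity: np.ndarray) -> float:
    """Spearman correlation between model and human pairwise dissimilarity structure."""
    # Representational dissimilarity matrix (RDM) of the model: cosine distances between items.
    model_rdm = squareform(pdist(model_embeddings, metric="cosine"))
    # Turn human similarities into dissimilarities so both matrices use the same convention.
    human_rdm = human_similarity.max() - human_similarity
    # Compare only the upper triangles (both matrices are symmetric with trivial diagonals).
    iu = np.triu_indices_from(model_rdm, k=1)
    rho, _ = spearmanr(model_rdm[iu], human_rdm[iu])
    return rho

# Toy usage with random data: a score near 0 means no shared structure,
# a score near 1 means the two representational geometries match closely.
rng = np.random.default_rng(0)
emb = rng.normal(size=(20, 64))
sim = rng.uniform(size=(20, 20))
sim = (sim + sim.T) / 2
print(rsa_score(emb, sim))
```

The point is just that “shared features” here is an empirical, graded quantity one can measure, not an all-or-nothing claim about values.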
Indeed their representations could form a superset of human representations, and that’s why it’s not random. Or, equivalently, it’s random but not under a uniform prior.
(Yes, these further works are more evidence that “it’s not random at all”, as if LLMs were discovering (some of) the same set of principles that allow our brains to construct and use language, rather than creating completely new cognitive structures. That’s actually reminiscent of AlphaZero converging toward human style without training on human input.)