hmm. i think you’re missing eliezer’s point. the idea was never that AI would be unable to identify actions which humans consider good, but that the AI would not have any particular preference to take those actions.
But my point isn’t just that the AI is able to produce similar ratings to humans’ for aesthetics, etc., but that it also seems to do so through at least partially overlapping computational mechanisms to humans’, as the comparisons to fMRI data suggest.
I don’t think having a beauty-detector that works the same way humans’ beauty-detectors do implies that you care about beauty?
Agreed that it doesn’t imply caring. But I think, given accumulating evidence for human-like representations of multiple non-motivational components of affect, one should also update at least a bit on the likelihood of finding / incentivizing human-like representations of the motivational component(s) too (see e.g. https://en.wikipedia.org/wiki/Affect_(psychology)#Motivational_intensity_and_cognitive_scope).