I discussed something similar in the “Human brains don’t seem to neatly factorize” section of the Obliqueness post. I think this implies that, even assuming the Orthogonality Thesis, humans don’t have values that are orthogonal to human intelligence (to be orthogonal in this fashion, their values would need not to respond to learning/reflection), so there’s no straightforward way to align ASI with human values by plugging human values into greater intelligence.