the important thing to realise is that ‘human values’ do not really come from inner misalignment wrt our innate reward circuitry but rather are the result of a very long process of social construction influenced both by our innate drives but also by the game-theoretic social considerations needed to create and maintain large social groups, and that these value constructs have been distilled into webs of linguistic associations learnt through unsupervised text-prediction-like objectives which is how we practically interact with our values.
Most human value learning occurs through this linguistic learning grounded by our innate drives but extended to much higher abstractions by language.i.e. for humans we learn our values as some combination of bottom-up (how well do our internal reward evaluators in basal ganglia/hypothalamus) accord with the top-down socially constructed values) as well as top-down association of abstract value concepts with other more grounded linguistic concepts.
Can you give me some examples here? I don’t know that I follow what you’re pointing at.
Can you give me some examples here? I don’t know that I follow what you’re pointing at.