This is great!
I think your overall take on brain function is highly congruent with mine. I’ve also been working in neuroscience, specifically on the basal ganglia and their interactions with the dopamine system and cortex to produce complex human decision-making. I also see your model as highly congruent with Steve Byrnes’ model. The most substantive disagreement in relation to alignment is how much of our values is determined by the basic reward system, and how much is essentially arbitrary from there. I tend to side with you, but I’m not sure, and I do think that adult human values and behavior are still shaped in important ways by our innate reward signals. But the important question is whether we could do without those, or perhaps with a rough emulation of them, in an AGI that’s loosely brainlike.
My recent post Human preferences as RL critic values—implications for alignment tried to say essentially all of the same things. Frankly, I like yours better.
I’m currently working on a post to be titled something like “we’re likely to get loosely brainlike AGI”, on the theory that most people in alignment don’t care much how the brain does things, because they see ANNs, and LLMs in particular, as wildly unlike the brain. I think there’s a vector space of similarities, and I agree with you that AGI research appears to be converging on something with important similarities to brain function, and that this could provide a real advantage in alignment efforts.
Thanks for your comment.
I am not sure how much we actually disagree here. I definitely agree that our adult behaviours and values are shaped significantly by our innate reward signals. It is a continuum and clearly not all or nothing. In this post I was mostly trying to emphasise the social and linguistic aspects, since I think they are underappreciated. I also feel that most of the ‘dangerous’ aspects of humans come from our evolutionarily innate drives, i.e. status- and power-seeking as well as survival, and it would be ideal not to encode these into our AI systems if it is not necessary.
I also pretty strongly agree with this take that current ML models are already very brain-like, are likely to become more brain-like closer to AGI, and that this is very helpful for our chances of alignment. Funnily enough, I also have a bunch of draft posts about this.