I think I sorta disagree in the sense that high-functioning sociopaths live in the same society as neurotypical people, but don’t wind up “aligned”. I think the innate reward function is playing a big role. (And by the way, nobody knows what that innate human reward function is or how it works, according to me.) That said, maybe the innate reward function is insufficient and we also need multi-agent dynamics. I don’t currently know.
I’m sympathetic to your broader point, but until somebody says exactly what the rewards (a.k.a. “reinforcement events”) are, I’m withholding judgment. I’m open to the weaker argument that there are kinda dumb, obvious things to try for which we have no strong reason to believe they will create friendly AGI, but also no strong reason to believe they won’t. See here. This is a less pessimistic take than Eliezer’s, for example.
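To make the ask concrete, here is roughly the shape of the missing piece in standard RL terms. This is just my sketch (the `Transition` and `innate_reward` names are hypothetical illustration, not anyone’s actual proposal); the point is that the function body below is the thing nobody has written down:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Transition:
    state: Any       # whatever the agent observed
    action: Any      # what it did
    next_state: Any  # what happened as a result

def innate_reward(transition: Transition) -> float:
    """Map a transition to a scalar reward (a "reinforcement event").

    Filling in this body is the unspecified step: until a proposal says
    what this function computes, "raise the AGI like a human" arguments
    are hard to evaluate.
    """
    raise NotImplementedError  # this blank is exactly the open problem
```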
I agree that you need more than just reinforcement learning.
So in a sense, this is what I’m getting at: “This resembles prior ideas, which seem flawed; how do you intend to avoid those flaws?”