Cameron Berg comments on Alignment via prosocial brain algorithms

Cameron Berg 14 Sep 2022 18:14 UTC
3 points
Thanks for the comment! I do think that, at present, the only working example we have of an agent able to explicitly self-inspect its own values is the human case, even if getting the base shards ‘right’ in the prosocial sense would likely entail that they will already be doing self-reflection. Am I misunderstanding your point here?