I can’t speak for Alex and Quintin, but I think it would be useful if you were able to figure out how values like “caring about other humans,” or generalizations like “caring about all sentient life,” formed for you from hard-coded reward signals. Maybe ask on the shard theory discord, and read their document if you haven’t already — maybe you’ll come up with your own research ideas.
I joined the discord just a few hours ago, in fact! Hopefully I’ll be of some use. (And I’ve read the doc before, but probably should reread it every so often.)