MSRayne comments on Humans provide an untapped wealth of evidence about alignment

MSRayne 14 Jul 2022 15:21 UTC
3 points
0
How can I, a person who is better at introspection than at basically anything else, help you with the shard theory project? I can actually explain in detail (at least, the kind of detail accessible to me, which doesn’t include e.g. neuron firing patterns) how I developed some of my values, or I can at least use reliable methods to figure out good hypotheses on the matter.
- Ulisse Mini 14 Jul 2022 18:20 UTC
  3 points
  0
  I can’t speak for Alex and Quintin, but I think if you were able to figure out how values like “caring about other humans” or generalizations like “caring about all sentient life” formed for you from hard-coded reward signals, that would be useful. Maybe ask on the shard theory Discord, and read their document if you haven’t already; maybe you’ll come up with your own research ideas.
  - MSRayne 14 Jul 2022 23:04 UTC
    1 point
    0
    I joined the Discord just a few hours ago, in fact! Hopefully I’ll be of some use. (And I’ve read the doc before, but I should probably reread it every so often.)