I can’t speak for Alex and Quintin, but I think it would be useful if you were able to figure out how values like “caring about other humans,” or generalizations like “caring about all sentient life,” formed for you from hard-coded reward signals. Maybe ask on the shard theory discord, and read their document if you haven’t already — maybe you’ll come up with your own research ideas.
I joined the discord just a few hours ago, in fact! Hopefully I’ll be of some use. (And I’ve read the doc before, but probably should reread it every so often.)