How can I, a person who is better at introspection than basically anything else, help you with the shard theory project? I actually can explain in detail—at least, the kind of detail accessible to me, which doesn’t include e.g. neuron firing patterns—how I developed some of my values, or I can at least use reliable methods to figure out good hypotheses on the matter.
I can’t speak for Alex and Quintin, but I think it would be useful if you could figure out how values like “caring about other humans,” or generalizations like “caring about all sentient life,” formed for you from hard-coded reward signals. Maybe ask on the shard theory discord, and read their document if you haven’t already; you might come up with your own research ideas.
I joined the discord just a few hours ago, in fact! Hopefully I’ll be of some use. (And I’ve read the doc before, but probably should reread it every so often.)