Nod. And does it seem to have the ability to gain new cognitive skills? Like, if it reads a bunch of LessWrong posts or attends CFAR, does its ‘memory’ start to include things that prompt it to, say, “stop and notice it’s confused” and “form more hypotheses when facing weird phenomena” and “cultivate curiosity about its own internal structure”?
(I assume so, just double-checking.)
In that case, it seems like the most obvious ways to keep it friendly are the same ones you’d use to make a human friendly (expose it to ideas you think will guide it along a useful moral trajectory).
I’m not actually sure what sort of other actions you’re allowing in the hypothetical.