Three recent downward updates for me on alignment getting solved in time:
Thinking for hours about AI strategy made me internalize that communication difficulties are really serious.
I’m not just solving technical problems—I’m also solving interpersonal problems, communication problems, incentive problems. Even if my current hot takes around shard theory / outer/inner alignment are right, and even if I put up a LW post which finally successfully communicates some of my key points, reality totally allows OpenAI to just train an AGI the next month without incorporating any insights which my friends nodded along with.
I’ve been saying “A smart AI knows about value drift and will roughly prevent it”, but people totally have trouble with e.g. resisting the temptation to cheat on their diets, or quitting addictions. I have literally had trouble with value drift-y things recently, even after explicitly acknowledging their nature. Likewise, an AI can be aligned and still be “tempted” by the decision influences of shards which aren’t in the aligned shard coalition.
Timelines going down.