I mean, I agree that the scenario is about adversarial action, but it’s not adversarial action by enemy humans—or even enemy AIs—it’s adversarial action by misaligned (specifically deceptive) mesa-optimizers pursuing convergent instrumental goals.
Can you say more about the distinction between enemy AIs and misaligned mesa-optimizers? I feel like I don't have a concrete grasp of what the difference would look like in, say, an AI system in charge of a company.
I could imagine “enemy action” making sense as a label if the thing you’re worried about is enemy humans deploying misaligned AI, but that’s very much not what Paul is worried about in the original post. Rather, Paul is concerned about us accidentally training AIs which are misaligned and thus pursue convergent instrumental goals like resource and power acquisition that result in existential risk.
Furthermore, they’re also not “enemy AIs” in the sense that “the AI doesn’t hate you”—it’s just misaligned and you’re in its way—and so even if you specify something like “enemy AI action” that still seems to me to conjure up a pretty inaccurate picture. I think something like “influence-seeking AIs”—which is precisely the term that Paul uses in the original post—is much more accurate.
I thought about it a bit more and changed my mind: the term is quite confusing. I'll make an edit later, maybe today.
I think I understand why you think the term is misleading, though I still think it's helpfully concrete and not inaccurate. I have a bunch of work to get back to, so I'm not planning to follow up on this more right now. Feel free to ping me via PM if you'd like me to follow up another day.