It feels like the implicit message here is “And therefore we might coordinate around an alignment solution where all major actors agree to only train NNs that respect certain rules”, which… really doesn’t seem realistic, for a million reasons?
Like, even assuming major powers can agree to an “AI non-proliferation treaty” with specific metrics, individual people could still bypass the treaty with decentralized GPU networks. Rogue countries could buy enough GPUs to train an AGI, disable the verification hardware, and go “What are you gonna do, invade us?”, betting that going to war over AI safety won’t be politically palatable. Companies could technically respect the agreed-upon rules while violating their spirit in ways that automated hardware can’t detect. Or they could train a perfectly aligned AI on compliant hardware, then fine-tune it in non-aligned ways on non-compliant hardware for a fraction of the initial cost.
Anyway, my point is: any analysis of a “restrict all compute everywhere” strategy should start by examining what it actually looks like to implement that strategy, what the political incentives are, and how resistant that strategy will be to everyone on the internet trying to break it.
It feels like the author of this paper hasn’t even begun to do that work.
There’s also an infohazard problem, since this heavily involves geopolitical considerations: if someone or some group actually figured out a practical means of enforcement, would they ever reveal it publicly?
Or is it even possible to reveal such a design without undermining the means by which it functions?