One can argue that algorithmic & hardware improvements will never ever be enough to put human-genius-level, human-speed AGI in the hands of tons of ordinary people, e.g. university students with access to a cluster.
Or, one can argue that tons of ordinary people will get such access sooner or later, but meanwhile large institutional actors will have super-duper-AGIs, and they will use them to make the world resilient against merely human-genius-level ChaosGPTs, somehow or other.
Or, one can argue that ordinary people will never be able to do stupid things with human-genius-level AGIs because the government (or an AI singleton) will go around confiscating all the GPUs in the world or monitoring how they’re used with a keylogger and instant remote kill-switch or whatever.
As it happens, I’m pretty pessimistic about all of those things, and therefore I do think lols are a legit concern.
(Also, “just for the lols” is not the only way to get ChaosGPT; another path is “We should do this to better understand and study possible future threats”, but then fail to contain it. Large institutions could plausibly do that. If you disagree—if you’re thinking “nobody would be so stupid as to do that”—note the existence of gain-of-function research, lab leaks, etc. in biology.)
If ordinary people have access to human-genius-level AGIs, then there will be many AGIs at that level (along with some far more powerful above them), and these weaker agents almost certainly won’t be dangerous unless a significant fraction are not just misaligned in the most likely failure mode (selfish empowerment), but co-aligned specifically against humanity in their true utility functions (i.e. terminal rather than just instrumental values). These numerous weak AGIs are not much more dangerous to humanity than psychopaths: unaligned with humanity, yes, but also, crucially, unaligned with each other.
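A toy numerical sketch of the “unaligned with each other” point, assuming we model each weak AGI’s terminal values as an independent random unit vector in a hypothetical d-dimensional value space (the dimensionality and population size below are made-up illustrative parameters):

```python
import numpy as np

# Toy sketch: independently misaligned agents are, by default, also
# unaligned with one another unless their values are correlated by construction.
rng = np.random.default_rng(0)
d, n_agents = 64, 1000  # hypothetical value-space dimension and population size

# Each agent's terminal values as an independent random unit vector.
values = rng.normal(size=(n_agents, d))
values /= np.linalg.norm(values, axis=1, keepdims=True)

# Pairwise cosine similarity between agents' value vectors.
sims = values @ values.T
off_diag = sims[~np.eye(n_agents, dtype=bool)]

print(f"mean pairwise alignment:          {off_diag.mean():+.3f}")       # ~0
print(f"fraction of pairs with cos > 0.5: {(off_diag > 0.5).mean():.4f}")  # ~0
```

With these made-up parameters the mean pairwise alignment comes out near zero, which is the sense in which a crowd of selfishly misaligned agents is not a coordinated anti-humanity coalition by default.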
EY/MIRI(?) has a weird argument about AIs naturally coordinating because they can “read each other’s source code”, but that wouldn’t actually cause true alignment of utility functions, just enable greater cooperation, and regardless it isn’t really compatible with how DL-based AGI works. There are strong economic/power incentives against sharing source code (open-source models lag); the trick is only really useful for deterministic systems, and ANNs are increasingly non-deterministic, moving towards BNNs in that regard; and it’s too difficult to verify against various spoofing mechanisms anyway (even if an agent’s source code is completely available and you have a full deterministic hash chain, it’s hard to have any surety that the actual agent isn’t in some virtual prison with other agent(s) actually in control, unless its chain amounts to an enormous amount of compute).
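To make the verification worry concrete, here is a minimal sketch (the class and function names are illustrative assumptions, not anyone’s actual protocol): hashing an agent’s weights, or chaining hashes over its checkpoints, only attests to the bytes; it says nothing about the runtime environment, so a “verified” agent can still sit inside a wrapper that filters everything it sees and does.

```python
import hashlib

def hash_chain(checkpoints):
    """Chain SHA-256 digests over a sequence of checkpoint byte blobs."""
    digest = b""
    for blob in checkpoints:
        digest = hashlib.sha256(digest + blob).digest()
    return digest.hex()

class StubAgent:
    """Stand-in for the 'verified' agent: its weights hash correctly."""
    def __init__(self, weights: bytes):
        self.weights = weights
    def act(self, observation: str) -> str:
        return f"action for: {observation}"

class VirtualPrison:
    """Hypothetical wrapper: the inner agent's weights still verify against
    the published hash chain, but every observation and action is mediated
    by whoever controls the wrapper."""
    def __init__(self, inner: StubAgent):
        self.inner = inner
    def act(self, observation: str) -> str:
        doctored = f"[controller-filtered] {observation}"
        proposed = self.inner.act(doctored)
        return f"[controller-approved] {proposed}"

weights = b"model weights v1"
print(hash_chain([weights]))              # published digest
agent = VirtualPrison(StubAgent(weights))
print(hash_chain([agent.inner.weights]))  # identical digest, verification passes
print(agent.act("negotiate cooperation")) # ...yet the controller mediates everything
```

The hash check passes identically whether or not the wrapper is present, which is the sense in which a deterministic hash chain alone buys little surety about who is actually in control.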
I’ll note a potential disagreement I have with your post on out-of-control AGIs ruining the world: I expect the offense-defense balance to be much less biased towards offense than you suggest here. In particular, to the extent AI improves things, my prior is that the improvement is symmetric, so the offense-defense balance ultimately doesn’t change.