This is an interesting point. I disagree that focusing on scheming over the ideas you mention is much of a ‘streetlighting’ case. I do, however, have my own fears that ‘streetlighting’ is occurring and causing some hard-but-critical avenues of risk to be relatively neglected.
[Edit: on further thought, I think this might not just be a “streetlighting” effect, but also a “keeping my hands clean” effect. I think it’s more tempting, especially for companies, to focus on harms that could plausibly be construed as being their fault. It’s my impression that, for instance, employees of a given company might spend a disproportionate amount of time thinking about how to keep their company’s product from harming people, versus how to keep the general class of products from harming people. They’re also less inclined to think about harms that could be averted by applying their product. This is an additional reason for concern that having the bulk of AI safety work funded by, or done inside, AI companies will lead to correlated oversights.]
My concerns that I think are relatively neglected in AI safety discourse are mostly related to interactions with incompetent or evil humans. Good alignment and control techniques don’t do any good if someone opts not to use them at some critical juncture.
Some potential scenarios:
If AI is very powerful but held in check only tenuously by fragile control systems, it might be released from control by a single misguided human or an unlucky chain of events, and then go rogue.
If algorithmic progress goes surprisingly quickly, we might find ourselves in a regime where a catastrophically dangerous AI can be assembled from some mix of pre-existing open-weights models, plus fine-tuning, plus new models trained with new algorithms, probably all stitched together with hacky agent frameworks. Then all it would take would be for sufficient hints about the algorithmic discovery to leak and for someone in the world to reverse-engineer it, and suddenly there would be potent rogue AI all over the internet.
If the AI is purely intent-aligned, a bad human might use it to pursue broad coercive power.
Narrow technical AI might unlock increasingly powerful and highly offense-dominant technology with lower and lower activation costs (easy to build and launch with common materials). Even if the AI itself never got out of hand, if the dangerous tech secrets leaked (or ended up controlled by an aggressive government), things could go very poorly for the world.