Thanks! How optimistic/excited would you be about research in the spirit of Tamper-Resistant Safeguards for Open-Weight LLMs, especially given that banning open-weight models seems politically unlikely, at least for now?
I'd be extremely excited if such research succeeded in the near future! But I'm skeptical it will succeed in time to be at all relevant, so my overall expected value for that direction is low.
Also, I think there's a very real risk that the bird has already flown the coop on this. If you can cheaply modify existing open-weight models to be 'intent-aligned' with terrorists, and to be competent at using scaffolding built around 'biological design tools'… then the LLM isn't really the bottleneck anymore; the irreversible proliferation has already occurred. I'm not certain this is the case, but I'd put it at about 75%.
So then you need to make sure that better biological design tools don't get released, that more infohazardous virology papers don't get published, that wetlab automation tech doesn't get better, and… the big one… that nobody releases an open-weight LLM so capable it can successfully create tailor-made biological design tools. That's a much harder thing to censor out of an LLM than direct help with biological weapons! Creating biological design tools draws on much broader capabilities, like the model's machine-learning knowledge and coding skill. What exactly do you censor so that a model stays helpful for building purely-good tools but not for building dual-use ones?
Basically, I think the whole area is low-return. Humanity's best bet, in my view, is generalized biodefense, plus an international 'Council of Guardians' that uses strong tool-AI to monitor the entire world and enforce a ban on:
a) self-replicating weapons (e.g. bioweapons, nanotech)
b) unauthorized recursive-self-improving AI
Of these threats, only bioweapons are currently at large; the others lie in the future.