(Note: I am sick and tired of the cognitive trick whereby the alternative to ‘let anyone who wants to risk killing everyone on the planet do so’ gets called authoritarian, and of the assumption that it would imply some sort of oppressive or dictatorial state. It wouldn’t.)
I agree up until you say “It wouldn’t.” The severity of governmental regulations varies with the difficulty of alignment, the difficulty of creating AGI, and some other factors. In a Yudkowskian world, you’d have to have severe control over the compute supply and distribution chain, as well as over AI research, in order to be pretty confident we’d avert x-risks.
That said, there are relatively light touches which would improve our log odds by quite a bit under a broad range of views. And according to my model, there are interventions that would reduce P(doom) by, like, a third, maybe even a half: e.g. an annually decreasing compute threshold, tracking chips at the hardware level, restrictions on what random biochemicals you can order synthesized without any checks, etc. These are fairly restrictive, but not that onerous.
EDIT: I just realized that your statement is compatible with what I said.
A hackathon at MIT confirms that the cost of converting Llama-2 to ‘Spicy-Llama-2’ is about $200, after which it will happily walk you through synthesizing the 1918 flu virus. Without access to model weights, this is at least far more difficult and expensive. Nikki Teran has a thread summarizing.
I continue to be confused about this. If they managed to undo all of the safeguards with $200 worth of compute, does that mean Meta spent $200 in compute to install their safeguards? Perhaps naively, I thought it would take as much compute to undo some training as it took to do it in the first place. Unless, of course, they had access to the base model, in which case it would be trivial to undo the fine-tuning.
Yep, as the edit says, I don’t think we disagree on the first point: there are versions that are oppressive, but also versions that are not and that still have large positive effects.
On the second point, I believe this is because it is much harder to introduce safeguards than to remove them. Removal is a blunt target, since you only need the model to stop refusing, whereas good safeguards have to be detailed enough to avoid false positives (which Llama-2 did not do a good job of avoiding, but they did try). This is the key asymmetry here; the amount Meta (or anyone else) spends on tuning does not help.
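As a rough illustration of that cost asymmetry, here is a back-of-envelope sketch in Python. Every number in it (the GPU rental rate, the GPU-hours, the 100x multiplier) is an assumption chosen for illustration, not a figure from the MIT hackathon or from Meta.

```python
# Back-of-envelope sketch (all numbers are illustrative assumptions):
# why a small fine-tune can cost ~$200 even though the original
# safety tuning cost far more.

# Assumed cloud rental rate for a high-end GPU, in dollars per GPU-hour.
GPU_RATE_PER_HOUR = 2.00

# Assumed small fine-tune toward a blunt objective (stop refusing):
# a few thousand examples, a handful of GPU-hours.
undo_gpu_hours = 8 * 12            # e.g. 8 GPUs for ~12 hours
undo_cost = undo_gpu_hours * GPU_RATE_PER_HOUR

# Assumed original safety tuning (SFT + RLHF + red-teaming iterations):
# far more data and many training runs, before counting human labor.
install_gpu_hours = undo_gpu_hours * 100   # assume ~100x the compute
install_cost = install_gpu_hours * GPU_RATE_PER_HOUR

print(f"Undo-safeguards fine-tune:   ~${undo_cost:,.0f}")
print(f"Install-safeguards tuning:   ~${install_cost:,.0f}+ (compute only)")
```

The point is only that a small, targeted fine-tune is orders of magnitude cheaper than the original safety-tuning pipeline, so the $200 figure and Meta’s much larger spend are not in tension.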