I still feel fine about what I said, but that’s two people finding it confusing (and thinking it is misleading), so I just changed it to something that is somewhat less contentful but hopefully clearer and less misleading.
Clarifying what I mean by way of analogy: suppose I’m worried about unzipping a malicious file causing my computer to start logging all my keystrokes and sending them to a remote server. I’d say that seems like a strange and extreme failure mode that we should be able to robustly avoid if we write our code right, regardless of how the logical facts shake out about how compression works. That said, I still agree that in some sense it’s the “default” behavior without extensive countermeasures. It’s rare for failures to be so clearly different from what you want that you can actually hope to avoid them in the worst case. But that property alone is not enough to suggest that they are easily avoided.
I obviously don’t agree with the inference from “X is the default result of optimizing for almost anything” to “X is the default result of our attempt to build useful AI without exotic technology or impressive mitigation efforts.”
My overall level of optimism doesn’t mostly come from hopes about exotic alignment technology. I am indeed way more optimistic about “exotic alignment technology” than you are, and maybe that optimism cuts off 25-50% of the total alignment risk. I think that’s the most interesting/important disagreement between us, since it’s the area we both work in. But more of the disagreement about P(alignment) comes from me thinking it is much, much more likely that “winging it” works long enough that early AI systems will have completely changed the game.
I spend a significant fraction of my time arguing with people who work in ML about why they should be more scared. The problem mostly doesn’t seem to be that I take a moderate or reassuring tone; it’s that they don’t believe the arguments I make (which are mostly strictly weaker forms of your arguments, which they are in turn even less on board with).