I would agree that some people figured this out faster than others, but the analogy is also instructional here: if even a small community like the infosec world has a hard time percolating information about failure modes and how to address them, we should expect the average ML engineer to be doing very unsafe things for a very long time by default.
To dive deeper into the XSS example, I think even among those that understood the output encoding and canonicalization solutions early, it still took a while to formalize the definition of an encoding context concisely enough to be able to have confidence that all such edge cases could be covered.
It might be enough to simply recognize an area of alignment that has dragons and let the experts safely explore the nature and contours of these dragons, but you probably couldn’t build a useful web application that doesn’t display user-influencable input. I think trying to get the industry to halt on building even obvious dragon-infested things is part of what has gotten Eliezer so burned out and pessimistic.
I would agree that some people figured this out faster than others, but the analogy is also instructional here: if even a small community like the infosec world has a hard time percolating information about failure modes and how to address them, we should expect the average ML engineer to be doing very unsafe things for a very long time by default.
To dive deeper into the XSS example, I think even among those that understood the output encoding and canonicalization solutions early, it still took a while to formalize the definition of an encoding context concisely enough to be able to have confidence that all such edge cases could be covered.
It might be enough to simply recognize an area of alignment that has dragons and let the experts safely explore the nature and contours of these dragons, but you probably couldn’t build a useful web application that doesn’t display user-influencable input. I think trying to get the industry to halt on building even obvious dragon-infested things is part of what has gotten Eliezer so burned out and pessimistic.