I think this makes sense, but I disagree with it as a factual assessment.
In particular I think “will make mistakes” is actually an example of some combination of inner and outer alignment problems that are exactly the focus of LW-style alignment.
I also tend to think that the failure to make this connection is perhaps the single biggest problem in both the ethical-AI and AI-alignment spaces, and I remain confused about why no one else seems to take this perspective.
Necroing.
“This perspective” being: smuggling LW alignment into corporations by expanding the fear of the AI “making mistakes” to include our fears?