I suspect GPT can already recognize a description of a "benevolent" action. If not, please give me an example of an AI mislabeling one.
The problem is that current AI is too dumb to tell that an act is bad when it is described in some roundabout way (https://humanevents.com/2023/03/24/chatgpt-helps-plan-a-state-run-death-camp), when the act is too complex, or when its harm has to be inferred from non-text information, etc.
For example, it would take a very smart AI, probably an AGI, to reliably recognize that some abstract math or engineering task is actually a weapon recipe.