a very extreme failure of natural abstraction, such that human concepts cannot be faithfully and robustly translated into the system’s internal ontology at all.
This hypothetical suggests to me that the AI might not be very good at e.g. manipulating humans in an AI-box experiment, since it just doesn’t understand how humans think all that well.
I wonder what MIRI thinks about this 2013 post (“The genie knows, but doesn’t care”) nowadays. The argument seems less persuasive now that AIs appear to learn representations first and are only later given agency by the devs. I actually suspect your model of Eliezer is wrong, because it seems to imply he believes “the AI actually just doesn’t know”, and it’s a little hard for me to imagine him saying that.
Alternatively, maybe the “faithfully and robustly” bit is supposed to be very load-bearing. However, it’s already the case that humans learn idiosyncratic, opaque neural representations of our values from sense data—yet we’re able to come into alignment with each other, without a bunch of heavy-duty interpretability or robustness techniques.
The genie argument was flawed at the time, for reasons pointed out at the time, and ignored at the time.
Ignored or downvoted. Perhaps someone could make a postmortem analysis of those comment threads today.