This definition is not ideal, however. It misses a core element of what alignment researchers consider important in understanding machine learning models. In particular, for a model to be simulatable, it must also be at or below human level; otherwise, a human would not be able to step through its decision procedure.
I don’t think that’s quite true. It’s possible in theory that your training procedure discovers a radically simple way to model the data that the human wasn’t aware of, and the human can easily step through this radically simple model once it’s discovered. In that case, the model could be characterized as both super-human and simulatable. Of course, deep learning doesn’t work this way.
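To make the point concrete, here is a toy sketch (my own illustration, not a claim about how deep learning behaves): a least-squares fit "discovers" a hypothetical two-parameter rule the human observer didn't know. The resulting model predicts better than the observer could, yet it is trivially simulatable by hand.

```python
import numpy as np

# Toy illustration: training "discovers" a radically simple model of the
# data. The generating rule below is a made-up example the observer is
# assumed not to know in advance.

rng = np.random.default_rng(0)

# Hidden generating rule: y = 3.2 * x - 1.7, plus a little noise.
x = rng.uniform(-10, 10, size=200)
y = 3.2 * x - 1.7 + rng.normal(scale=0.1, size=200)

# "Training": ordinary least squares recovers the two parameters.
A = np.stack([x, np.ones_like(x)], axis=1)
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

# The learned model is super-human on this prediction task (its error
# sits near the noise floor), yet fully simulatable: a human can compute
# slope * x + intercept by hand for any input.
print(f"learned rule: y ≈ {slope:.2f} * x + {intercept:.2f}")
```

The point of the sketch is only that "super-human" and "simulatable" are not logically exclusive: what makes deep nets hard to simulate is the form of the learned model, not its performance.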