I can’t help but feel that this sneakily avoids some of the hard parts of the problem by assuming that we know how to find certain things like “the state according to the AI/human” from the start.
Yeah, this is the part I’m confused about as well. I think this proposal involves training a neural network emulating a human? Otherwise I’m not sure how Eval_H(F(s_m), o_h) is supposed to work. It requires a human to make a prediction about the next step using observations and the direct translation of the machine state, which requires us to have some way to describe the full state in a way that the “human” we’re using can understand. This precludes using actual humans to label the data, because I don’t think we actually have any way to provide such a description. We’d need to train up a human simulator specifically adapted for parsing this sort of output.
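To make the worry concrete, here is a minimal Python sketch of the shape Eval_H(F(s_m), o_h) seems to need. Everything in it is an assumption for illustration and none of it comes from the proposal itself: `HumanReadableState`, `eval_h`, `toy_translate`, and `toy_human_simulator` are hypothetical names, and the "human" is a stand-in function rather than a real labeler.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class HumanReadableState:
    # Hypothetical: a description of the world that the "human" can parse.
    facts: Tuple[str, ...]


# Hypothetical: the machine state s_m is treated as an opaque latent vector.
MachineState = Sequence[float]


def eval_h(
    translate: Callable[[MachineState], HumanReadableState],
    human_predictor: Callable[[HumanReadableState, str], str],
    s_m: MachineState,
    o_h: str,
) -> str:
    """Compose the pieces: translate s_m with F, then ask the 'human' to predict."""
    return human_predictor(translate(s_m), o_h)


def toy_translate(s_m: MachineState) -> HumanReadableState:
    # Toy stand-in for F: one latent coordinate decides one human-level fact.
    return HumanReadableState(facts=("door_open",) if s_m[0] > 0.5 else ("door_closed",))


def toy_human_simulator(state: HumanReadableState, o_h: str) -> str:
    # Toy stand-in for the human: predicts the next step from facts plus observation.
    if "door_open" in state.facts:
        return "intruder may enter"
    return f"nothing changes after {o_h}"


prediction = eval_h(toy_translate, toy_human_simulator, [0.9], "camera shows empty room")
```

The sketch only type-checks because `human_predictor` accepts `HumanReadableState` directly. That is exactly the step a real human labeler can't obviously perform, which is why a purpose-built human simulator seems to be required.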