From the ELK report:

We can then train a model to predict these human evaluations, and search for actions that lead to predicted futures that look good.

For simplicity and concreteness you can imagine a brute force search. A more interesting system might train a value function and/or policy, do Monte-Carlo Tree Search with learned heuristics, and so on. These techniques introduce new learned models, and in practice we would care about ELK for each of them. But we don’t believe that this complication changes the basic picture and so we leave it out.
This seems wrong to me at first pass. I don’t see how this “complication” leaves the basic picture for ELK unchanged. Unless they’re trying to inner-align the model to the predictor as its objective? But even then, I think that changes the basic picture. Realistic cognition is not argmax over a crisp objective, not even spiritually, not even almost.
(This is not me giving a positive explanation for how I think this picture is changed, just briefly marking disagreement.)
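For concreteness, here is the kind of “argmax over a crisp objective” picture I read the quoted setup as describing: brute-force search over actions, where each action is scored by running a learned predictor and then a learned model of human evaluations on the predicted future. This is only a minimal sketch of that idealized picture; the predictor and evaluation model below are hypothetical stubs I’m supplying for illustration, not anything from the report.

```python
from typing import Any, Callable, Sequence


def brute_force_plan(
    actions: Sequence[Any],
    predictor: Callable[[Any], Any],           # learned model: action -> predicted future
    evaluation_model: Callable[[Any], float],  # learned model of human evaluations
) -> Any:
    """Return the action whose predicted future the evaluation model rates highest."""
    best_action, best_score = None, float("-inf")
    for action in actions:
        predicted_future = predictor(action)
        score = evaluation_model(predicted_future)
        if score > best_score:
            best_action, best_score = action, score
    return best_action


if __name__ == "__main__":
    # Toy stand-ins, just to show the shape of the loop.
    actions = ["do_nothing", "open_vault", "call_guard"]
    predictor = lambda a: f"world after {a}"
    evaluation_model = lambda future: float(len(future))  # placeholder "looks good" score
    print(brute_force_plan(actions, predictor, evaluation_model))
```

The more interesting variants mentioned in the footnote (a learned value function and/or policy, MCTS with learned heuristics) would replace this exhaustive loop with additional learned models, which is exactly where my “does this really not change the picture?” reaction comes from.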