I don’t think “weak-to-strong generalization” is well described as “trying to learn the values of weak agents”.
Well, yes, it also includes learning weak agents’ models more generally, not just their “values”. But I think the point stands. It’s elaborated better in the linked post. Since AIs will receive, through always-on wearable sensors, most of the same information that humans receive, there won’t be much for AIs to learn from humans. Rather, it’s humans who will need to do their homework to improve the quality of their value judgements.