The desiderata you mentioned:
1. Make sure the feedback matches the preferences
2. Make sure the agent isn't changing the preferences
It seems that RRM/Debate somewhat address both of these, while path-specific objectives are mainly aimed at addressing issue 2. I think (part of) John's point is that RRM/Debate don't address issue 1 very well, because we don't have very good or robust processes for judging the various ways we could construct or improve these schemes. Debate relies on a trustworthy/reliable judge at the end of the day, and we might not actually have that.