I’m not that excited for projects along the lines of “let’s figure out how to make human feedback more sample efficient”, because I expect that non-takeover-risk-motivated people will eventually be motivated to work on that problem, and will probably succeed quickly given motivation. (Also I guess because I expect capabilities work to largely solve this problem on its own, so maybe this isn’t actually a great example?) I’m fairly excited about projects that try to apply human oversight to problems that the humans find harder to oversee, because I think that this is important for avoiding takeover risk but that the ML research community as a whole will procrastinate on it.
I’m not that excited for projects along the lines of “let’s figure out how to make human feedback more sample efficient”, because I expect that non-takeover-risk-motivated people will eventually be motivated to work on that problem, and will probably succeed quickly given motivation. (Also I guess because I expect capabilities work to largely solve this problem on its own, so maybe this isn’t actually a great example?) I’m fairly excited about projects that try to apply human oversight to problems that the humans find harder to oversee, because I think that this is important for avoiding takeover risk but that the ML research community as a whole will procrastinate on it.