I think a clear majority of AGI researchers are probably familiar with the concept of reward gaming by now, and could talk coherently about AGIs gaming their rewards or manipulating humans.
Seems plausible, but I specifically asked about reward gaming, instrumental convergence, and the challenges of value learning, not just reward gaming. (I’m fine with not having concrete disaster scenarios.) Do you still think it’s that plausible?
And thirdly, once you get agreement that there are problems, you basically get “we should fix the problems first” for free.
I agree that Q2 is more of a blocker than Q3, though I am less optimistic than you seem to be.
Overall I updated towards slightly sooner based on your comment and Beth’s comment below (given that both of you interact with more AGI researchers than I do), but not by much. First, I’m not sure whether you were looking at just reward gaming or at all three conditions I laid out. Second, most of the other considerations were ones I had already thought about, and it’s not obvious how to update on an argument of the form “I think <already-considered consideration>, therefore you should update in this direction”. It would have been easier to update on “I think <already-considered consideration>, therefore the absolute probability in the next N years is X%”.