I find it interesting that both Buck and Rohin seem to be pessimistic about actually using impact measurement in AGI setups. Buck wonders what the story is for how that could work out, or what story motivates researchers (such as myself) to work on the problem. His position seems to be “I don’t see how this scales, so I’m pessimistic about this area of research.”
Ironically, I agree with the first part, but that’s not why I’m excited about impact measure research. As I’ve written previously, I don’t think there’s presently a good story for how this scales to AGI, for how you overcome the game-theoretic issues of getting everyone to play nice with low-impact deployments, or for how impact measurement significantly helps with inner alignment, etc.
But that’s not really why I’m excited: I’m presently excited about deconfusion. I keep putting effort into the AUP agenda, and I keep getting out surprising empirical results (SafeLife), interesting theory (the instrumental convergence formalization), and philosophical insight into agent incentives (the catastrophic convergence conjecture).
So maybe after I finish my PhD in ~2 years I end up pivoting research focus, or maybe I do it sooner, or maybe not – but it seems like understanding power-seeking dynamics is at least informally important to aligning AI systems. (And for me personally, it’s right on the edge of publishable and useful for long-term work.)
In summary, I broadly agree with the pessimism about the use case, but not with the pessimism about the usefulness of the research.
ETA: Another potential point of disagreement (it’s hard to tell because the interview moved a little quickly): it seems like Buck doesn’t think impact measurement will work partly because it’s not clear what the conceptual ideal is or how to get there, whereas I think it’s pretty clear what the conceptual ideal is – it’s just that there might not be a clean way to achieve it.
Rohin Shah: I don’t actually know what their actual plan would be [for impact measurement], but one plan I could imagine is: figure out what exactly the conceptual things we have to do with impact measurement are, and then, whatever method we have for building AGI, probably there’s going to be some part which is “specify the goal” – and in the specify-goal part, instead of just saying “pursue X”, we want to say “pursue X without changing your ability to pursue Y, and Z, and W, and P, and Q”.
I think this is a pretty reasonable plan – as far as plans go for impact measurement actually being useful in practice with superhuman agents. Which, again, I’m not optimistic about.
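To make the “pursue X without changing your ability to pursue Y, Z, ...” idea concrete, here is a minimal sketch in the spirit of AUP-style penalties. Everything in it (the function names, the toy Q-functions, the penalty coefficient) is illustrative rather than taken from the interview: the point is just to score actions by task value minus a penalty for how much they change the agent’s attainable value for auxiliary goals, relative to doing nothing.

```python
import numpy as np

# Minimal sketch (illustrative, not the actual AUP implementation): score an
# action by its task value, minus a penalty for how much it shifts the agent's
# ability to pursue each auxiliary goal, relative to doing nothing.

def penalized_value(q_task, q_aux_list, state, action, noop, lam=0.1):
    """Task value of `action` minus lam * mean shift in auxiliary Q-values."""
    penalty = np.mean([
        abs(q_aux(state, action) - q_aux(state, noop))
        for q_aux in q_aux_list
    ])
    return q_task(state, action) - lam * penalty

# Toy example with hand-written Q-functions (purely hypothetical numbers):
q_task  = lambda s, a: 1.0 if a == "press_button" else 0.0
q_paint = lambda s, a: 0.2 if a == "press_button" else 0.9  # ability to paint drops
q_clean = lambda s, a: 0.5                                   # unaffected either way

print(penalized_value(q_task, [q_paint, q_clean], state=None,
                      action="press_button", noop="noop"))
# 1.0 - 0.1 * mean(|0.2 - 0.9|, |0.5 - 0.5|) = 0.965
```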