Yeah, in retrospect I agree that utility functions aren't a good formulation of corrigibility. We phrased it that way because we had spent some time thinking about the MIRI corrigibility paper, which uses a framing like that to make the problem concrete.
On outer alignment: I think that if we have a utility function over universe histories/destinies, the framing is general enough that any outer alignment solution should be expressible in it, though it might not end up being the most natural framing.
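To spell out what I mean (my notation, not anything from the post): take

$$U : \mathcal{H} \to \mathbb{R},$$

where $\mathcal{H}$ is the set of complete universe histories. Any outer alignment proposal can then be read as a claim about which $U$ to optimize; e.g. a constraint like "never manipulate the overseer" just becomes a restriction on which histories $U$ ranks highly. The generality comes from $\mathcal{H}$ containing everything that ever happens, so there's nothing for an objective to be about that falls outside it.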
On cruxes: Good point. We started off using the term pretty much correctly but ended up abusing those sections. Oops.
On soft optimization: We talked a lot about quantilizers; the quantilizers paper is among my favorite papers. I'm not yet convinced that the problems we'd want an AGI to solve (in the near term) fall into the "requires very high optimization pressure" category. But we did discuss how to improve the capabilities of quantilizers by adjusting the level of quantilization based on some upper bound on the uncertainty about the goal on the local task; see the sketch below.
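Here's a minimal sketch of that idea (toy Python over a discrete action set; the mapping from the uncertainty bound to the quantile q is just a placeholder I made up, not something we derived):

```python
import numpy as np

def adaptive_quantilizer(actions, base_probs, utilities, uncertainty_bound, rng=None):
    """q-quantilizer with q chosen from an uncertainty bound.

    actions: candidate actions
    base_probs: base distribution over actions (e.g. a human-imitating policy)
    utilities: estimated utility of each action under the proxy goal
    uncertainty_bound: upper bound in [0, 1] on how wrong the proxy could be
    """
    rng = rng or np.random.default_rng()
    # Placeholder heuristic: more goal uncertainty -> larger q -> softer optimization.
    q = min(1.0, 0.01 + uncertainty_bound)
    # Rank actions by estimated utility, best first.
    order = np.argsort(utilities)[::-1]
    # Keep the top actions until they cover a q-fraction of base probability mass.
    top, mass = [], 0.0
    for i in order:
        top.append(i)
        mass += base_probs[i]
        if mass >= q:
            break
    # Sample from the base distribution restricted (and renormalized) to that set.
    p = np.array([base_probs[i] for i in top])
    p /= p.sum()
    return actions[top[rng.choice(len(top), p=p)]]
```

The point of tying q to the uncertainty bound is that when we're confident the proxy utility is right we can afford harder optimization (small q), and when we're not, we fall back toward just imitating the base distribution.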
On the pointers problem: Yeah, we kind of mushed extra material into the pointers problem section, because that's how our discussion went. Someone did also argue that it would be more of a problem if the natural abstraction hypothesis (NAH) were weak, but overall I thought framing the problem that way would probably be a bad idea if the NAH were weak.