Nitpick: the article seems to suggest that if RSI is possible, then strong takeoff is inevitable, and boxing would not work—but isn’t boxing a potential approach for slowing down the RSI (e.g. each iteration of RSI is only executed once unboxed by a human—at least until/unless boxing fails), and therefore might still work?
Yes, this is the few-shot alignment world described in the post. I agree that if boxing could completely halt RSI, that would be fantastic in principle, but with each iteration of RSI there is some probability that the box will fail, at which point we would get unbounded RSI. This means we would effectively get only a few ‘shots’ to align our boxed AGI before we die.