Yes, this is the few-shot alignment world described in the post. I agree that, in principle, it would be fantastic if boxing could completely halt RSI. In practice, though, there is some probability that the box fails, especially with each further iteration of RSI, and we would then get unbounded RSI. This means we effectively get only a few ‘shots’ at aligning our boxed AGI before we die.
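To make the 'few shots' intuition concrete, here is a rough sketch of the arithmetic. It assumes, purely for illustration, that the box fails independently with the same probability on every RSI iteration; the names `p_fail`, `survival_probability`, and `expected_shots` are hypothetical, and the numbers are not estimates of anything real. If the failure probability actually grows with each iteration, the picture is worse than this.

```python
def survival_probability(p_fail: float, iterations: int) -> float:
    """Chance the box holds through every one of `iterations` RSI steps.

    Assumption (illustrative only): each step fails independently
    with the same probability `p_fail`.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if iterations < 0:
        raise ValueError("iterations must be non-negative")
    return (1.0 - p_fail) ** iterations


def expected_shots(p_fail: float) -> float:
    """Mean number of iterations up to and including the first box failure.

    Under the same assumption this is the mean of a geometric
    distribution, 1 / p_fail.
    """
    if not 0.0 < p_fail <= 1.0:
        raise ValueError("p_fail must be in (0, 1]")
    return 1.0 / p_fail


if __name__ == "__main__":
    # Hypothetical per-iteration failure probability, chosen arbitrarily.
    p = 0.1
    for n in (1, 3, 10):
        print(n, survival_probability(p, n))
    print("expected shots:", expected_shots(p))
```

Under these assumptions the chance of the box holding decays geometrically in the number of iterations, which is the sense in which we only get a handful of attempts.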