Note that a wrench current paradigms throw into this is that self-improvement processes would not look uniquely recursive, since all training algorithms already look sort of like “recursive self-improvement”. Instead, RSI effectively shows up as “oh no, the training curve bent differently on this training run”, which is most likely to happen in open-world RL. But I agree that open-world RL can be suddenly surprising in its capability growth, and there wouldn’t be much opportunity to notice the problem unless we’d already solved how to intentionally bound capabilities in RL.
There has been some interesting work on bounding capability growth in safe RL already, though. I haven’t looked closely at it; I wonder whether any of it is particularly good.
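To make the “notice the problem” part concrete, here’s a minimal sketch (all names, thresholds, and the helper functions in the usage comment are hypothetical, not taken from any of the safe-RL work mentioned above) of a monitor that fits a trend to recent capability evaluations and flags a run whose newest score overshoots the extrapolated trend:

```python
import numpy as np

def capability_overshoot(history, window=10, tolerance=2.0):
    """Flag a run whose capability metric grows faster than its recent trend.

    history: per-checkpoint capability scores (e.g. eval-suite accuracy).
    Fits a linear trend to the last `window` points (excluding the newest)
    and returns True if the newest score exceeds the extrapolated value by
    more than `tolerance` standard deviations of the fit residuals.
    """
    if len(history) < window + 1:
        return False  # not enough data to estimate a trend yet
    recent = np.asarray(history[-(window + 1):-1], dtype=float)
    xs = np.arange(window)
    slope, intercept = np.polyfit(xs, recent, 1)
    residuals = recent - (slope * xs + intercept)
    sigma = residuals.std() + 1e-8
    predicted_next = slope * window + intercept
    return history[-1] > predicted_next + tolerance * sigma


# Hypothetical usage inside an RL training loop:
# scores = []
# for checkpoint in train():
#     scores.append(evaluate_capability(checkpoint))
#     if capability_overshoot(scores):
#         halt_and_inspect(checkpoint)  # surprising curvature: stop and look
```

This only detects a surprise after the fact, of course; it doesn’t bound anything, which is the harder part.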
Edit: note that I am in fact claiming that after MIRI deconfuses us, it’ll turn out to apply to ordinary gradient updates.