This sounds like a reinvention of quantilization, and yes, that’s a thing you can do to improve safety, but (1) you still need your prior over plans to come from somewhere (perhaps you start out with something IRL-like and then update it based on experience of what worked, which brings you back to square one), and (2) it just gives you a safety-capabilities tradeoff dial rather than actually solving safety.
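For concreteness, here’s a minimal sketch of the kind of quantilizer I mean, assuming a discrete set of plans with a base prior and a utility estimate (the function names, the numpy implementation, and the convention that `q` is a fraction of prior mass are all just illustrative choices):

```python
import numpy as np

def quantilize(plans, prior_probs, utility, q=0.1, rng=None):
    """Sample a plan from the top-q fraction of the prior, ranked by utility.

    Instead of argmax-ing the utility estimate (which invites Goodharted
    plans), a quantilizer keeps only the plans making up the top q of prior
    mass when sorted by utility, then samples from the prior within that set.
    """
    rng = np.random.default_rng() if rng is None else rng
    utilities = np.array([utility(p) for p in plans])
    prior_probs = np.asarray(prior_probs, dtype=float)

    # Rank plans by utility (descending) and keep the top-q prior mass.
    order = np.argsort(-utilities)
    cum_mass = np.cumsum(prior_probs[order])
    keep = order[cum_mass <= q]
    if len(keep) == 0:          # ensure at least the single best plan survives
        keep = order[:1]

    # Renormalize the prior over the kept plans and sample from it.
    kept_probs = prior_probs[keep] / prior_probs[keep].sum()
    return plans[rng.choice(keep, p=kept_probs)]
```

The dial is visible right there: shrinking `q` toward 0 recovers pure argmax over the utility estimate, while pushing it toward 1 just reproduces the prior, so `q` trades capability against safety rather than fixing anything.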
Or hmm...
If you do basic reinforcement based on experience, then that’s an unbounded adversarial search, but it’s really slow and therefore might be safe. And it also raises the question of whether there are other safer approaches.
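As a sketch of what “basic reinforcement based on experience” does to that prior, assuming the same discrete plan setup (again illustrative only, an exponentiated-weights-style update rather than anyone’s specific proposal):

```python
import numpy as np

def reinforce_prior(plans, prior_probs, reward, rounds=1000, lr=0.1, rng=None):
    """Crude reinforcement of the plan prior: reweight toward plans that worked.

    Each round samples a plan from the current distribution, observes a
    reward, and multiplicatively upweights that plan. Over many rounds the
    distribution concentrates on whatever scores highest.
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(prior_probs, dtype=float).copy()
    for _ in range(rounds):
        i = rng.choice(len(plans), p=probs / probs.sum())
        r = reward(plans[i])            # observed outcome of trying the plan
        probs[i] *= np.exp(lr * r)      # upweight plans that worked
    return probs / probs.sum()
```

The limit of this update is still whatever plan maximizes reward, i.e. the same unbounded adversarial search; the only safety margin is that it gets there at the pace experience comes in.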