It’s noteworthy that the safety guarantee relies on the “hidden cost” (:= proxy_utility - actual_utility) of each action being bounded above. If it’s unbounded, then the theoretical guarantee disappears.
Good point! And indeed I am skeptical that there are useful bounds on the cost...