For a simple task like booking a restaurant, we could just ask the (frozen) overseer-AI to pick[1] actions, no?
The interesting application of MONA seems to be when the myopic RL agent is able to produce better suggestions than the overseer.
Edit: I elaborated
Plus, maybe let the overseer observe the result and say "oops" and roll back that action, if we can implement a rollback in this context.
If it were as simple as “just ask an LLM to choose actions” someone would have deployed this product a while ago.
But in any case, I agree this isn't the most interesting case for MONA; I discussed it because that's what Daniel asked about.