Simply wrap the I/O of the non-utility model, then assign utility 1 to the (possibly compound) action the agent will actually take in each timestep and utility 0 to all other actions, and then take the highest-utility action in each timestep.
(from the comment you linked)
I’m not sure I understand: is this something that gives you an actual utility function that you can use, say, to get the utility of various scenarios, calculate expected utility, etc.?
If you have an AI design to which you can provide a utility function to maximize (Instant AI! Just add Utility!), it seems that there are quite a few things that AI might want to do with the utility function that it can’t do with your model.
So it seems that you’re not only replacing the utility function, but also the bit that decides which action to do depending on that utility function. But I may have misunderstood you.
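To make the worry concrete, here is a minimal Python sketch of the construction as I read it. Every name in it (`wrap_as_utility`, `argmax_agent`, the toy `thermostat` policy, the `ACTIONS` list) is my own hypothetical illustration and not something from the linked comment.

```python
from typing import Callable, Hashable, Sequence

Observation = Hashable
Action = Hashable
Policy = Callable[[Observation], Action]
Utility = Callable[[Observation, Action], int]


def wrap_as_utility(policy: Policy) -> Utility:
    """Build the 0/1 'utility function' described in the quote.

    The action the wrapped (non-utility) model would take gets utility 1,
    every other action gets utility 0.
    """
    def utility(observation: Observation, action: Action) -> int:
        # Note: evaluating the utility requires running the original model.
        return 1 if action == policy(observation) else 0
    return utility


def argmax_agent(utility: Utility, actions: Sequence[Action]) -> Policy:
    """Take the highest-utility action in each timestep."""
    def act(observation: Observation) -> Action:
        return max(actions, key=lambda a: utility(observation, a))
    return act


# Toy non-utility model (an assumption for illustration only).
def thermostat(temperature: float) -> str:
    return "heat" if temperature < 20 else "idle"


ACTIONS = ["heat", "idle", "cool"]
u = wrap_as_utility(thermostat)
agent = argmax_agent(u, ACTIONS)
```

In this sketch `agent` just reproduces `thermostat`, and `u` is only defined over (observation, action) pairs, not over scenarios or outcomes. That is why I don't see how you would use it to compare world states or compute an expected utility over uncertain outcomes.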