IDA is really aiming to be cost-competitive and performance-competitive, say to within an overhead of 10%. That may or may not be possible, but it’s the goal.
If the compute required to build and run your reward function is small relative to the compute required to train your model, then it seems like overhead is small. If you can do semi-supervised RL and only require a reward function evaluation on a minority of trajectories (e.g. because most of the work is learning about how to manipulate the environment), then you can be OK as long as the cost of running the reward function isn’t too much higher.
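To make the arithmetic concrete, here is a minimal sketch of that overhead calculation. The function name, its parameters, and the numbers in the usage lines are all illustrative assumptions, not measurements from any actual IDA implementation; the only figure taken from the discussion is the 10% target.

```python
def relative_overhead(train_cost_per_traj, reward_cost_per_eval,
                      labeled_fraction, reward_build_cost=0.0,
                      num_trajectories=1):
    """Reward-function compute as a fraction of baseline training compute.

    All arguments are hypothetical quantities in the same compute units.
    labeled_fraction is the share of trajectories that need a reward
    evaluation (1.0 means every trajectory is evaluated).
    """
    if not 0.0 <= labeled_fraction <= 1.0:
        raise ValueError("labeled_fraction must be between 0 and 1")
    if train_cost_per_traj <= 0 or num_trajectories <= 0:
        raise ValueError("baseline training compute must be positive")

    # Compute spent training the model, ignoring the reward function.
    baseline = train_cost_per_traj * num_trajectories
    # One-off cost of building the reward function, plus the cost of
    # running it on the labeled subset of trajectories.
    reward = (reward_build_cost
              + reward_cost_per_eval * labeled_fraction * num_trajectories)
    return reward / baseline


# Illustrative numbers only: a reward evaluation that costs twice as much
# as a training trajectory, but is needed on just 5% of trajectories.
sparse = relative_overhead(1.0, 2.0, labeled_fraction=0.05)

# The same reward function evaluated on every trajectory.
dense = relative_overhead(1.0, 2.0, labeled_fraction=1.0)
```

Under these made-up numbers the sparse case lands at roughly the 10% target even though each reward evaluation is more expensive than a trajectory, while the dense case triples total compute. That is the sense in which evaluating only a minority of trajectories can buy back an expensive reward function.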
Whether that’s possible is a big open question. Whether it’s date-competitive depends on how fast you figure out how to do it.
I knew that the goal was to get IDA to be cost-competitive, but I thought current versions of it weren’t. But that was just my rough impression; glad to be wrong, since it makes IDA seem even more promising. :) Of all the proposals I’ve heard of, IDA seems to have the best combination of cost, date, and performance-competitiveness.
I think our current best implementation of IDA would neither be competitive nor scalably aligned :)