I think they may have gotten it working for MDPs, but I see little evidence so far that they’ve gotten it for POMDPs, due to the general lack of explicit exploration. (Devin does ask questions but rarely, consistent with the base model capabilities and imitation learning.) The former would still be a big deal, however.
I think they may have gotten it working for MDPs, but I see little evidence so far that they’ve gotten it for POMDPs, due to the general lack of explicit exploration. (Devin does ask questions but rarely, consistent with the base model capabilities and imitation learning.) The former would still be a big deal, however.