The problem with this is that the state space is so large that it cannot explore every transition, so it can’t follow transitions backwards in the straightforward manner you’ve proposed. It needs some kind of intuition to shrink the search space, to generalize over it.
Unfortunately I’m not sure what that would look like. :(
Oh I see: for that specific instance of the task.
I’d like to see someone make this AI; I want to know how it could be done.