beren comments on Counting arguments provide no evidence for AI doom

beren 28 Feb 2024 15:50 UTC
14 points
This monograph by Bertsekas on the relationship between offline RL and online MCTS/search might be interesting (http://www.athenasc.com/Frontmatter_LESSONS.pdf), since it argues that we can conceptualise the contribution of MCTS as essentially a single Newton step from the offline starting point towards the solution of the Bellman equation. If this is actually the case (I haven't worked through all the details yet), then it seems it could be used to provide some kind of bound on the improvement / divergence you can get once you add online planning to a model-free policy.
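To make the Newton-step claim concrete, here is a minimal sketch on a toy tabular MDP. Everything in it is an assumption for illustration: the 3-state, 2-action MDP, the numbers, the function names, and the use of plain one-step greedy lookahead as a stand-in for MCTS. It checks that the value of the policy that is greedy with respect to an offline value estimate `J_tilde` matches one Newton step on the Bellman residual J - TJ = 0 started from `J_tilde`, where the Jacobian of T at `J_tilde` is taken to be GAMMA times the transition matrix of that greedy policy.

```python
# Toy discounted MDP: 3 states, 2 actions. All numbers are made up.
GAMMA = 0.9
N_STATES, N_ACTIONS = 3, 2
# P[a][s][t]: probability of moving from state s to state t under action a.
P = [
    [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.3, 0.7]],
    [[0.1, 0.0, 0.9], [0.5, 0.0, 0.5], [0.6, 0.4, 0.0]],
]
# R[a][s]: expected reward for taking action a in state s.
R = [[1.0, 0.0, 2.0], [0.0, 1.5, 0.5]]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def q_value(a, s, J):
    """One-step lookahead value of action a in state s, bootstrapping from J."""
    return R[a][s] + GAMMA * sum(P[a][s][t] * J[t] for t in range(N_STATES))


def greedy_policy(J):
    """Policy that is greedy with respect to J (one-step lookahead)."""
    return [max(range(N_ACTIONS), key=lambda a: q_value(a, s, J))
            for s in range(N_STATES)]


def bellman_operator(J):
    """(T J)(s) = max_a q_value(a, s, J)."""
    return [max(q_value(a, s, J) for a in range(N_ACTIONS))
            for s in range(N_STATES)]


def i_minus_gamma_p(mu):
    """Matrix I - GAMMA * P_mu for a deterministic policy mu."""
    return [[(1.0 if s == t else 0.0) - GAMMA * P[mu[s]][s][t]
             for t in range(N_STATES)] for s in range(N_STATES)]


def policy_value(mu):
    """Exact value of policy mu: solve (I - GAMMA * P_mu) J = R_mu."""
    return solve(i_minus_gamma_p(mu), [R[mu[s]][s] for s in range(N_STATES)])


def newton_step(J):
    """One Newton step on F(J) = J - T J, linearising T at J."""
    mu = greedy_policy(J)  # T is piecewise linear; mu selects the active piece
    TJ = bellman_operator(J)
    residual = [J[s] - TJ[s] for s in range(N_STATES)]
    delta = solve(i_minus_gamma_p(mu), residual)
    return [J[s] - delta[s] for s in range(N_STATES)]


# Stand-in for an offline (model-free) value estimate; deliberately inaccurate.
J_tilde = [0.0, 5.0, -1.0]

lookahead_values = policy_value(greedy_policy(J_tilde))
newton_values = newton_step(J_tilde)
print("value of lookahead policy:", lookahead_values)
print("one Newton step from J_tilde:", newton_values)
```

The two printed vectors should agree up to floating-point error. Note that this only illustrates the algebraic identity for exact one-step lookahead with a known model; whether it carries over to a sampled, truncated search like MCTS, and what bound follows, is exactly the part that would need the details of the monograph.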