The diagram actually says it uses the AlphaZero algorithm, which obviously doesn’t involve an LLM.
The AlphaZero algorithm doesn’t obviously not involve an LLM. It has a “policy network” to propose moves, and I don’t know what that looks like in the case of AlphaProof. If I had to guess blindly I would guess it’s an LLM, but maybe they’ve got something else instead.
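To make that concrete, here's a minimal sketch (mine, not anything taken from the AlphaProof announcement) of why the search side of an AlphaZero-style algorithm doesn't care what the policy network is. The search only needs a function from a state to prior probabilities over moves. The names `Policy`, `puct_select`, `uniform_policy`, `biased_policy`, and `propose` are all hypothetical, and the `sqrt(total + 1)` term is a common variation on the published PUCT formula that I've assumed so the priors still matter before any visits have happened.

```python
import math
from typing import Callable, Dict, Hashable

# Hypothetical interface: anything that maps a state to move priors.
# A small conv net, an LLM, or a lookup table would all fit here.
Policy = Callable[[Hashable], Dict[str, float]]


def puct_select(priors: Dict[str, float],
                visits: Dict[str, int],
                values: Dict[str, float],
                c_puct: float = 1.5) -> str:
    """Pick the move maximizing Q + U, in the style of AlphaZero's search."""
    total = sum(visits.get(m, 0) for m in priors)

    def score(move: str) -> float:
        n = visits.get(move, 0)
        q = values.get(move, 0.0) / n if n else 0.0  # mean value so far
        u = c_puct * priors[move] * math.sqrt(total + 1) / (1 + n)  # exploration bonus
        return q + u

    return max(priors, key=score)


def uniform_policy(state: Hashable) -> Dict[str, float]:
    # Stand-in policy that knows nothing about the state.
    moves = ["a", "b", "c"]
    return {m: 1.0 / len(moves) for m in moves}


def biased_policy(state: Hashable) -> Dict[str, float]:
    # Stand-in for a learned model that strongly prefers move "b".
    return {"a": 0.1, "b": 0.8, "c": 0.1}


def propose(policy: Policy, state: Hashable) -> str:
    # The search only ever sees the priors, never how they were produced.
    return puct_select(policy(state), visits={}, values={})
```

Swapping `uniform_policy` for `biased_policy` changes which move gets explored first, but nothing in `puct_select` has to change, which is the sense in which the algorithm itself is agnostic about whether the policy is an LLM.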