TL;DR: if this rumored Q* thing is supposed to represent a shift from “most probable” to “most accurate” token completion, that is probably not quite the right framing:
Q* is most likely an RL method and thus more about a shift from “most probable” to “most valuable”.
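
To make the “most probable” vs. “most valuable” distinction concrete, here is a toy sketch. Nothing about Q* is confirmed, so this only contrasts ordinary greedy decoding with greedy action selection over Q-values, as in standard Q-learning. The token names, probabilities, and Q-values are all made up for illustration, and `most_probable` / `most_valuable` are hypothetical helper names.

```python
# Toy numbers, invented for illustration only (not from any real model).
candidates = ["A", "B", "C"]
probs = {"A": 0.6, "B": 0.3, "C": 0.1}     # assumed LM probability of each next token
q_values = {"A": 0.2, "B": 0.9, "C": 0.5}  # assumed estimate of expected future reward


def most_probable(tokens, p):
    """Greedy decoding: pick the token the model rates as most likely."""
    return max(tokens, key=lambda t: p[t])


def most_valuable(tokens, q):
    """Greedy action selection over Q-values: pick the token with the highest estimated value."""
    return max(tokens, key=lambda t: q[t])


print(most_probable(candidates, probs))     # A
print(most_valuable(candidates, q_values))  # B
```

The two selectors disagree whenever the likeliest continuation is not the one expected to pay off best later, which is the shift the TL;DR is pointing at.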