Thanks. One thing that confuses me: if this is true, why do mini reasoning models often seem to outperform their full counterparts at certain tasks?
e.g. Grok 3 Beta mini (Think) performed roughly the same as, or better than, Grok 3 Beta (Think) overall on benchmarks[1]. I remember a similar pattern with OAI's reasoning models.
Do you think that as each psychological continuation plays out, they'll remain identical to one another? Surely not. They will diverge. So although each is itself, each is a psychological stream distinct from the others, originating at the point of brain scanning. Which psychological stream one-at-the-moment-of-brain-scan ends up in is a matter of chance. As you say, they are all equally "true" copies, yet they are separate. So which stream one ends up in is a matter of chance or, as I said in the original post, a gamble.
Think of it like this: if one had one continuation in which one lived a perfect life, one would be guaranteed to live that perfect life. But if one had 10 copies in which one lived a perfect life, one would not benefit at all beyond that guarantee. It's the average that matters, not the total.
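To make the arithmetic explicit (a minimal formalization under my assumption that pre-scan prospects are the average over the resulting streams; the notation $n$ and $u_i$ is mine):

$$\mathbb{E}[U] = \frac{1}{n}\sum_{i=1}^{n} u_i$$

where $n$ is the number of streams and $u_i$ is the value of the life lived in stream $i$. One perfect continuation gives $\mathbb{E}[U] = u_{\text{perfect}}$; ten copies all living that same perfect life still give $\mathbb{E}[U] = u_{\text{perfect}}$, so the extra copies add nothing.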
But one is deciding how to use one's compute at time t (before any copies are made). One at time t is under no obligation to spend one's compute on someone almost entirely unrelated to one, just because that person is perhaps still technically oneself. The "once they diverge" statement is beside the point: the decision is made prior to the divergence.
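Put in the same terms (again just a sketch, with notation of my own): the chooser at time $t$ picks

$$a^{*} = \arg\max_{a} \frac{1}{n}\sum_{i=1}^{n} u_i(a)$$

and the streams indexed by $i$ only exist after $a$ is fixed. The divergence sits downstream of the decision, which is why pointing at it changes nothing about how the compute should be allocated.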
I go into more detail in a post on my Substack (though it's perhaps a lot less readable; I work from similar assumptions there, and it's best to read the first post in the series first).