Simulacra are belief structures (i.e., multi-factor probability distributions with a time dimension). LM fine-tuning doesn’t select a belief structure from a pre-existing set of distinct belief structures (no such set is represented by anything in the physical reality of the training process); it updates a single belief structure, held (in some sense) by the LM after every training step. The belief structure could initially be superposed (“99% I’m Luigi, 1% I’m Waluigi”), but it is still a single belief structure, and the updates should be relatively smooth (assuming a small learning rate), i.e., the belief structure couldn’t transform between training steps in clearly discontinuous jumps on the statistical manifold.
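As a toy sketch (not a claim about the LM’s actual internals): if we model the superposed belief as a categorical distribution over hypothetical personas, parameterized by logits and nudged by small cross-entropy gradient steps, the per-step change in the distribution stays small, which is the “no discontinuous jumps” point above. The persona names, target, and learning rate here are all illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

# Toy superposed belief over two hypothetical personas.
personas = ["Luigi", "Waluigi"]
logits = np.log(np.array([0.99, 0.01]))  # "99% I'm Luigi, 1% I'm Waluigi"

lr = 0.01                       # small learning rate (assumed)
target = np.array([1.0, 0.0])   # fine-tuning pushes toward pure Luigi

for step in range(5):
    p = softmax(logits)
    # Gradient of cross-entropy w.r.t. logits for a softmax output is (p - target).
    logits -= lr * (p - target)
    new_p = softmax(logits)
    # Per-step total-variation distance stays small: the belief moves smoothly.
    tv = 0.5 * np.abs(new_p - p).sum()
    print(f"step {step}: p={np.round(new_p, 4)}, per-step TV={tv:.6f}")
```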
If I parse things right, the initial state is something like 1⁄3 “I’m Luigi”, 1⁄3 “I’m Bowser”, and 1⁄3 “I’m Waluigi”, and RLHF eliminates the Bowser belief while having no effect on the other beliefs.
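One way to read “no effect on the other beliefs” is that the relative odds between the surviving beliefs are unchanged, in which case renormalizing after eliminating Bowser necessarily raises each survivor from 1⁄3 to 1⁄2. A minimal illustration of that arithmetic, under this interpretive assumption:

```python
beliefs = {"Luigi": 1/3, "Bowser": 1/3, "Waluigi": 1/3}

# RLHF (in this reading) drives the Bowser component to zero...
beliefs["Bowser"] = 0.0

# ...and leaves the Luigi/Waluigi odds untouched, so renormalizing
# raises each surviving belief from 1/3 to 1/2.
total = sum(beliefs.values())
beliefs = {k: v / total for k, v in beliefs.items()}
print(beliefs)  # {'Luigi': 0.5, 'Bowser': 0.0, 'Waluigi': 0.5}
```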