gwern comments on Fabien’s Shortform

gwern Mar 22, 2025, 7:42 PM
You would also expect the larger models to be more sample-efficient, including at in-context learning of variations of existing tasks (which, of course, is what steganography is). So all scale-ups go much further than any experiment at a small scale like 8B would indicate. (No idea what 'medium-scale' here might mean.)