Chris_Leong comments on Thomas Kwa’s Shortform

Chris_Leong 13 Nov 2024 6:11 UTC
LW: 4 AF: 2
“How can we get more evidence on whether scheming is plausible?” What if we ran experiments that included some pressure towards scheming (via either RL or fine-tuning) and tried to determine the minimum pressure required to cause scheming? We could then look at how that minimum changes with scale.
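
To make the "minimum pressure" idea concrete, here is a rough sketch of how the search could be structured. Everything in it is an assumption for illustration: `scheming_rate` is a hypothetical callable standing in for "train at this pressure level, then measure how often the model schemes", pressure is normalised to a number in [0, 1], and the scheming rate is assumed to be non-decreasing in pressure (which may not hold in practice). `toy_scheming_rate` is a made-up stand-in, not a claim about real models.

```python
from typing import Callable, Optional


def min_pressure_for_scheming(
    scheming_rate: Callable[[float], float],
    lo: float = 0.0,
    hi: float = 1.0,
    threshold: float = 0.5,
    tol: float = 1e-3,
) -> Optional[float]:
    """Bisect for the smallest pressure whose scheming rate reaches `threshold`.

    Assumes (hypothetically) that scheming_rate is non-decreasing in pressure.
    Returns None if the threshold is not reached even at the maximum pressure.
    """
    if scheming_rate(hi) < threshold:
        return None  # no scheming observed anywhere in the tested range
    if scheming_rate(lo) >= threshold:
        return lo  # scheming appears with no added pressure at all
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if scheming_rate(mid) >= threshold:
            hi = mid  # scheming present: the minimum is at or below mid
        else:
            lo = mid  # scheming absent: the minimum is above mid
    return hi


def toy_scheming_rate(model_scale: float) -> Callable[[float], float]:
    """Made-up stand-in: larger 'models' start scheming at lower pressure."""
    def rate(pressure: float) -> float:
        return 1.0 if pressure >= 0.8 / model_scale else 0.0
    return rate


# Repeat the search at several (hypothetical) scales to see how the minimum moves.
for scale in (0.5, 1.0, 2.0, 4.0):
    print(scale, min_pressure_for_scheming(toy_scheming_rate(scale)))
```

In a real experiment each call to `scheming_rate` would be a full training run plus an evaluation, so the number of bisection steps (set by `tol`) is the main cost to budget for, and noisy measurements would call for repeated runs at each pressure level.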