Logan Riggs comments on Karl Krueger’s Shortform

Logan Riggs 22 Dec 2024 12:08 UTC
Claude 3.5 seems to understand the spirit of the law when pursuing a goal X.
A concern I have is that future training procedures will incentivize more consequentialist reasoning (because such reasoning gets higher reward). This might be obvious or foreseeable, but it could be missed or ignored under racing pressure, or when a lab's LLMs are implementing all the details of its research.