I admire the Shard Theory crowd for the following reason: They have idiosyncratic intuitions about deep learning and they’re keen to tell you how those intuitions should shift you on various alignment-relevant questions.
For example, “How likely is scheming?”, “How likely is a sharp left turn?”, “How likely is deception?”, “How likely is technique X to work?”, “Will AIs acausally trade?”, etc.
These aren’t rigorous theorems or anything, just half-baked guesses. But they do actually say whether their intuitions will, on the margin, make someone more sceptical or more confident in these outcomes, relative to the median bundle of intuitions.
The ideas ‘pay rent’.