Strongly upvoted, this was a very valuable objection.
My theory of change for theorems sounds like Cardano from your example:
My theory of change is using adequate/​comprehensive selection theoretic results to design training setups/​environments that select for safety properties we want of our systems.
[Copied from a comment I made on Discord.]
Perhaps it’s a pessimistic bet, but decorrelated alignment bets are good. 😋
Strongly upvoted, this was a very valuable objection.
My theory of change for theorems sounds like Cardano from your example:
[Copied from a comment I made on Discord.]
Perhaps it’s a pessimistic bet, but decorrelated alignment bets are good. 😋