How much have you read about deep learning from “normal” (non-xrisk-aware) AI academics? Belrose’s Tweet-length argument against deceptive alignment sounds really compelling to the sort of person who’s read (e.g.) Simon Prince’s textbook but not this website. (This is a claim about what sounds compelling to which readers rather than about the reality of alignment, but if xrisk-reducers don’t understand why an argument would sound compelling to normal AI practitioners in the current paradigm, that’s less dignified than understanding it well enough to confirm or refute it.)