It’s pretty easy to find reasons why everything will hopefully be fine, why AI hopefully won’t FOOM, or why we otherwise needn’t do anything inconvenient to get good outcomes. It’s proving considerably harder (from my outside-the-field view) to prove alignment, or prove upper bounds on the rate of improvement, or prove much of anything else that would be cause to stop ringing the alarm.
FWIW, I’m considerably less worried than I was when the Sequences were originally written. The paradigms that have taken off since do seem a lot more compatible with straightforward training solutions that look much less alien than expected. There are plausible scenarios where we fail at solving alignment and still get something tolerably human-shaped, and none of those scenarios previously seemed plausible. That optimism just doesn’t take it below the stop-worrying threshold.