The claim is that verification is easier than generation. This post considers a completely different claim, that “verification is easy”, e.g.
How does the ease-of-verification delta propagate to AI?
if I apply the “verification is generally easy” delta to my models, then delegating alignment work to AI makes total sense.
if I apply a “verification is generally easy” delta, then I expect the world to generally contain far less low-hanging fruit
I just don’t care much if the refrigerator or keyboard or tupperware or whatever might be bad in non-obvious ways that we failed to verify, unless you also argue that it would be easier to create better versions from scratch than to notice the flaws.
Now to be fair, maybe Paul and I are just fooling ourselves, and really all of our intuitions come from “verification is easy”, which John gestures at:
But I don’t think “verification is easy” matters much to my views. Re: the three things you mention:
From my perspective (and Paul’s) the air conditioning thing had very little bearing on alignment.
In principle I could see myself thinking bureaucracies are terrible given sufficient difficulty-of-verification. But like, most of my reasoning here is just looking at the world and noticing large bureaucracies often do better (see e.g. comments here). Note I am not saying large human bureaucracies don’t have obvious, easily-fixable problems—just that, in practice, they often do better than small orgs.
Separately, from an alignment perspective, I don’t care much what human bureaucracies look like, since they are very disanalogous to AI bureaucracies.
If you take AI progress as exogenous (i.e. you can’t affect it), outsourcing safety is a straightforward consequence of (a) not-super-discontinuous progress (sometimes called “slow takeoff”) and (b) expecting new problems as capability increases.
Once you get to AIs that are 2x smarter than you, and have to align the AIs that are going to be 4x smarter than you, it seems like either (a) you’ve failed to align the 2x AIs (in which case further human-only research seems unlikely to change much, so it doesn’t change much if you outsource to the AIs and they defect) or (b) you have aligned the 2x AIs (in which case your odds for future AIs are surely better if you use the 2x AIs to do more alignment research).
Obviously “how hard is verification” has implications for whether you work on slowing AI progress, but this doesn’t seem central.
There are lots of complications I haven’t discussed, but I really don’t think “verification is easy” ends up mattering very much to any of them.