Seth, I think another way to reframe this is to think in terms of an alignment tax.
Utility = (AI capability) * (alignment loss).
Previous doom arguments held that all alignment was impossible: you could not build a machine with near-human intelligence that was aligned. "Aligned" in this context means "acts to further the most probable interpretation of the user's instructions."
Nora et al. and you concede above that it is possible to build machines with roughly human intelligence that are aligned per the above definition. So now the relationship becomes:
(utility of the most powerful ASI that current compute can find and run) * (its available resources) ⇔ (utility of the most powerful tool AI) * (its available resources).
In worlds where the less capable tool AIs (which are probably myopic "bureaucracies" of thousands of separate modules), multiplied by their resources, have more total utility, some humans win.
In worlds where the most powerful actors give unrestricted models massive resources, or where unrestricted models provide an enormous utility gain, that's doom.
If the "alignment tax" is huge, humans eventually always lose; political campaigning buys a little time, but it's a terminal situation for humans. Humans win in some of the worlds where the tax is small.
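To make the comparison concrete, here is a minimal toy sketch of the relation above. All numbers, the specific tax values, and the resource split are hypothetical illustrations I'm making up for this sketch, not figures from the discussion; it only shows how the size of the alignment tax and the resource allocation decide which side of the relation ends up with more total utility.

```python
# Toy sketch of the comparison above. All numbers are hypothetical and only
# illustrate how the alignment tax and the resource split decide which side
# of the relation has more total utility.

def total_utility(capability: float, resources: float) -> float:
    """Total utility = per-unit utility (capability) times resources deployed."""
    return capability * resources

asi_capability = 1.0  # normalize the unrestricted ASI's capability to 1

scenarios = [
    # (alignment_tax, asi_resources, tool_resources)
    (0.10, 1.0, 2.0),   # small tax, tool AIs get most of the resources
    (0.10, 2.0, 1.0),   # small tax, unrestricted ASI gets most of the resources
    (0.90, 1.0, 5.0),   # huge tax: even a large resource edge may not save the tool side
]

for tax, asi_res, tool_res in scenarios:
    tool_capability = asi_capability * (1 - tax)  # tool AI pays the alignment tax
    asi_total = total_utility(asi_capability, asi_res)
    tool_total = total_utility(tool_capability, tool_res)
    outcome = "tool-AI side ahead" if tool_total >= asi_total else "unrestricted ASI ahead"
    print(f"tax={tax:.2f} asi_res={asi_res} tool_res={tool_res}: "
          f"{asi_total:.2f} vs {tool_total:.2f} -> {outcome}")
```

In this toy framing, the tool-AI side only stays ahead when the tax is small or when it commands a large enough resource advantage to offset the tax, which is the point of the worlds distinction above.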
Agree/disagree? Does this fit your model?
I agree that alignment taxes are a crucial factor in the odds of getting an alignment plan implemented. That’s why I’m focused on finding and developing promising alignment plans with low taxes.