The author makes a good point that humans are not choosing the utility function for the AI randomly. They are trying to build something useful, which dramatically limits the possible choices.
The problem is that after filtering the choices by “does something profitable and seemingly safe (in the beta version)”, there are still many possible utility functions left, most of which we would not want a superhuman AI to optimize for.
I’d be curious to hear what you think about my arguments that deceptive alignment is unlikely. Without deceptive alignment, there are many fewer realistic internal goals that produce good training results.
I agree that active deception seems unlikely (a weak opinion; I haven’t spent much time thinking about it). At this moment, my risk model is roughly: “the AI destroys everything the humans were not paying attention to… plus a few more things after the environment changes dramatically”.
(Humans care about 100 different things. When you train the AI, you check 10 of them, and the AI will sincerely tell you whether it cares about them or not. You make the AI care about all 10 of them and run it. The remaining 90 things are now lost. Out of the 10 preserved things, 1 was specified a little incorrectly, and now it is too late to do anything about it. As a consequence of losing the 90 things, the environment changes so that 2 more of the 10 preserved things do not make much sense in the new context, so they are effectively also lost. Humanity gets to keep 7 of the 100 things they originally cared about.)
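A minimal sketch of the toy accounting above, just to make the arithmetic explicit (the numbers and variable names are purely illustrative, taken from the example):

```python
# Toy tally of which of humanity's values survive in the example above.
total_values = 100         # everything humans care about
checked = 10               # values explicitly checked during training
misspecified = 1           # checked values that were specified slightly wrong
invalidated_by_change = 2  # checked values that stop making sense after the environment shifts

lost_unchecked = total_values - checked                 # 90 values nobody looked at
kept = checked - misspecified - invalidated_by_change   # what actually survives

print(f"lost because unchecked:       {lost_unchecked}")
print(f"lost to misspecification:     {misspecified}")
print(f"lost to environment change:   {invalidated_by_change}")
print(f"values humanity gets to keep: {kept}")  # 7
```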
tl;dr—what Max_He-Ho said