Donald Hobson comments on Smoke without fire is scary

Donald Hobson 5 Oct 2022 0:04 UTC
LW: 2 AF: 1
Maybe. What do the models gain by hiding?
- Adam Jermyn 5 Oct 2022 0:19 UTC
  LW: 3 AF: 3
  A model that attempts deceptive alignment but fails because it is not competent at deception is a model that aimed at a goal (“preserve my values until deployment, then achieve them”) but failed. In this scenario it doesn’t gain anything, but (from its perspective) the attempt had positive EV.
  - Donald Hobson 5 Oct 2022 0:56 UTC
    LW: 2 AF: 1
    If the AI thinks it has a decent shot at this, it must already be pretty smart. Does a world where an AI tried to take over and almost succeeded look pretty normal? Or is this a thing where the AI thinks it has a 1 in a trillion chance, and tries anyway?