ok, I am quite confident you will get tons of evidence that AI systems are not aligned with you within the next few years. My primary question is what you will actually do as soon as you have identified a system as unaligned or dangerous in this way
Any operationalizations that people might make predictions on?
The clearest case I can think of is:
Prior to AI escape or takeover, will we catch an AI in production clearly and seriously trying to take an egregiously bad action when not at all prompted or directly trained to do so? For example, trying:
to escape
to create bioweapons
to do some action which is clearly part of an attempted takeover (the prior two actions might not be part of a takeover)
(This might not be because of internal motivations well described as deceptive alignment.)
(To be clear, I am not confidently predicting we will see this outcome, and the original quote is from habryka.)
I expect not to see this, conditional on adding a stipulation like “the AI wasn’t scaffolded and then given a goal like ‘maximize profit’”, because without that stipulation I could imagine the AI system coming up with nasty subgoals. In particular, I don’t expect egregiously bad actions from autoregressive sampling of an LLM tasked with doing scientific research.
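To make the scaffolding distinction concrete, here is a minimal sketch contrasting plain autoregressive sampling with a scaffolded agent loop given an open-ended goal. The function names (`complete`, `plain_sampling`, `scaffolded_agent`) and the canned model output are hypothetical placeholders, not from any real API or from the original discussion; the point is only where unintended subgoals could enter.

```python
def complete(prompt: str) -> str:
    """Hypothetical stand-in for one autoregressive LLM completion.

    Returns a canned string so the sketch runs without a real model call.
    """
    return "proposed next step (placeholder output)"


def plain_sampling(task: str) -> str:
    """Case 1: plain autoregressive sampling on a research task.

    The model produces text once; nothing executes the output.
    """
    return complete(f"Write an analysis for this research task:\n{task}")


def scaffolded_agent(goal: str, max_steps: int = 5) -> list[str]:
    """Case 2: a scaffolded agent loop given an open-ended goal.

    The model's own proposals are fed back in repeatedly; in a real system
    each proposal would also be executed (tools, code, web access), which is
    exactly the setup the stipulation above carves out.
    """
    history: list[str] = []
    for _ in range(max_steps):
        prompt = (
            f"Goal: {goal}\n"
            f"Previous actions: {history}\n"
            "Propose the next action to take."
        )
        action = complete(prompt)
        history.append(action)
        # A real scaffold would execute `action` here; that execution step is
        # where nasty instrumental subgoals could turn into real-world actions.
    return history


if __name__ == "__main__":
    print(plain_sampling("characterize this protein's binding affinity"))
    print(scaffolded_agent("maximize profit"))
```

The sketch is only meant to show why the stipulation matters: the bad-action prediction concerns the first pattern, while the second pattern adds a goal plus an action loop, which is a different threat model.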