Cody Breene comments on AI alignment is distinct from its near-term applications

Cody Breene 3 Apr 2023 3:45 UTC
LW: 4 AF: 1
When you say “take over,” what specifically do you mean? In the context of a GPT descendant, would “take over” imply it’s doing something beyond providing a text output for a given input? For example, going out of its way to acquire additional GPUs in order to further minimize its cross-entropy loss?
- paulfchristiano 3 Apr 2023 17:22 UTC
  LW: 8 AF: 4
  Takeover seems most plausible when humans have deployed the system as an agent whose text outputs are treated as commands sent to actuators (like bash), and which chooses those outputs in order to achieve desired outcomes. If you have limited control over what outcome the system is pursuing, you can end up with useful consequentialists that have a tendency to take over (since taking over is an effective way to pursue a wide range of outcomes, including natural generalizations of the ones the system was selected to pursue during training).
  A few years ago it was maybe plausible to say, “That’s not how people use GPT; they just ask it questions.” Unfortunately (but predictably), deploying it as an agent has become a common way to use GPT-4, and it is rapidly becoming more compelling as the systems improve. I think that in the relatively near future, if you want an AI to help you with coding, rather than just getting completions you might say “Hey, this function seems to be taking too long, could you figure out what’s going on?” and the AI will e.g. do a bisection for you, set up a version of your code running in a test harness, ask you questions about desired behavior, and so on.
  I don’t think “the system gets extra GPUs to minimize cross entropy loss” in particular is very plausible. (Could happen, but not a high enough probability to be worth worrying about.)
  - Cody Breene 3 Apr 2023 21:07 UTC
    1 point
    Agreed, this seems to be the path that OpenAI is taking with plugins.
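
To make the deployment pattern discussed in this thread concrete, here is a minimal sketch of a system used "as an agent": its text outputs are treated as commands, sent to an actuator (a shell here), and the resulting output is fed back so the next command can be chosen. Everything named below (`scripted_policy`, `run_in_shell`, `agent_loop`, the example commands) is a hypothetical stand-in, not anything from the thread or from a real product; in particular, a fixed script takes the place of the language model.

```python
import subprocess
from typing import Callable, List, Optional, Tuple

# Each history entry is (command text emitted by the policy, output observed).
History = List[Tuple[str, str]]


def run_in_shell(command: str) -> str:
    """Actuator (assumption): run the text as a shell command, return its output."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=10
    )
    return result.stdout + result.stderr


def scripted_policy(goal: str, history: History) -> Optional[str]:
    """Hypothetical stand-in for the model: pick the next command, or None to stop.

    A real deployment would condition a language model on the goal and the
    history; this fixed plan only illustrates the interface.
    """
    plan = ["echo profiling slow_function", "echo bisecting recent commits"]
    return plan[len(history)] if len(history) < len(plan) else None


def agent_loop(
    goal: str,
    policy: Callable[[str, History], Optional[str]],
    actuator: Callable[[str], str],
    max_steps: int = 5,
) -> History:
    """Repeatedly turn the policy's text output into an action via the actuator."""
    history: History = []
    for _ in range(max_steps):
        command = policy(goal, history)
        if command is None:  # the policy signals it is finished
            break
        output = actuator(command)  # the text output becomes an action
        history.append((command, output))
    return history


if __name__ == "__main__":
    for cmd, out in agent_loop("speed up slow_function", scripted_policy, run_in_shell):
        print(f"$ {cmd}\n{out}", end="")
```

The point of the sketch is the loop structure: nothing limits the commands to "answering questions", and which outcome the commands pursue is determined entirely by the policy. The `max_steps` cap is the only control in this toy version.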