I am absolutely floored. ChaosGPT. How blindly optimistic have I been? How naive and innocent? I’ve been dreaming up complicated disaster scenarios like “the AI might find galaxy-brained optima for its learned proxy-goals far off the distribution we expected and will deceptively cooperate until it’s sure it can defeat us.” No, some idiot will simply code up ChaosGPT-5 in 10 minutes and tell it to destroy the world.
I’ve implicitly been imagining alignment as “if we make sure it doesn’t accidentally go off and kill us all...” when I should have been thinking “can anyone on the planet use this to destroy the world if they seriously tried?”
Fool! Idiot! Learn the lesson.
Moore’s Law of Mad Science: Every 18 months, the minimum IQ to destroy the world drops by one point.
It is also worth putting this in context: people said “no, obviously, humans would not let it out of the box.” Their confident arguments persuaded smart people into thinking that this was not a problem.
You also have the camp that said “no, the problem will not be people telling the AI to do bad stuff, but this hard theoretical problem we have to spend years researching in order to save humanity,” versus the camp that worried “people will use it for bad things.” In hindsight, the second problem is the first one that actually occurred, while alignment research either comes too late or becomes relevant only once many other problems have already happened.
However, in the long run, alignment research might be like building the lighthouse before there is ship traffic on the ocean. If you have never seen the ocean, a lighthouse factory seems mysterious: it sits on land and has no purpose that is easy to relate to. Yet such infrastructure might be the engine of a civilization that reaches the next Kardashev scale.
Yes; but I think that conclusion rests on the logical fallacy that we can only worry about one of those problems. Both are real. This helps with alignment but doesn’t solve it, particularly outer alignment and alignment stability. And it definitely worsens the practical problem of malicious use of aligned AGI.
Obviously, the problems are not exclusive! I find it easier to imagine a civilization that has survived for a long time and made significant technological progress: how would such a civilization approach ASI? I think they would analyze the problem to death, use automated theorem proving as much as possible, and have a culture where only a tiny fraction of ideas ever get implemented, even if most of the ideas left unimplemented would seem very good to us. In short: higher standards for safety.
One challenge with the “people will use it for bad stuff” situations is that a sufficiently aligned AGI needs to be confidently non-trusting toward the minds of people who, in general, want to change the underlying physical processes of life as it evolved on Earth. This also holds for more bizarre and somewhat safe goals such as “make human babies have pointy ears.” That is not an X-risk, but we still don’t want that kind of thing to happen. However, how to engineer AGI systems such that they refuse to cooperate with such people is enormously difficult and beyond my level of intelligence.