Seth Herd comments on Agentized LLMs will change the alignment landscape

Seth Herd 10 Apr 2023 21:24 UTC
3 points
0
Yes, but I think that conclusion rests on the fallacy that we can only worry about one of those problems at a time. Both are real. This helps with alignment but doesn’t solve it, particularly outer alignment and alignment stability. It definitely increases the practical problem of malicious use of aligned AGI.
- Sven Nilsen 10 Apr 2023 21:56 UTC
  2 points
  0
  Obviously, problems are not exclusive! I find it easier to imagine a civilization that has survived for a long time and made significant technological progress: How would such a civilization approach ASI? I think they would analyze the problem to death, use automated theorem proving as much as possible, and have a culture where only a tiny fraction of ideas ever get implemented, even if most of the ideas never implemented would seem very good to us. In short: higher standards for safety.
  
  One challenge with the “people will use it for bad stuff” situations is that a sufficiently aligned AGI needs to be confidently non-trusting towards the minds of people who, in general, want to change the underlying physical processes of life as it evolved on Earth. This also holds for more bizarre and somewhat safe goals such as “make human babies have pointy ears”. It is not an X-risk, but we still don’t want that kind of stuff to happen. However, engineering AGI systems so that they refuse to cooperate with such people is enormously difficult and beyond my level of intelligence.