The hope discussed in this post is that you could have a system that is aligned but not superintelligent (more like human-level-ish, and aligned in the sense that it is imitation-ish), doing the kind of alignment work humans are doing today, which could hopefully lead to a more scalable alignment approach that works on more capable systems.
But then would a less intelligent being (i.e. the collective of human alignment researchers and the less powerful AI systems they use as tools in their research) be capable of validly examining a more intelligent being, without being deceived by it?
It seems like the same question would apply to humans trying to solve the alignment problem—does that seem right? My answer to your question is “maybe”, but it seems good to get on the same page about whether “humans trying to solve alignment” and “specialized human-ish safe AIs trying to solve alignment” are basically the same challenge.