My intuition is that the best way to build wise AI would be to train imitation learning agents on people whom we consider wise. If we trained imitations of people with a variety of perspectives, we could then simulate discussions between them and experiment to find the best discussion formats for such agents. This could likely get us reasonably far.
The reason I suggest imitation learning is that it would give us something we could treat as an optimisation target, which is what we require for training ML systems.
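To make that concrete, here is a minimal sketch of both pieces, assuming a causal language model fine-tuned per person: the optimisation target is ordinary behavioural cloning (cross-entropy against the person's actual utterances), and the simulated discussion is a simple round-robin loop. All the names here (`behavioural_cloning_step`, `imitator.reply`, and so on) are hypothetical placeholders, not references to any real library.

```python
import torch.nn.functional as F

def behavioural_cloning_step(imitator, input_ids, target_ids, optimiser):
    """One gradient step of imitation learning. The optimisation target is
    just the cross-entropy between the model's next-token predictions and
    what the wise person actually said (targets are the inputs shifted by
    one token, as in standard language-model training)."""
    logits = imitator(input_ids)  # assumed shape: (batch, seq_len, vocab)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten over batch and time
        target_ids.reshape(-1),
    )
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return loss.item()

def simulate_discussion(imitators, topic, rounds=3):
    """One possible discussion format: a round-robin where each trained
    imitator responds in turn to the running transcript. `imitator.reply`
    is an assumed sampling interface; other formats (debate, moderated
    panel) would swap out this loop."""
    transcript = [f"Topic: {topic}"]
    for _ in range(rounds):
        for name, imitator in imitators.items():
            reply = imitator.reply("\n".join(transcript))
            transcript.append(f"{name}: {reply}")
    return transcript
```

The point of the sketch is just that both stages reduce to things we already know how to optimise and run, which is what makes imitation a tractable starting point compared to optimising for "wisdom" directly.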
Seems reasonable. I do still worry quite a bit about Goodharting, but perhaps this could be addressed with careful oversight by wise humans doing the wisdom equivalent of red teaming.
You mean it might still Goodhart towards what we think they would say? Ideally, the actual people would be involved in the process.