Now break your own proposal! It’s a really useful exercise.
Why might this be difficult to implement (remember, at the end of the day we have to write code that implements this)?
How might this go wrong, even if it is implemented as specified?
In general I think people with alignment proposals should think for at least 5 minutes about why they might not work.
I think a problem with my solution is how the AI could "understand" the behaviors and thought processes of a "more powerful agent." If you know what someone smarter than you would think, then you are simply that smart. And if we abstract away the specific more-powerful-agent's thoughts, we are left with Kantian ethics, and we are back where we started: trying to put ethics/morals into the AI.
It's a bit rude to call my idea so stupid that I must not have thought about it for more than five minutes, but thanks for your advice anyway. It is good advice.
I didn’t intend this.