Now break your own proposal! It’s a really useful exercise.
Why might this be difficult to implement (remember, at the end of the day we have to write code that implements this)?
How might this go wrong, even if it is implemented as specified?
In general I think people with alignment proposals should think for at least 5 minutes about why they might not work.
I think a problem with my solution is how the AI could "understand" the behaviors and thought processes of a "more powerful agent." If you know what someone smarter than you would think, then you are simply that smart. And if we abstract away the specific more-powerful-agent's thoughts, we are left with Kantian ethics, and we are back where we started: trying to put ethics/morals into the AI.
It's a bit rude to call my idea so stupid that I must not have thought about it for more than five minutes, but thanks for your advice anyway. It is good advice.
I didn’t intend this.