[Question] Competence vs Alignment
Is there a reliable way of determining whether an agent is misaligned, as opposed to just incompetent? And can we even draw such a distinction at all? (I suppose the orthogonality thesis implies that we can.)
Let’s say I give you an agent with suboptimal performance. Without any insight into its “brain”, observing only its behavior, can we determine whether it’s trying to optimize the correct value function but failing, or whether it’s simply misaligned?
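A toy sketch of why this is hard (all numbers and names here are hypothetical, chosen purely for illustration): under a Boltzmann-rational model of behavior, the very same observed policy can be produced either by an agent optimizing the true reward with low rationality, or by a fully "rational" agent optimizing a different reward. Observation alone cannot separate the two stories without extra assumptions about the agent's planner.

```python
import math

def softmax_policy(reward, beta):
    """Boltzmann-rational policy: P(a) is proportional to exp(beta * reward[a])."""
    exps = [math.exp(beta * r) for r in reward]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical two-action task; the "true" reward prefers action 0.
true_reward = [1.0, 0.0]

# Agent A: aligned but incompetent -- optimizes the true reward,
# but with a low rationality coefficient (a noisy planner).
policy_a = softmax_policy(true_reward, beta=math.log(1.5))

# Agent B: competent but misaligned -- standard rationality (beta=1),
# but pursuing a different reward function.
misaligned_reward = [math.log(1.5), 0.0]
policy_b = softmax_policy(misaligned_reward, beta=1.0)

print(policy_a)  # ~[0.6, 0.4]
print(policy_b)  # the same distribution: behavior alone can't tell the stories apart
```

Both agents choose action 0 about 60% of the time, so any purely behavioral test assigns them identical likelihood; distinguishing them requires priors or assumptions about the planner, not just more observations of the same behavior.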