I think you are mischaracterizing my beliefs here.
“almost all agents getting sufficiently high performance on sufficiently hard tasks score high on some metric of coherence.”
This seems right to me. Maybe see my comment further up; I think it's relevant to arguments we've had before.
“This might be fine if proving things about the internal structure of an agent is overkill and we just care about behavior?”
We can’t say much about the detailed internal structure of an agent, because there are always many ways to implement an algorithm. But we do only care about (generalizing) behavior, so we only need some very abstract properties relevant to that.