I think you are mischaracterizing my beliefs here.
“almost all agents getting sufficiently high performance on sufficiently hard tasks score high on some metric of coherence.”
This seems right to me. Maybe see my comment further up; I think it's relevant to arguments we've had before.
“This might be fine if proving things about the internal structure of an agent is overkill and we just care about behavior?”
We can’t say much about the detailed internal structure of an agent, because there are always many ways to implement an algorithm. But we do only care about (generalizing) behavior, so we only need some very abstract properties relevant to that.