But ‘alignment is tractable when you actually work on it’ doesn’t imply ‘the only reason capabilities outgeneralized alignment in our evolutionary history was that evolution was myopic and therefore not able to do long-term planning aimed at alignment desiderata’.
I am not claiming evolution is ‘not able to do long-term planning aimed at alignment desiderata’.
I am claiming it did not even try.
If you’re myopically optimizing for two things (‘make the agent want to pursue the intended goal’ and ‘make the agent capable at pursuing the intended goal’) and one generalizes vastly better than the other, this points toward a difference between the two myopically-optimized targets.
This looks like a strong steelman of the post, which I gladly accept.
But it seemed to me that the post was arguing:
1. That alignment is hard (it mentions that technical alignment contains the hard bits, lists multiple specific problems in alignment, etc.)
2. That current approaches do not work
"That you do not get alignment by default" looks like a much weaker thesis than 1 & 2, and one that I agree with.
This would obviously be an incredibly positive development, and would increase our success odds a ton! Nate isn’t arguing ‘when you actually try to do alignment, you can never make any headway’.
This unfortunately didn't answer my question. We all agree that it would be a positive development; my question was how much of one. From my point of view, it could even be enough.
The question that I was trying to ask was: “What is the difficulty ratio that you see between alignment and capabilities?”
I understood the post as making a claim (among others) that "Alignment is much more difficult than capabilities, as evidenced by Natural Selection".
I do not often comment on Less Wrong. (Although I am starting to; this is one of my first comments!)
Hopefully, my thoughts will become clearer as I write more and become more acquainted with the local assumptions and cultural codes.
In the meantime, let me expand:
2 is the correct one.
But even after rereading the post with your interpretation in mind, I am still confused about why 2 is irrelevant. Consider:
On one hand, in the analogy with Natural Selection, "by default" means "when you don't even try to do alignment, when you 100% optimize for a given goal". I.e.: when NS optimized for IGF, capabilities generalized, but alignment did not.
On the other hand, when speaking of alignment directly, "by default" means "even if you optimize for alignment, but without keeping some specific considerations in mind". I.e.: some specific alignment proposals will fail.
My point was that the former is not evidence for the latter.