I’ve come to think that isn’t actually the case. For example, while I disagree with Being nicer than Clippy, it quite precisely nails why consequentialism isn’t essentially flawless:
I haven’t read that post, but I broadly agree with the excerpt. On green did a good job, imo, of showing how weirdly imprecise optimal human values are.
It’s true that when you stare at something with enough focus, it often loses that bit of “sacredness” which I attribute to green. As in, you might zoom in enough on the human emotion of love and discover that it’s just an endless tiling of Schrödinger’s equation.
If we discover one day that “human values” are e.g. 23.6% love, 15.21% adventure and 3% embezzling funds for yachts, and decide to tile the universe in exactly those proportions...[1] I don’t know, my gut doesn’t like it. Somehow, breaking it all into numbers turns humans into sock puppets reflecting the 23.6% like mindless drones.
The target “human values” seems to be incredibly small, which I guess encapsulates the entire alignment problem. So I can see how you could easily build an intuition from this along the lines of “optimizing maximally for any particular thing always goes horribly wrong”. But I’m not sure that’s correct or useful. Human values are clearly complicated, but so long as we haven’t hit a wall in deciphering them, I wouldn’t throw my hands up in the air and act as if they’re indecipherable.
Unbounded utility maximization aspires to optimize the entire world. This is pretty funky for just about any optimization criterion people can come up with, even if people are perfectly flawless in how well they follow it. There have been a bunch of attempts to patch this, but none have really worked so far, and it doesn’t seem like any ever will.
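To make that concrete, here is a toy sketch of my own (not from either post; the value names, weights, and costs are all made-up assumptions, reusing the joke percentages from above). It shows the corner-solution flavor of the problem: a linear “weighted human values” criterion plus a resource budget, where even an optimizer that follows the criterion flawlessly spends everything on a single dimension.

```python
# Toy illustration (hypothetical numbers): a "world" with a few value
# dimensions, a fixed resource budget, and a linear weighted-sum utility.
# The unbounded optimum is a corner solution: the entire budget goes to
# whichever dimension has the best utility-per-cost ratio, and every
# other value gets exactly zero.

WEIGHTS = {"love": 0.236, "adventure": 0.1521, "yachts": 0.03}  # assumed utility weights
COSTS = {"love": 1.0, "adventure": 0.9, "yachts": 0.5}          # assumed resource cost per unit
BUDGET = 1000.0

def unbounded_maximize(weights, costs, budget):
    """Maximize sum(weights[k] * amount[k]) s.t. sum(costs[k] * amount[k]) <= budget."""
    best = max(weights, key=lambda k: weights[k] / costs[k])  # best utility-per-cost dimension
    world = {k: 0.0 for k in weights}
    world[best] = budget / costs[best]  # spend the whole budget on that one dimension
    return world

print(unbounded_maximize(WEIGHTS, COSTS, BUDGET))
# -> {'love': 1000.0, 'adventure': 0.0, 'yachts': 0.0}
```

The criterion looked like “human values in the right proportions”, yet the flawless optimizer still tiles the world with a single thing; that failure shows up before you even get to imperfect proxies.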
I’m going to read your post and see the alternative you suggest.
Sounds like a Douglas Adams plot