You know, I considered “Bob embezzled the funds to buy malaria nets” because I KNEW someone in the comments would complain about the orphanage. Please don’t change.
Actually, the orphanage being a cached thought is precisely why I used it. The writer-side lesson that comes with “don’t fight the hypothetical” is “don’t make your hypothetical needlessly distracting”. But maybe I miscalculated, and malaria nets would have been less distracting to LWers.
Anyway, I’m of course not endorsing fund-embezzling, and I think Bob is stupid. You’re right that the failure modes associated with Bob’s ambitions (e.g. human extinction) might be a lot worse than those of your typical fund-embezzler (e.g. the opportunity cost of buying yachts). I imagined Bob as kind-hearted and stupid, but in your mind he might be some cold-blooded, brooding, “the price must be paid” type of consequentialist. I didn’t give details either way, so that’s fair.
If you go around saying “the ends justify the means” you’re likely to make major mistakes, just like if you walk around saying “lying is okay sometimes”. The true lesson here is “don’t trust your own calculations, so don’t try being clever and blowing up TSMC”, not “consequentialism has inherent failure modes”. The ideal of consequentialism is essentially flawless; it’s when you hand it to sex-obsessed murder monkeys as an excuse to do things that shit hits the fan.
In my mind, then, Bob was a good guy running on flawed hardware. Eliezer calls patching your consequentialism by making it bounded “consequentialism, one meta-level up”. For him, refusing to embezzle funds for a good cause because the plan could obviously turn sour is just another form of consequentialism. It’s like belief in intelligence, but flipped: you don’t know exactly how it’ll go wrong, but there’s a good chance you’re unfathomably stupid and will make everything worse by acting on “the ends justify the means”.
From a practical standpoint, though, we both agree and nothing changes: both the cold-hearted Bob and the kind Bob must be stopped. (And both are indeed more likely to make ethically dubious decisions because, to them, “the ends justify the means”.)
Post-scriptum:
Honestly the one who embezzles funds for unbounded consequentialist purposes sounds much more intellectually interesting
Yeah, this kind of story makes for good movies. When I wrote Bob I was thinking of The Wonderful Story of Henry Sugar, by Roald Dahl, adapted by Wes Anderson for Netflix. It’s at least vaguely EA-spirited and is kind of along those lines (although the story is wholesome, as the name indicates, and isn’t meant to warn against the dangers of boundless consequentialism at all).[1]
I think your position here is approximately optimal within the framework of consequentialism.
It’s just that I worry that consequentialism itself is the reason we have problems like AI x-risk, in the sense that what drives x-risk scenarios may be the same theory of agency that underlies consequentialism.
I’ve been working on a post (actually, I’m going to temporarily add you as a co-author so you can see the draft and add comments if you’re interested) where I discuss consequentialism’s flaws and how I think one should approach things differently. One of the major inspirations is Against responsibility, but I’ve taken inspiration from multiple places, including critics of EA and critics of economics.
The ideal of consequentialism is essentially flawless; it’s when you hand it to sex-obsessed murder monkeys as an excuse to do things that shit hits the fan.
I’ve come to think that isn’t actually the case. E.g. while I disagree with Being nicer than clippy, it quite precisely nails how consequentialism isn’t essentially flawless:
Now, of course, utilitarianism-in-theory was never, erm, actually very tolerant. Utilitarianism is actually kinda pissed about all these hobbies. For example: did you notice the way they aren’t hedonium? Seriously tragic. And even setting aside the not-hedonium problem (it applies to all-the-things), I checked Jim’s pleasure levels for the trashy-TV, and they’re way lower than if he got into Mozart; Mary’s stamp-collecting is actually a bit obsessive and out-of-balance; and Mormonism seems too confident about optimal amount of coffee. Oh noes! Can we optimize these backyards somehow? And Yudkowsky’s paradigm misaligned AIs are thinking along the same lines – and they’ve got the nano-bots to make it happen.
Unbounded utility maximization aspires to optimize the entire world. This is pretty funky for just about any optimization criterion people can come up with, even if people are perfectly flawless in how well they follow it. There’s a bunch of attempts to patch this, but none have really worked so far, and it doesn’t seem like any will ever work.
I’ve come to think that isn’t actually the case. E.g. while I disagree with Being nicer than clippy, it quite precisely nails how consequentialism isn’t essentially flawless:
I haven’t read that post, but I broadly agree with the excerpt. On green did a good job, imo, of showing how weirdly imprecise optimal human values are.
It’s true that when you stare at something with enough focus, it often loses that bit of “sacredness” which I attribute to green. As in, you might zoom in far enough on the human emotion of love and discover that it’s just an endless tiling of Schrödinger’s equation.
If we discover one day that “human values” are e.g. 23.6% love, 15.21% adventure, and 3% embezzling funds for yachts, and decide to tile the universe in exactly those proportions...[1] I don’t know, my gut doesn’t like it. Somehow, breaking it all down into numbers turns humans into sock puppets reflecting the 23.6% like mindless drones.
The target “human values” seems to be incredibly small, which I guess encapsulates the entire alignment problem. So I can see how you could easily build an intuition from this along the lines of “optimizing maximally for any particular thing always goes horribly wrong”. But I’m not sure that’s correct or useful. Human values are clearly complicated, but so long as we haven’t hit a wall in deciphering them, I wouldn’t throw my hands up in the air and act as if they’re indecipherable.
Unbounded utility maximization aspires to optimize the entire world. This is pretty funky for just about any optimization criterion people can come up with, even if people are perfectly flawless in how well they follow it. There’s a bunch of attempts to patch this, but none have really worked so far, and it doesn’t seem like any will ever work.
I’m going to read your post and see the alternative you suggest.
Yeah, this kind of story makes for good movies. When I wrote Bob I was thinking of The Wonderful Story of Henry Sugar [...] (although the story is wholesome, as the name indicates, and isn’t meant to warn against the dangers of boundless consequentialism at all).
Let’s wait for the SBF movie on that one
If we discover one day that “human values” are e.g. 23.6% love, 15.21% adventure, and 3% embezzling funds for yachts, and decide to tile the universe in exactly those proportions...[1]
Sounds like a Douglas Adams plot