Do you think Yann LeCun (edited: apologies for the misspelling, thanks to RK for pointing it out) or Joscha Bach or others who think alignment work is pointless would agree with you?
I was trying to write something that might pass their ITT, and that of many people who nod along with them.
I don’t know these people, and know little of them save that Yann LeCun spells his name that way and pooh-poohs the very idea of even “ordinary” dangers from AI, comparing the six-month pause to a six-month pause on the invention of printing or fire. These people, or at least Yann LeCun, foresee business as usual during alterations.
If you want to compare them to fools of God drunk on their blind faith that how bad could it be ha ha ha ha ha ha bonk, well, I can’t say you’re wrong, but I don’t know what they would say.
Eliezer foresees DOOM.
I don’t think the intermediate position of a six-month pause makes any sense. It’s going to be either business as usual or DOOM, and no-one has said, except in terms of applause lights, what is supposed to happen in those six months. I find it telling that, as far as I have seen, no-one engages with Eliezer’s arguments; people resort to name-calling instead. Just look at recent postings on LessWrong on the subject and weep that the Sequences were written in vain.
… I don’t think it’s true that no one engages with Yud’s arguments at all?
Quintin Pope does here, for instance, and Yud basically just doesn’t respond, to within a rounding error.
I also don’t think his arguments are articulated well enough to be responded to. They lack epistemic legibility quite badly, as is evident from the MIRI dialogues, where people try really hard to engage with them and often just fail to end up making different predictions.
Quintin’s claims seem to rely on something like “common sense humanism”, but I don’t see a process connected to the discussion that will reliably cause common sense humanism to be the only possible outcome.
Metaphorically: There is a difference between someone explaining how easy it is to ride a bike vs someone explaining how much it costs to mine and refine metal with adequate tensile strength for a bicycle seatpost that will make it safe for overweight men to also ride the bike, not just kids.
A lot of the nuanced and detailed claims in Quintin’s post might be true, but he did NOT explain (1) how he was going to get funding to make a “shard-aligned AGI” on a reasonable time frame, or (2) how he would execute adequately if he did get funding, and definitely not make an error and let something out of the lab that isn’t good for the world, and (3) how he would also go fast enough that no other lab would make the errors he thinks he would not make before he gets results that could “make the errors of other labs irrelevant to the future of the world”.
I grant that I didn’t read very thoroughly. Did you see a funding component and treaty system in his plan that I missed?
I don’t think Quintin’s claims are of the kind where he needs to propose a funding component / treaty system.
They’re of the kind where he thinks the representations ML systems learn, and the shards of behavior they acquire, make it just not super inevitable for malign optimizers to come into existence, given that the humans training models don’t want to produce malign optimizers. That is, Bensinger’s intuition about a gravitational well of optimization out there, indifferent to human values, is just plain wrong, at least as applied to actual minds we are likely to make.
Quintin could be wrong—I think he’s right, and his theory retrodicts the behavior of systems we have, while Bensinger et al. make no specific retrodictions as far as I know, apart from generic retrodictions standard ML theory also makes—but not including a funding component and treaty system isn’t an argument against it, because the theory is about how small malign optimizers are in probable-future mindspace, not about a super-careful way of avoiding malign optimizers that loom large in probable-future mindspace.
I don’t think that weeping helps?
If you have a good plan for how that could help then I might be able to muster some tears? But I doubt it shows up as a step in winning plans.
This write-up felt like it was net positive to write in almost any world? (And if you think I’m wrong, please let me know in public or in private.)
First, the comparison to “Friendly Drunk Fools” might awaken some honor-loving Timocrats to the folly of their essential confusion? Surely no one wants to be seriously associated with something as lacking in the halo of prestige as the “Friendly Drunk Fool” plan, right?
Second, I really do think that Myerson–Satterthwaite is a deep result that relates honesty, incentives, and servant leadership in a non-trivial way. It kind of predicts potlatching, as a practice! Almost anyone who understands that “incentive compatibility” is a magic word, who hasn’t looked at this theorem, should study the MST some more. (And if they don’t know about incentive compatibility then start with that.)
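(For anyone who wants the content behind that name: here is a rough, informal statement of the theorem, in notation of my own choosing rather than anything from the thread or the original paper.)

```latex
% Myerson–Satterthwaite (1983), bilateral trade -- informal sketch, my notation.
% A buyer with private value $v_b \sim F_b$ and a seller with private value
% $v_s \sim F_s$, drawn independently, with overlapping supports.
\textbf{Theorem (rough statement).} No bilateral-trade mechanism is simultaneously:
\begin{enumerate}
  \item Bayesian incentive compatible: honestly reporting $v_b$ and $v_s$ is a best response;
  \item interim individually rational: no type expects to lose by participating;
  \item ex-post budget balanced: the mechanism injects no outside subsidy; and
  \item ex-post efficient: trade happens exactly when $v_b > v_s$.
\end{enumerate}
```

The way I read the link to honesty and servant leadership: if you insist on truthful reports, voluntary participation, and all of the efficient trades, then someone has to subsidize the mechanism out of their own pocket, which is roughly where the potlatch-shaped prediction comes from.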
Third, it might work for the people who insist that it isn’t a confused plan, and who accuse me of creating a straw-man, to attempt a dunk on me by steelmanning the FDF somehow into something as coherent and workable as a safe bridge design, and that would be… better than the alternatives!
I had fourth, fifth, and sixth points, but they are plausibly pointless to talk about in public. The fourth one descended into quoting Solzhenitsyn, which is usually a sign one should wrap up their speech <3
I am warming to your style. :)