Probably the most ultimately consequential part of this meeting was Michael verbally confirming to Ziz that MIRI had settled with a disgruntled former employee, Louie Helm, who had put up a website slandering them.
Wait, that actually happened? Louie Helm really was behind MIRICult? The accusations weren’t just...Ziz being Ziz? And presumably Louie got paid out since why would you pay for silence if the accusations weren’t at least partially true...or if someone were to go digging, they’d find things even more damning?
Those who are savvy in high-corruption equilibria maintain the delusion that high corruption is common knowledge, to justify expropriating those who naively don’t play along, by narratizing them as already knowing and therefore intentionally attacking people, rather than being lied to and confused.
Ouch.
[..]Regardless of the initial intent, scrupulous rationalists were paying rent to something claiming moral authority, which had no concrete specific plan to do anything other than run out the clock, maintaining a facsimile of dialogue in ways well-calibrated to continue to generate revenue.
Really ouch.
So Yudkowsky doesn’t have a workable alignment plan, so he decided to just live off our donations, running out the clock. I donated a six figure amount to MIRI over the years, working my ass off to earn to give...and that’s it?
Fuck.
I remember being at a party in 2015 and asking Michael what else I should spend my San Francisco software engineer money on, if not the EA charities I was considering. I was surprised when his answer was, “You.”
Wait, that actually happened? Louie Helm really was behind MIRICult? The accusations weren’t just...Ziz being Ziz? And presumably Louie got paid out since why would you pay for silence if the accusations weren’t at least partially true...or if someone were to go digging, they’d find things even more damning?
Louie Helm was behind MIRICult (I think as a result of some dispute where he asked for his job back after he had left MIRI and MIRI didn’t want to give him his job back). As far as I can piece together from talking to people, he did not get paid out, but there was a threat of a lawsuit which probably cost him a bunch of money in lawyers, and it was settled by both parties signing an NDA (which IMO was a dumb choice on MIRI’s part since the NDA has made it much harder to clear things up here).
Overall I am quite confident that he didn’t end up with more money than he started with after the whole miricult thing. Also, I don’t think the accusations are “at least partially true”. Like it’s not the case that literally every sentence of the miricult page is false, but basically all the salacious claims are completely made up.
So, I started off with the idea that Ziz’s claims about MIRI were frankly crazy...because Ziz was pretty clearly crazy (see their entire theory of hemispheres, “collapse the timeline,” etc.) so I marked most of their claims as delusions or manipulations and moved on, especially since their recounting of other events on the page where they talked about miricult (which is linked in OP) comes off as completely unhinged.
But Zack confirming this meeting happened and vaguely confirming its contents completely changes all the probabilities. I now need to go back and recalculate a ton of likelihoods here starting from “this node with Vassar saying this event happened.”
From Ziz’s page:
LessWrong dev Oliver Habryka said it would be inappropriate for me to post about this on LessWrong, the community’s central hub website that mostly made it. Suggested me saying this was defamation.
It’s obviously not defamation since Ziz believes it’s true.
<insert list of rationality community platforms I’ve been banned from for revealing the statutory rape coverup by blackmail payout with misappropriated donor funds and whistleblower silencing, and Gwen as well for protesting that fact.>
Inasmuch as this is true, this is weak Bayesian evidence that Ziz’s accusations are more true than false because otherwise you would just post something like your above response to me in response to them. “No, actually official people can’t talk about this because there’s an NDA, but I’ve heard second hand there’s an NDA” clears a lot up, and would have been advantageous to post earlier, so why wasn’t it?
It’s obviously not defamation since Ziz believes it’s true.
We’re veering dangerously close to dramaposting here, but just FYI habryka has already contested that they ever said this. I would like to know if the ban accusations are true, though.
Can confirm that I don’t believe I said anything about defamation, and in general continue to think that libel suits are really quite bad and do not think they are an appropriate tool in almost any circumstance.
I don’t think we ever took any other moderation action, though I would likely ban them again, since, like, I really don’t want them around on LessWrong and they have far surpassed thresholds for acceptable behavior.
I would not ban anyone writing up details of the miricult stuff (including false accusations, and relatively strong emotions). Indeed somewhat recently I wrote like 3-5 pages of content here on a private Facebook thread with a lot of rationality community people on it. I would be up for someone extracting the parts that seem shareable more broadly. Seems good to finally have something more central and public.
The author shares how terrible it feels that X is true, without bringing arguments for X being true in the first place (based on me skimming the post). That can bypass the reader’s fact-check (because why would he write about how bad it made him feel that X is true if it wasn’t?).
It feels to me like he’s trying to combine an emotional exposition (no facts, talking about his feelings) with an expository blogpost (explaining a topic), while trying to grab the best of both worlds (the persuasiveness and emotions of the former and the social status of the latter) without the substance to back it up.
Which IMO was a dumb choice on MIRI’s part since the NDA has made it much harder to clear things up here
The lack of comment from Eliezer and other MIRI personnel had actually convinced me in particular that the claims were true. This is the first I heard that there’s any kind of NDA preventing them from talking about it.
The lack of comment from Eliezer and other MIRI personnel had actually convinced me in particular that the claims were true. This is the first I heard that there’s any kind of NDA preventing them from talking about it.
I think this means you had incorrect priors (about how often legal cases conclude with settlements containing nondisparagement agreements.)
You can confirm this if you’re aware that it’s a possibility, and interpret carefully-phrased refusals to comment in a way that’s informed by reasonable priors. You should not assume that anyone is able to directly tell you that an agreement exists.
Why not? Is it common for NDAs/non-disparagement agreements to also have a clause stating the parties aren’t allowed to tell anyone about it? I’ve never heard of this outside of super-injunctions, which seem a pretty separate thing.
Absolutely common. Most non-disparagement agreements are paired with non-disclosure agreements (or clauses in the non-disparagement wording) that prohibit talking about the agreement, as much as talking about the forbidden topics.
It’s pretty obvious to lawyers that “I would like to say this, but I have a legal agreement that I won’t” is equivalent, in many cases, to saying it outright.
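To make the “reasonable priors” point concrete, here is a minimal sketch in Python (all numbers invented purely for illustration, not estimates of this particular case) of how a carefully-phrased refusal to comment updates the probability that a confidential settlement exists:

```python
# Toy Bayesian update: how much does a carefully-phrased "no comment"
# shift the probability that a confidential settlement / NDA exists?
# All numbers are illustrative assumptions, not estimates of the actual case.

prior_nda = 0.5                 # prior: legal disputes often end in confidential settlements
p_refusal_given_nda = 0.95      # if an NDA exists, parties almost always refuse to comment
p_refusal_given_no_nda = 0.40   # even without an NDA, people often decline to comment

numerator = p_refusal_given_nda * prior_nda
evidence = numerator + p_refusal_given_no_nda * (1 - prior_nda)
posterior_nda = numerator / evidence

print(f"P(NDA | refusal to comment) = {posterior_nda:.2f}")  # ~0.70 with these inputs
```

Whether the refusal is nearly conclusive (as suggested above) or only a moderate update depends entirely on those conditional probabilities, which is why the priors matter; and either way, the update is about whether an agreement exists, not about whether the underlying accusations are true.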
“he didn’t end up with more money than he started with after the whole miricult thing” is such a weirdly specific way to phrase things.
My speculation from this is that MIRI paid Helm or his lawyers some money, but less money than Helm had spent on the harassment campaign, and among people who know the facts there is a semantic disagreement about whether this constitutes a “payout”. Some people say something like “it’s a financial loss for Helm, so game-theoretically it doesn’t provide an incentive to blackmail, therefore it’s fine” and others say something like “if you pay out money in response to blackmail, that’s a blackmail payout, you don’t get to move the bar like that”.
I would appreciate it if someone who knows what happened can confirm or deny this.
(AFAICT the only other possibility is that somewhere along the line, at least one of the various sources of contradictory-sounding rumors was just lying-or-so-careless-as-to-be-effectively-lying. Which is very possible, of course, that happens with rumors a lot.)
I sadly don’t know the answer to this. To open up the set of possibilities further, I have heard rumors that maybe Louie was demanding some donations back he had given MIRI previously, and if that happened, that might also complicate the definition of a “payout”.
and others say something like “if you pay out money in response to blackmail, that’s a blackmail payout, you don’t get to move the bar like that”.
I don’t understand the logic of this. Does seem like game-theoretically the net-payout is really what matters. What would be the argument for something else mattering?
I don’t understand the logic of this. Does seem like game-theoretically the net-payout is really what matters. What would be the argument for something else mattering?
BEORNWULF: A messenger from the besiegers!
WIGMUND: Send him away. We have nothing to discuss with the norsemen while we are at war.
AELFRED: We might as well hear them out. This siege is deadly dull. Norseman, deliver your message, and then leave so that we may discuss our reply.
MESSENGER: Sigurd bids me say that if you give us two thirds of the gold in your treasury, our army will depart. He reminds you that if this siege goes on, you will lose the harvest, and this will cost you more dearly than the gold he demands.
The messenger exits.
AELFRED: Ah. Well, I can’t blame him for trying. But no, certainly not.
BEORNWULF: Hold on, I know what you’re thinking, but this actually makes sense. When Sigurd’s army first showed up, I was the first to argue against paying him off. After all, if we’d paid right at the start, then he would’ve made a profit on the attack, and it would only encourage more. But the siege has been long and hard for us both. If we accept this deal *now*, he’ll take a net loss. We’ve spent most of the treasury resisting the siege—
WIGMUND: As we should! Millions for defense, but not one cent for tribute!
BEORNWULF: Certainly. But the gold we have left won’t even cover what they’ve already spent on their attack. Their net payout will still be negative, so game-theoretically, it doesn’t make sense to think of it as “tribute”. As long as we’re extremely sure they’re in the red, we should minimize our own costs, and missing the harvest would be a *huge* cost. People will starve. The deal is a good one.
WIGMUND: Never! if once you have paid him the danegeld, you never get rid of the Dane!
BEORNWULF: Not quite. The mechanism matters. The Dane has an incentive to return *only if the danegeld exceeds his costs*.
WIGMUND: Look, you can mess with the categories however you like, and find some clever math that justifies doing whatever you’ve already decided you want to do. None of that constrains your behavior and so none of that matters. What matters is, take away all the fancy definitions and you’re still just paying danegeld.
BEORNWULF: How can I put this in language you’ll understand—it doesn’t matter whether the definitions support what *I* want to do, it matters whether the definitions reflect the *norsemen’s* decision algorithm. *They* care about the net payout, not the gross payout.
AELFRED: Hold on. Are you modeling the norsemen as profit-maximizers?
BEORNWULF: More or less? I mean, no one is perfectly rational, but yeah, everyone *approximates* a rational profit-maximizer.
WIGMUND: They are savage, irrational heathens! They never even study game theory!
BEORNWULF: Come on. I’ll grant that they don’t use the same jargon we do, but they attack because they expect to make a profit off it. If they don’t expect to profit, they’ll stop. Surely they do *that* much even without explicit game theoretic proofs.
AELFRED: That affects their decision, yes, but it’s far from the whole story. The norsemen care about more than just gold and monetary profit. They care about pride. Dominance. Social rank and standing. Their average warrior is a young man in his teens or early twenties. When he decides whether to join the chief’s attack, he’s not sitting down with spreadsheets and a green visor to compute the expected value, he’s remembering that time cousin Guthrum showed off the silver chalice he looted from Lindisfarne. Remember, Sigurd brought the army here in the first place to avenge his brother’s death—
BEORNWULF: That’s a transparent pretext! He can’t possibly blame us for that, we killed Agnarr in self-defense during the raid on the abbey.
WIGMUND: You can tell that to Sigurd. If it had been my brother, I’d avenge him too.
AELFRED: Among their people, when a man is murdered, it’s not a *tragedy* to his family, it’s an *insult*. It can only be wiped away with either a weregeld payment from the murderer or a blood feud. Yes, Sigurd cares about gold, but he also cares tremendously about *personally knowing he defeated us*, in order to remove the shame we dealt him by killing Agnarr. Modeling his decisions as profit-maximizing will miss a bunch of his actual decision criteria and constraints, and therefore fail to predict the norsemen’s future actions.
WIGMUND: You’re overcomplicating this. If we pay, the norsemen will learn that we pay, and more will come. If we do not pay, they will learn that we do not pay, and fewer will come.
BEORNWULF: They don’t care if we *pay*, they care if it’s *profitable*. This is basic accounting.
AELFRED: They *do* care if we pay. Most of them won’t know or care what the net-payout is. If we pay tribute, this will raise Sigurd’s prestige in their eyes no matter how much he spent on the expedition, and he needs his warriors’ support more than he needs our gold. Taking a net loss won’t change his view on whether he’s avenged the insult to his family, and we do *not* want the Norsemen to think they can get away with coming here to avenge “insults” like killing their raiders in self-defense. On the other hand, if Sigurd goes home doubly shamed by failing to make us submit, they’ll think twice about trying that next time.
BEORNWULF: I don’t care about insults. I don’t care what Sigurd’s warriors think of him. I don’t care who can spin a story of glorious victory or who ends up feeling like they took a shameful defeat. I care about how many of our people will die on norse spears, and how many of our people will die of famine if we don’t get the harvest in. All that other stuff is trivial bullshit in comparison.
AELFRED: That all makes sense. You still ought to track those things instrumentally. The norsemen care about all that, and it affects their behavior. If you want a model of how to deter them, you have to model the trivial bullshit that they care about. If you abstract away what they *actually do* care about with a model of what you think they *ought* to care about, then your model *won’t work*, and you might find yourself surprised when they attack again because they correctly predict that you’ll cave on “trivial bullshit”. Henry IV could swallow his pride and say “Paris is well worth a mass”, but that was because he was *correctly modeling* the Parisians’ pride.
WIGMUND: Wait. That is *wildly* anachronistic. Henry converted to Catholicism in 1593. This dialogue is taking place in, what, probably the 9th century?
AELFRED: Hey, I didn’t make a fuss when you quoted Kipling.
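For readers who prefer the disagreement in code, here is a minimal sketch (names, numbers, and decision rules all invented for illustration) of the two attacker models the dialogue contrasts; the same below-cost tribute deters one and not the other:

```python
# Two toy models of whether a raider returns next year, given what happened
# this year. Numbers and decision rules are invented for illustration only.

def profit_maximizer_returns(tribute_paid: int, raid_cost: int) -> bool:
    """Beornwulf's model: the raider returns only if the raid was net-profitable."""
    return tribute_paid - raid_cost > 0

def prestige_seeker_returns(tribute_paid: int, raid_cost: int) -> bool:
    """Aelfred's model: any visible submission raises the chief's standing,
    so the raider returns whenever tribute was paid at all, regardless of cost."""
    return tribute_paid > 0

tribute, cost = 600, 1000   # tribute is less than what the siege cost the attackers

print("net payout to attacker:", tribute - cost)                              # -400
print("profit-maximizer returns?", profit_maximizer_returns(tribute, cost))   # False
print("prestige-seeker returns?", prestige_seeker_returns(tribute, cost))     # True
```

Which decision rule the counterparty actually runs is an empirical question; the dialogue’s point is that the deterrence calculation inherits whichever answer you assume.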
I don’t understand the logic of this. Does seem like game-theoretically the net-payout is really what matters. What would be the argument for something else mattering?
Suppose that Alice blackmails me and I pay her $1,000,000. Alice has spent $1,500,000 on lawyers in the process of extracting this payout from me. The result of this interaction is that I have lost $1,000,000, while Alice has lost $500,000. (Alice’s lawyers have made a lot of money, of course.)
Bob hears about this. He correctly realizes that I am blackmailable. He talks to his lawyer, and they sign a contract whereby the lawyer gets half of any payout that they’re able to extract from me. Bob blackmails me and I pay him $1,000,000. Bob keeps $500,000, and his lawyer gets the other $500,000. Now I have again lost $1,000,000, while Bob has gained $500,000.
(How might this happen? Well, Bob’s lawyer is better than Alice’s lawyers were. Bob’s also more savvy, and knows how to find a good lawyer, how to negotiate a good contract, etc.)
That is: once the fact that you’re blackmailable is known, the net payout (taking into account expenditures needed to extract it from you) is not relevant, because those expenditures cannot be expected to hold constant—because they can be optimized. And the fact that (as is now a known fact) money can be extracted from you by blackmail, is the incentive to optimize them.
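A minimal sketch of the arithmetic in this example (same made-up figures as above):

```python
# Net positions from the Alice and Bob scenarios above (figures from the example).

def net_positions(payout: int, extraction_cost: int):
    """Return (target's loss, blackmailer's net gain) for one blackmail attempt."""
    target_loss = payout
    blackmailer_net = payout - extraction_cost
    return target_loss, blackmailer_net

alice = net_positions(payout=1_000_000, extraction_cost=1_500_000)
bob = net_positions(payout=1_000_000, extraction_cost=500_000)  # better lawyer, contingency fee

print("Alice:", alice)  # (1000000, -500000): Alice ends up in the red
print("Bob:  ", bob)    # (1000000, 500000): same payout, optimized costs, net profit
```

The target’s loss is identical in both rounds; only the blackmailer’s cost structure changed, and that cost structure is exactly what a future blackmailer gets to optimize.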
Note that a lawyer who participated in that would be committing a crime. In the case of LH, there was (by my unreliable secondhand understanding) an employment-contract dispute and a blackmail scheme happening concurrently. The lawyers would have been involved only in the employment-contract dispute, not in the blackmail, and any settlement reached would have nominally been only for dropping the employment-contract-related claims. An ordinary employment dispute is a common-enough thing that each side’s lawyers would have experience estimating the other side’s costs at each stage of litigation, and using those estimates as part of a settlement negotiation.
(Filing lawsuits without merit is sometimes analogized to blackmail, but US law defines blackmail much more narrowly, in such a way that asking for payment to not allege statutory rape on a website is blackmail, but asking for payment to not allege unfair dismissal in a civil court is not.)
Then it sounds like the blackmailer in question spent $0 on perpetrating the blackmail, which is even more of an incentive for others to blackmail MIRI in the future.
Then it sounds like the blackmailer in question spent $0 on perpetrating the blackmail
No, that’s not what I said (and is false). To estimate the cost you have to compare the outcome of the legal case to the counterfactual baseline in which there was no blackmail happening on the side (that baseline is not zero), and you have to include other costs besides lawyers.
To estimate the cost you have to compare the outcome of the legal case to the counterfactual baseline in which there was no blackmail happening on the side (that baseline is not zero)
Seems wrong to me. Opportunity cost is not the same as expenditures.
and you have to include other costs besides lawyers.
Sure, I agree that this is true. But as long as you run a policy that is sensitive to your counterparty optimizing expenditures, I think this no longer holds?
Like, I think in-general a policy I have for stuff like this is something like “ensure the costs to my counterparty were higher than their gains”, and then take actions appropriate to the circumstances. This seems like it wouldn’t allow for the kind of thing you describe above (and also seems like the most natural strategy for me in blackmail cases like this).
But as long as you run a policy that is sensitive to your counterparty optimizing expenditures, I think this no longer holds?
What would this look like…? It doesn’t seem to me to be the sort of thing which it’s at all feasible to do in practice. Indeed it’s hard to see what this would even mean; if the end result is that you pay out sometimes and refuse other times, all that happens is that external observers conclude “he pays out sometimes”, and keep blackmailing you.
in-general a policy I have for stuff like this is something like “ensure the costs to my counterparty were higher than their gains”, and then take actions appropriate to the circumstances
Actions like what?
Like, let’s say that you’re MIRI and you’re being blackmailed. You don’t know how much your blackmailer is paying his lawyers (why would you, after all?). What do you do?
And for all you know, the contract your blackmailer’s got with his lawyers might be as I described—lawyers get some percent of payout, and nothing if there’s no payout. What costs do you impose on the blackmailer?
In short, I think the policy you describe is usually impossible to implement in practice.
But note that this is all tangential. It’s only relevant to the original question (about MIRI) if you claim that MIRI were attempting to implement a policy such as you describe. Do you claim this? If so, have you any evidence?
I mean, the policy here really doesn’t seem very hard. If you do know how much your opposing party is paying their lawyers, you optimize that hard. If you don’t know, you make some conservative estimate. I’ve run policies like this in lots of different circumstances, and it’s also pretty close to common sense as a response to blackmail and threats.
Do you claim this? If so, have you any evidence?
I’ve asked some MIRI people this exact question and they gave me this answer, with pretty strong confidence and relatively large error margins.
I have to admit that I still haven’t the faintest clue what concrete behavior you’re actually suggesting. I repeat my questions: “What would this look like…?” and “Actions like what?” (Indeed, since—as I understand it—you say you’ve done this sort of thing, can you give concrete examples from those experiences?)
I’ve asked some MIRI people this exact question and they gave me this answer, with pretty strong confidence and relatively large error margins.
Alright, and what has this looked like in practice for MIRI…?
It means you sit down, you make some fermi estimates of how much benefit the counterparty could be deriving from this threat/blackmail, then you figure out what you would need to do to roughly net out to zero, then you do those things. If someone asks you what your policy is, you give this summary.
In every specific instance this looks different. Sometimes this means you reach out to people they know and let them know about the blackmailing in a way that would damage their reputation. Sometimes it means you threaten to escalate to a legal battle where you are willing to burn resources to make the counterparty come out in the red.
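As a rough sketch of what that Fermi calculation might look like (purely hypothetical numbers, not a description of what MIRI or anyone else actually did):

```python
# Hypothetical "net the counterparty out to zero" calculation, per the policy
# described above. Every figure is invented for illustration.

estimated_gain_to_blackmailer = 50_000   # Fermi estimate of what the threat could extract
costs_already_imposed = 20_000           # e.g. their legal fees and reputational costs to date

# Additional cost you would need to impose (reputational damage, legal escalation, etc.)
# for the counterparty to come out no better than they started:
additional_cost_needed = max(0, estimated_gain_to_blackmailer - costs_already_imposed)

print(f"Impose at least ~${additional_cost_needed:,} in further costs")  # ~$30,000 here
```

The disagreement in the replies below is about whether the inputs to such an estimate are knowable in practice.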
In every specific instance this looks different. Sometimes this means you reach out to people they know and let them know about the blackmailing in a way that would damage their reputation. Sometimes it means you threaten to escalate to a legal battle where you are willing to burn resources to make the counterparty come out in the red.
Why would you condition any of this on how much they’re spending?
And how exactly would you calibrate it to impose a specific amount of cost on the blackmailer? (How do you even map some of these things to monetary cost…?)
So Yudkowsky doesn’t have a workable alignment plan, so he decided to just live off our donations, running out the clock.
Er… is anyone actually claiming this? This is quite the accusation, and if it were being made, I’d want to see some serious evidence, but… is it, in fact, being made?
(It does seem like OP is saying this, but… in a weird way that doesn’t seem to acknowledge the magnitude of the accusation, and treats it as a reasonable characterization of other claims made earlier in the post. But that doesn’t actually seem to make sense. Am I misreading, or what?)
The second half (just live off donations?) is also my interpretation of OP. The first half (workable alignment plan?) is my own intuition based on MIRI mostly not accomplishing anything of note over the last decade, and...
MIRI & company spent a decade working on decision theory which seems irrelevant if deep learning is the path (aside: and how would you face Omega if you were the sort of agent that pays out blackmail?). Yudkowsky offers to bet Demis Hassabis that Go won’t be solved in the short term. They predict that AI will only come from GOFAI AIXI-likes with utility functions that will bootstrap recursively. They predict fast takeoff and FOOM.
Ooops.
The answer was actually deep learning and not systems with utility functions. Go gets solved. Deep Learning systems don’t look like they FOOM. Stochastic Gradient Descent doesn’t look like it will treacherous turn. Yudkowsky’s dream of building the singleton Sysop is gone and was probably never achievable in the first place.
People double down with the “mesaoptimizer” frame instead of admitting that it looks like SGD does what it says on the tin. Yudkowsky goes on a doom media spree. They advocate for a regulatory regime that would make it very easy to empower private interests over public interests. Enraging to me, there’s a pattern of engagement where it seems like AI Doomers will only interact with weak arguments instead of strong ones: Yud mostly argues with low-quality e/accs on Twitter, where it’s easy to score Ws; it was mildly surprising when he even responded with “This is kinda long.” to Quintin Pope’s objection thread.
What should MIRI have done, had they taken the good sliver of The Sequences to heart? They should have said oops. They should have halted, melted, and caught fire. They should have acknowledged that the sky was blue. They should have radically changed their minds when the facts changed. But that would have cut off their funding. If the world isn’t going to end from a FOOMing AI, why should MIRI get paid?
So what am I supposed to extract from this pattern of behaviour?
Deep Learning systems don’t look like they FOOM. Stochastic Gradient Descent doesn’t look like it will treacherous turn.
I think you’ve updated incorrectly, by failing to keep track of what the advance predictions were (or would have been) about when a FOOM or a treacherous turn will happen.
If foom happens, it happens no earlier than the point where AI systems can do software-development on their own codebases, without relying on close collaboration with a skilled human programmer. This point has not yet been reached; they’re idiot-savants with skill gaps that prevent them from working independently, and no AI system has passed the litmus test I use for identifying good (human) programmers. They’re advancing in that direction pretty rapidly, but they’re unambiguously not there yet.
Similarly, if a treacherous turn happens, it happens no earlier than the point where AI systems can do strategic reasoning with long chains of inference; this again has an idiot-savant dynamic going on, which can create the false impression that this landmark has been reached, when in fact it hasn’t.
They predict that AI will only come from GOFAI AIXI-likes with utility functions that will bootstrap recursively.
Do you have a link for this prediction? (Or are you just referring to, e.g., Eliezer’s dismissive attitude toward neural networks, as expressed in the Sequences?)
They predict fast takeoff and FOOM. … Deep Learning systems don’t look like they FOOM.
It’s not clear that deep learning systems get us to AGI, either. There doesn’t seem to be any good reason to be sure, at this time, that we won’t get “fast takeoff and FOOM”, does there? (Indeed it’s my understanding that Eliezer still predicts this. Or is that false?)
Stochastic Gradient Descent doesn’t look like it will treacherous turn.
It… doesn’t? What do you mean by this? I’ve seen no reason to be optimistic on this point—quite the opposite!
So what am I supposed to extract from this pattern of behaviour?
I think that at least some of the things you take to be obvious conclusions that Eliezer/MIRI should’ve drawn, are in fact not obvious, and some are even plausibly false.
You also make some good points. But there isn’t nearly so clear a pattern as you suggest.
It… doesn’t? What do you mean by this? I’ve seen no reason to be optimistic on this point—quite the opposite!
As I understand the argument, it goes like the following:
For evolutionary methods, you can’t predict the outcome of changes before they’re made, and so you end up with ‘throw the spaghetti at the wall and see what sticks’. At some point, those changes accumulate to a mind that’s capable of figuring out what environment it’s in and then performing well at that task, so you get what looks like an aligned agent while you haven’t actually exerted any influence on its internal goals (i.e. what it’ll do once it’s out in the world).
For gradient-descent based methods, you can predict the outcome of changes before they’re made; that’s the gradient part. It’s overall less plausible that the system you’re building figures out generic reasoning and then applies that generic reasoning to a specific task, compared to figuring out the specific reasoning for the task that you’d like solved. Jumps in the loss look more like “a new cognitive capacity has emerged in the network” and less like “the system is now reasoning about its training environment”.
Of course, that “overall less plausible” is making a handwavy argument about what simplicity metric we should be using and which design is simpler according to that metric. Related, earlier research: Are minimal circuits deceptive?
IMO this should be somewhat persuasive but not conclusive. I’m much happier with a transformer shaped by a giant English text corpus than I am with whatever is spit out by a neural-architecture-search program pointed at itself! But for cognitive megaprojects, I think you probably have to have something-like-a-mind in there, even if you got to it by SGD.
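A toy illustration of the “predict the outcome of changes before they’re made” distinction, using a deliberately trivial one-dimensional objective (this is only about the local predictability of updates, not a claim about real training runs):

```python
# Contrast: random mutation (evolution-like) vs. gradient descent on f(x) = (x - 3)^2.
# With gradients, each step's local effect on the loss is predicted before taking it;
# with blind mutation, you only find out by trying.
import random

def loss(x: float) -> float:
    return (x - 3.0) ** 2

def grad(x: float) -> float:
    return 2.0 * (x - 3.0)

x_gd, x_evo = 0.0, 0.0
random.seed(0)

for _ in range(100):
    # Gradient descent: the direction and size of the update are chosen using
    # local information that predicts the change in loss.
    x_gd -= 0.1 * grad(x_gd)

    # Evolutionary-style search: propose a blind mutation, keep it only if it helps.
    candidate = x_evo + random.gauss(0.0, 0.5)
    if loss(candidate) < loss(x_evo):
        x_evo = candidate

print(f"gradient descent:  x = {x_gd:.3f}, loss = {loss(x_gd):.6f}")
print(f"mutate-and-select: x = {x_evo:.3f}, loss = {loss(x_evo):.6f}")
```

Both methods find the optimum here; the relevant difference for the argument above is that each gradient step’s effect is predicted from local information before it is applied, while each mutation’s effect is only discovered after trying it.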
It’s pretty easy to find reasons why everything will hopefully be fine, or AI hopefully won’t FOOM, or we otherwise needn’t do anything inconvenient to get good outcomes. It’s proving considerably harder (from my outside-the-field view) to prove alignment, or prove upper bounds on rate of improvement, or prove much of anything else that would be cause to stop ringing the alarm.
FWIW I’m considerably less worried than I was when the Sequences were originally written. The paradigms that have taken off since do seem a lot more compatible with straightforward training solutions that look much less alien than expected. There are plausible scenarios where we fail at solving alignment and still get something tolerably human shaped, and none of those scenarios previously seemed plausible. That optimism just doesn’t take it under the stop worrying threshold.
This doesn’t seem consistent to me with MIRI having run a research program with a machine learning focus. IIRC (I don’t have links handy, but I’m pretty sure there were announcements made), they wound up declaring failure on that research program, and it was only after that happened that they started talking about the world being doomed and there not being anything that seemed like it would work for aligning AGI in time.
Closest thing I’m aware of is that at the time of the AlphaGo matches he bet people at like 3:2 odds, favourable to him, that Lee Sedol would win. Link here
My interpretation of various things Michael and co. have said is: “Effective altruism in general (and MIRI / AI-safety in particular) is a memeplex optimizing to extract resources from people in a fraudulent way, which does include some degree of ‘straightforward fraud the way most people would interpret it.’” But also, their worldview generally sees a lot of things as fraudulent in ways and degrees that go beyond what the word means in common parlance.
I predict they wouldn’t phrase things the specific way iceman phrased it (but, not confidently).
Yes, this is all reasonable, but as a description of Eliezer’s behavior as understood by him, and also as understood by, like, an ordinary person, “doesn’t have a workable alignment plan, so he decided to just live off our donations, running out the clock” is just… totally wrong… isn’t it?
That is, that characterization doesn’t match what Eliezer sees himself as doing, nor does it match how an ordinary person (and one who had no particular antipathy toward Eliezer, and thus was not inclined to describe his behavior uncharitably, only impartially), speaking in ordinary English, would describe Eliezer as doing—correct?
Yes, that is my belief. (Sorry, should have said that concretely). I’m not sure what an ‘ordinary person’ should think because ‘AI is dangerous’ has a lot of moving pieces and I think most people are (kinda reasonably?) epistemically helpless about the situation. But I do think iceman’s summary is basically obviously false, yes.
My own current belief is “Eliezer/MIRI probably had something-like-a-plan around 2017, probably didn’t have much of a plan by 2019 that Eliezer himself believed in, but, ‘take a break, and then come back to the problem after thinking about it’ feels like a totally reasonable thing to me to do”. (And meanwhile there were still people at MIRI working on various concrete projects that at least the people involved thought were worthwhile.)
I do think, if you don’t share Eliezer’s worldview, it’s a reasonable position to be suspicious and hypothesize that MIRI’s current activities are some sort of motivated-cognition-y cope, but confidently asserting that seems wrong to me. (I also think there’s a variety of worldviews that aren’t Eliezer’s exact worldview that make his actions still pretty coherent, and I think it’s a pretty sketchy position to assert that all those nearby worldviews are so obviously wrong as to make ‘motivated cope/fraud’ your primary frame.)
(fwiw my overall take is that I think there is something to this line of thinking. My general experience is that when Michael/Benquo/Jessica say “something is fishy here”, there often turns out to be something I agree is fishy in some sense, but I find their claims overstated and running with some other assumptions I don’t believe that make the thing seem worse to them than it does to me)
For the first part, Yudkowsky has said that he doesn’t have a workable alignment plan, and nobody does, and we are all going to die. This is not blameworthy, I also do not have a workable alignment plan.
For the second part, he was recently on a sabbatical, presumably funded by prior income that was funded by charity, so one might say he was living off donations. Not blameworthy, I also take vacations.
For the third part, everyone who thinks that we are all going to die is in some sense running out the clock, be they disillusioned transhumanists or medieval serfs. Hopefully we make some meaning while we are alive. Not blameworthy, just the human condition.
Whether MIRI is a good place to donate is a very complicated question, but certainly “no” is a valid answer for many donors.
These are good points. But it does seem like what @iceman meant by the bit that I quoted at least has connotations that go beyond your interpretation, yes?
Whether MIRI is a good place to donate is a very complicated question, but certainly “no” is a valid answer for many donors.
Sure. I haven’t donated to MIRI in many years, so I certainly wouldn’t tell anyone else to do so. (It’s not my understanding that MIRI is funding constrained at this time. Can anyone confirm or disconfirm this?)
What “serious accusation” do you see in the connotations of that quote? Genuine question, I could guess but I’d prefer to know. Mostly the subtext I see from iceman is disappointment and grief and anger and regret. Which are all valid emotions for them to feel.
I think a lot of what might have been “serious accusations” in 2019 are now common knowledge, eg after Bankless, Death with Dignity, etc.
(It’s not my understanding that MIRI is funding constrained at this time. Can anyone confirm or disconfirm this?)
From the Bankless interview:
How do I put it… The saner outfits do have uses for money. They don’t really have scalable uses for money, but they do burn any money literally at all. Like, if you gave MIRI a billion dollars, I would not know how to...
Well, at a billion dollars, I might try to bribe people to move out of AI development, that gets broadcast to the whole world, and move to the equivalent of an island somewhere—not even to make any kind of critical discovery, but just to remove them from the system. If I had a billion dollars.
If I just have another $50 million, I’m not quite sure what to do with that, but if you donate that to MIRI, then you at least have the assurance that we will not randomly spray money on looking like we’re doing stuff and we’ll reserve it, as we are doing with the last giant crypto donation somebody gave us until we can figure out something to do with it that is actually helpful. And MIRI has that property. I would say probably Redwood Research has that property.
So, just to clarify, “serious accusation” is not a phrase that I have written in this discussion prior to this comment, which is what the use of quotes in your comment suggests. I did write something which has more or less the same meaning! So you’re not mis-ascribing beliefs to me. But quotes mean that you’re… quoting… and that’s not the case here.
Anyway, on to the substance:
What “serious accusation” do you see in the connotations of that quote?
So Yudkowsky doesn’t have a workable alignment plan, so he decided to just live off our donations, running out the clock.
The connotations are that Eliezer has consciously chosen to stop working on alignment, while pretending to work on alignment, and receiving money to allegedly work on alignment but instead just not doing so, knowing that there won’t be any consequences for perpetrating this clear and obvious scam in the classic sense of the word, because the world’s going to end and he’ll never be held to account.
Needless to say, it just does not seem to me like Eliezer or MIRI are doing anything remotely like that. Indeed I don’t think anyone (serious) has even suggested that they’re doing anything like that. (The usual horde of haters on Twitter / Reddit / etc. notwithstanding.)
Mostly the subtext I see from iceman is disappointment and grief and anger and regret. Which are all valid emotions for them to feel.
But of course this is largely nonsensical in the absence of any “serious accusations”. Grief over what, anger about what? Why should these things be “valid emotions … to feel”? (And it can’t just be “we’re all going to die”, because that’s not new; we didn’t just find that out from the OP—while iceman’s comment clearly implies that whatever is the cause of his reaction, it’s something that he just learned from Zack’s post.)
I think a lot of what might have been “serious accusations” in 2019 are now common knowledge, eg after Bankless, Death with Dignity, etc.
Which is precisely why iceman’s comment does not make sense as a reply to this post, now; nor is the characterization which I quoted an accurate one.
(It’s not my understanding that MIRI is funding constrained at this time. Can anyone confirm or disconfirm this?)
From the Bankless interview:
Yep, I would describe that state of affairs as “not funding constrained”.
I think emotions are not blame assignment tools, and have other (evolutionary) purposes. A classic example is a relationship break-up, where two people can have strong emotions even though nobody did anything wrong. So I do not interpret emotions as accusations in general. It sounds like you have a different approach, and I don’t object to that.
Grief over what, anger about what?
For example, grief over the loss of the $100k+ donation. Donated with the hope that it would reduce extinction risk, but with the benefit of hindsight the donor now thinks that the marginal donation had no counterfactual impact. It’s not blameworthy because no researcher can possibly promise that a marginal donation will have a large counterfactual impact, and MIRI did not so promise. But a donor can still grieve the loss without someone being to blame.
For example, anger that Yudkowsky realized he had no workable alignment plan, in his estimation, in 2015 (Bankless), and didn’t share that until 2022 (Death with Dignity). This is not blameworthy because people are not morally obliged to share their extinction risk predictions, and MIRI has a clear policy against sharing information by default. But a donor can still be angry that they were disadvantaged by known unknowns.
I hope these examples illustrate that a non-accusatory interpretation is sensical, even if you don’t think it plausible.
There’s a later comment from iceman, which is probably the place to discuss what iceman is alleging:
What should MIRI have done, had they taken the good sliver of The Sequences to heart? They should have said oops. They should have halted, melted, and caught fire. They should have acknowledged that the sky was blue. They should have radically changed their minds when the facts changed. But that would have cut off their funding. If the world isn’t going to end from a FOOMing AI, why should MIRI get paid?
I think emotions are not blame assignment tools, and have other (evolutionary) purposes. A classic example is a relationship break-up, where two people can have strong emotions even though nobody did anything wrong. So I do not interpret emotions as accusations in general. It sounds like you have a different approach, and I don’t object to that.
You misunderstand. I’m not “interpret[ing] emotions as accusations”; I’m simply saying that emotions don’t generally arise for no reason at all (if they do, we consider that to be a pathology!).
So, in your break-up example, the two people involved of course have strong emotions—because of the break-up! On the other hand, it would be very strange indeed to wake up one day and have those same emotions, but without having broken up with anyone, or anything going wrong in your relationships at all.
And likewise, in this case:
Grief over what, anger about what?
For example, grief over the loss of the $100k+ donation. Donated with the hope that it would reduce extinction risk, but with the benefit of hindsight the donor now thinks that the marginal donation had no counterfactual impact. It’s not blameworthy because no researcher can possibly promise that a marginal donation will have a large counterfactual impact, and MIRI did not so promise. But a donor can still grieve the loss without someone being to blame.
Well, it’s a bit dramatic to talk of “grief” over the loss of money, but let’s let that pass. More to the point: why is it a “loss”, suddenly? What’s happened just now that would cause iceman to view it as a “loss”? It’s got to be something in Zack’s post, or else the comment is weirdly non-apropos, right? In other words, the implication here is that something in the OP has caused iceman to re-examine the facts, and gain a new “benefit of hindsight”. But that’s just what I’m questioning.
For example, anger that Yudkowsky realized he had no workable alignment plan, in his estimation, in 2015 (Bankless), and didn’t share that until 2022 (Death with Dignity). This is not blameworthy because people are not morally obliged to share their extinction risk predictions, and MIRI has a clear policy against sharing information by default. But a donor can still be angry that they were disadvantaged by known unknowns.
I do not read Eliezer’s statements in the Bankless interview as saying that he “realized he had no workable alignment plan” in 2015. As far as I know, at no time since starting to write the Sequences has Eliezer ever claimed to have, or thought that he had, a workable alignment plan. This has never been a secret, nor is it news, either to Eliezer in 2015 or to the rest of us in 2022.
I hope these examples illustrate that a non-accusatory interpretation is sensical, even if you don’t think it plausible.
They do not.
There’s a later comment from iceman, which is probably the place to discuss what iceman is alleging:
And presumably Louie got paid out since why would you pay for silence if the accusations weren’t at least partially true
FWIW, my current understanding is that this inference isn’t correct. I think it’s common practice to pay settlements to people, even if their claims are fallacious, since having an extended court battle is sometimes way worse.
It’s not exactly the point of your story, but...
Wait, that actually happened? Louie Helm really was behind MIRICult? The accusations weren’t just...Ziz being Ziz? And presumably Louie got paid out since why would you pay for silence if the accusations weren’t at least partially true...or if someone were to go digging, they’d find things even more damning?
Ouch.
Really ouch.
So Yudkowsky doesn’t have a workable alignment plan, so he decided to just live off our donations, running out the clock. I donated a six figure amount to MIRI over the years, working my ass off to earn to give...and that’s it?
Fuck.
I remember being at a party in 2015 and asking Michael what else I should spend my San Francisco software engineer money on, if not the EA charities I was considering. I was surprised when his answer was, “You.”
That sounds like wise advice.
Louie Helm was behind MIRICult (I think as a result of some dispute where he asked for his job back after he had left MIRI and MIRI didn’t want to give him his job back). As far as I can piece together from talking to people, he did not get paid out, but there was a threat of a lawsuit which probably cost him a bunch of money in lawyers, and it was settled by both parties signing an NDA (which IMO was a dumb choice on MIRI’s part since the NDA has made it much harder to clear things up here).
Overall I am quite confident that he didn’t end up with more money than he started with after the whole miricult thing. Also, I don’t think the accusations are “at least partially true”. Like it’s not the case that literally every sentence of the miricult page is false, but basically all the salacious claims are completely made up.
So, I started off with the idea that Ziz’s claims about MIRI were frankly crazy...because Ziz was pretty clearly crazy (see their entire theory of hemispheres, “collapse the timeline,” etc.) so I marked most of their claims as delusions or manipulations and moved on, especially since their recounting of other events on the page where they talked about miricult (which is linked in OP) comes off as completely unhinged.
But Zack confirming this meeting happened and vaguely confirming its contents completely changes all the probabilities. I now need to go back and recalculate a ton of likelihoods here starting from “this node with Vassar saying this event happened.”
From Ziz’s page:
It’s obviously not defamation since Ziz believes it’s true.
Inasmuch as this is true, this is weak Bayesian evidence that Ziz’s accusations are more true than false because otherwise you would just post something like your above response to me in response to them. “No, actually official people can’t talk about this because there’s an NDA, but I’ve heard second hand there’s an NDA” clears a lot up, and would have been advantageous to post earlier, so why wasn’t it?
We’re veering dangerously close to dramaposting here, but just FYI habryka has already contested that they ever said this. I would like to know if the ban accusations are true, though.
Can confirm that I don’t believe I said anything about defamation, and in general continue to think that libel suits are really quite bad and do not think they are an appropriate tool in almost any circumstance.
We banned some of them for three months when they kept spamming the CFAR AMA a while ago: https://www.lesswrong.com/posts/96N8BT9tJvybLbn5z/we-run-the-center-for-applied-rationality-ama?commentId=5W86zzFy48WiLcSg6
I don’t think we ever took any other moderation action, though I would likely ban them again, since, like, I really don’t want them around on LessWrong and they have far surpassed thresholds for acceptable behavior.
I would not ban anyone writing up details of the miricult stuff (including false accusations, and relatively strong emotions). Indeed somewhat recently I wrote like 3-5 pages of content here on a private Facebook thread with a lot of rationality community people on it. I would be up for someone extracting the parts that seem shareable more broadly. Seems good to finally have something more central and public.
Two points of order, without going into any specific accusations or their absence:
The post is transphobic, which anticorrelates with being correct/truthful/objective.
It seems optimized for smoothness/persuasion, which, based on my experience, also anticorrelates with both truth and objectivity.
What seems optimized for smoothness/persuasion?
The author shares how terrible it feels that X is true, without bringing arguments for X being true in the first place (based on me skimming the post). That can bypass the reader’s fact-check (because why would he write about how bad it made him feel that X is true if it wasn’t?).
It feels to me like he’s trying to combine an emotional exposition (no facts, talking about his feelings) with an expository blogpost (explaining a topic), while trying to grab the best of both worlds (the persuasiveness and emotions of the former and the social status of the latter) without the substance to back it up.
Sorry, you’re going to need to be more specific. What particular claim X have I asserted is true without bringing arguments for it? Reply!
I agree that I’m combining emotional autobiography with topic exposition, but the reason I’m talking about my autobiography at all is because I tried object-level topic exposition for years—in such posts as “The Categories Were Made for Man to Make Predictions” (2018), “Where to Draw the Boundaries?” (2019), “Unnatural Categories Are Optimized for Deception” (2021), and “Challenges to Yudkowsky’s Pronoun Reform Proposal” (2022)—and it wasn’t working. From my perspective, the only thing left to do was jump up a meta level and talk about why it wasn’t working. If your contention is that I don’t have the substance to back up my claims, I think you should be able to explain what I got wrong in those posts. Reply!
Reply!
The lack of comment from Eliezer and other MIRI personnel had actually convinced me in particular that the claims were true. This is the first I heard that there’s any kind of NDA preventing them from talking about it.
I think this means you had incorrect priors (about how often legal cases conclude with settlements containing nondisparagement agreements.)
They can presumably confirm whether or not there is a nondisparagement agreement and whether that is preventing them from commenting though right
You can confirm this if you’re aware that it’s a possibility, and interpret carefully-phrased refusals to comment in a way that’s informed by reasonable priors. You should not assume that anyone is able to directly tell you that an agreement exists.
Why not? Is it common for NDAs/non-disparagement agreements to also have a clause stating the parties aren’t allowed to tell anyone about it? I’ve never heard of this outside of super-injunctions, which seem a pretty separate thing.
Absolutely common. Most non-disparagement agreements are paired with non-disclosure agreements (or clauses in the non-disparagement wording) that prohibit talking about the agreement, as much as talking about the forbidden topics.
It’s pretty obvious to lawyers that “I would like to say this, but I have a legal agreement that I won’t” is equivalent, in many cases, to saying it outright.
My boilerplate severance agreement at a job included an NDA that couldn’t be acknowledged (I negotiated to change this).
“he didn’t end up with more money than he started with after the whole miricult thing” is such a weirdly specific way to phrase things.
My speculation from this is that MIRI paid Helm or his lawyers some money, but less money than Helm had spent on the harassment campaign, and among people who know the facts there is a semantic disagreement about whether this constitutes a “payout”. Some people say something like “it’s a financial loss for Helm, so game-theoretically it doesn’t provide an incentive to blackmail, therefore it’s fine” and others say something like “if you pay out money in response to blackmail, that’s a blackmail payout, you don’t get to move the bar like that”.
I would appreciate it if someone who knows what happened can confirm or deny this.
(AFAICT the only other possibility is that somewhere along the line, at least one of the various sources of contradictory-sounding rumors was just lying-or-so-careless-as-to-be-effectively-lying. Which is very possible, of course, that happens with rumors a lot.)
I sadly don’t know the answer to this. To open up the set of possibilities further, I have heard rumors that maybe Louie was demanding some donations back he had given MIRI previously, and if that happened, that might also complicate the definition of a “payout”.
I don’t understand the logic of this. Does seem like game-theoretically the net-payout is really what matters. What would be the argument for something else mattering?
BEORNWULF: A messenger from the besiegers!
WIGMUND: Send him away. We have nothing to discuss with the norsemen while we are at war.
AELFRED: We might as well hear them out. This siege is deadly dull. Norseman, deliver your message, and then leave so that we may discuss our reply.
MESSENGER: Sigurd bids me say that if you give us two thirds of the gold in your treasury, our army will depart. He reminds you that if this siege goes on, you will lose the harvest, and this will cost you more dearly than the gold he demands.
The messenger exits.
AELFRED: Ah. Well, I can’t blame him for trying. But no, certainly not.
BEORNWULF: Hold on, I know what you’re thinking, but this actually makes sense. When Sigurd’s army first showed up, I was the first to argue against paying him off. After all, if we’d paid right at the start, then he would’ve made a profit on the attack, and it would only encourage more. But the siege has been long and hard for us both. If we accept this deal *now*, he’ll take a net loss. We’ve spent most of the treasury resisting the siege—
WIGMUND: As we should! Millions for defense, but not one cent for tribute!
BEORNWULF: Certainly. But the gold we have left won’t even cover what they’ve already spent on their attack. Their net payout will still be negative, so game-theoretically, it doesn’t make sense to think of it as “tribute”. As long as we’re extremely sure they’re in the red, we should minimize our own costs, and missing the harvest would be a *huge* cost. People will starve. The deal is a good one.
WIGMUND: Never! if once you have paid him the danegeld, you never get rid of the Dane!
BEORNWULF: Not quite. The mechanism matters. The Dane has an incentive to return *only if the danegeld exceeds his costs*.
WIGMUND: Look, you can mess with the categories however you like, and find some clever math that justifies doing whatever you’ve already decided you want to do. None of that constrains your behavior and so none of that matters. What matters is, take away all the fancy definitions and you’re still just paying danegeld.
BEORNWULF: How can I put this in language you’ll understand—it doesn’t matter whether the definitions support what *I* want to do, it matters whether the definitions reflect the *norsemen’s* decision algorithm. *They* care about the net payout, not the gross payout.
AELFRED: Hold on. Are you modeling the norsemen as profit-maximizers?
BEORNWULF: More or less? I mean, no one is perfectly rational, but yeah, everyone *approximates* a rational profit-maximizer.
WIGMUND: They are savage, irrational heathens! They never even study game theory!
BEORNWULF: Come on. I’ll grant that they don’t use the same jargon we do, but they attack because they expect to make a profit off it. If they don’t expect to profit, they’ll stop. Surely they do *that* much even without explicit game theoretic proofs.
AELFRED: That affects their decision, yes, but it’s far from the whole story. The norsemen care about more than just gold and monetary profit. They care about pride. Dominance. Social rank and standing. Their average warrior is a young man in his teens or early twenties. When he decides whether to join the chief’s attack, he’s not sitting down with spreadsheets and a green visor to compute the expected value, he’s remembering that time cousin Guthrum showed off the silver chalice he looted from Lindisfarne. Remember, Sigurd brought the army here in the first place to avenge his brother’s death—
BEORNWULF: That’s a transparent pretext! He can’t possibly blame us for that, we killed Agnarr in self-defense during the raid on the abbey.
WIGMUND: You can tell that to Sigurd. If it had been my brother, I’d avenge him too.
AELFRED: Among their people, when a man is murdered, it’s not a *tragedy* to his family, it’s an *insult*. It can only be wiped away with either a weregeld payment from the murderer or a blood feud. Yes, Sigurd cares about gold, but he also cares tremendously about *personally knowing he defeated us*, in order to remove the shame we dealt him by killing Agnarr. Modeling his decisions as profit-maximizing will miss a bunch of his actual decision criteria and constraints, and therefore fail to predict the norsemen’s future actions.
WIGMUND: You’re overcomplicating this. If we pay, the Norsemen will learn that we pay, and more will come. If we do not pay, they will learn that we do not pay, and fewer will come.
BEORNWULF: They don’t care if we *pay*, they care if it’s *profitable*. This is basic accounting.
AELFRED: They *do* care if we pay. Most of them won’t know or care what the net-payout is. If we pay tribute, this will raise Sigurd’s prestige in their eyes no matter how much he spent on the expedition, and he needs his warriors’ support more than he needs our gold. Taking a net loss won’t change his view on whether he’s avenged the insult to his family, and we do *not* want the Norsemen to think they can get away with coming here to avenge “insults” like killing their raiders in self-defense. On the other hand, if Sigurd goes home doubly shamed by failing to make us submit, they’ll think twice about trying that next time.
BEORNWULF: I don’t care about insults. I don’t care what Sigurd’s warriors think of him. I don’t care who can spin a story of glorious victory or who ends up feeling like they took a shameful defeat. I care about how many of our people will die on Norse spears, and how many of our people will die of famine if we don’t get the harvest in. All that other stuff is trivial bullshit in comparison.
AELFRED: That all makes sense. You still ought to track those things instrumentally. The Norsemen care about all that, and it affects their behavior. If you want a model of how to deter them, you have to model the trivial bullshit that they care about. If you abstract away what they *actually do* care about with a model of what you think they *ought* to care about, then your model *won’t work*, and you might find yourself surprised when they attack again because they correctly predict that you’ll cave on “trivial bullshit”. Henry IV could swallow his pride and say “Paris is well worth a mass”, but that was because he was *correctly modeling* the Parisians’ pride.
WIGMUND: Wait. That is *wildly* anachronistic. Henry converted to Catholicism in 1593. This dialogue is taking place in, what, probably the 9th century?
AELFRED: Hey, I didn’t make a fuss when you quoted Kipling.
This was fantastic, and you should post it as a top-level post.
Suppose that Alice blackmails me and I pay her $1,000,000. Alice has spent $1,500,000 on lawyers in the process of extracting this payout from me. The result of this interaction is that I have lost $1,000,000, while Alice has lost $500,000. (Alice’s lawyers have made a lot of money, of course.)
Bob hears about this. He correctly realizes that I am blackmailable. He talks to his lawyer, and they sign a contract whereby the lawyer gets half of any payout that they’re able to extract from me. Bob blackmails me and I pay him $1,000,000. Bob keeps $500,000, and his lawyer gets the other $500,000. Now I have again lost $1,000,000, while Bob has gained $500,000.
(How might this happen? Well, Bob’s lawyer is better than Alice’s lawyers were. Bob’s also more savvy, and knows how to find a good lawyer, how to negotiate a good contract, etc.)
That is: once the fact that you’re blackmailable is known, the net payout (taking into account expenditures needed to extract it from you) is not relevant, because those expenditures cannot be expected to hold constant—because they can be optimized. And the fact that (as is now a known fact) money can be extracted from you by blackmail, is the incentive to optimize them.
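To make the incentive structure concrete, here is a minimal sketch in Python of the arithmetic above, using only the hypothetical dollar figures from the Alice/Bob example (nothing here is real data):

```python
# Toy model of the two hypothetical blackmail scenarios described above.
# All dollar figures are the made-up numbers from the example.

def outcome(payout, extraction_cost):
    """Return (victim's net, blackmailer's net) for a single blackmail attempt."""
    return -payout, payout - extraction_cost

# Alice spends $1.5M on lawyers to extract a $1M payout.
print(outcome(1_000_000, 1_500_000))  # (-1000000, -500000): Alice ends up in the red.

# Bob's contingency-fee lawyer takes half the payout, so Bob's effective cost is $500k.
print(outcome(1_000_000, 500_000))    # (-1000000, 500000): Bob ends up in the black.
```

The victim’s loss is identical in both cases; the only thing that changed is the extractor’s efficiency, which is exactly the variable a future blackmailer is free to optimize once your blackmailability is known.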
Note that a lawyer who participated in that would be committing a crime. In the case of LH, there was (by my unreliable secondhand understanding) an employment-contract dispute and a blackmail scheme happening concurrently. The lawyers would have been involved only in the employment-contract dispute, not in the blackmail, and any settlement reached would have nominally been only for dropping the employment-contract-related claims. An ordinary employment dispute is a common-enough thing that each side’s lawyers would have experience estimating the other side’s costs at each stage of litigation, and using those estimates as part of a settlement negotiation.
(Filing lawsuits without merit is sometimes analogized to blackmail, but US law defines blackmail much more narrowly, in such a way that asking for payment to not allege statutory rape on a website is blackmail, but asking for payment to not allege unfair dismissal in a civil court is not.)
Then it sounds like the blackmailer in question spent $0 on perpetrating the blackmail, which is even more of an incentive for others to blackmail MIRI in the future.
No, that’s not what I said (and is false). To estimate the cost you have to compare the outcome of the legal case to the counterfactual baseline in which there was no blackmail happening on the side (that baseline is not zero), and you have to include other costs besides lawyers.
Seems wrong to me. Opportunity cost is not the same as expenditures.
Alright, and what costs were there?
Sorry, this isn’t a topic I want to discuss with someone who’s being thick in the way that you’re being thick right now. Tapping out.
Sure, I agree that this is true. But as long as you run a policy that is sensitive to your counterparty optimizing expenditures, I think this no longer holds?
Like, I think in general a policy I have for stuff like this is something like “ensure the costs to my counterparty were higher than their gains”, and then take actions appropriate to the circumstances. This seems like it wouldn’t allow for the kind of thing you describe above (and also seems like the most natural strategy for me in blackmail cases like this).
What would this look like…? It doesn’t seem to me to be the sort of thing which it’s at all feasible to do in practice. Indeed it’s hard to see what this would even mean; if the end result is that you pay out sometimes and refuse other times, all that happens is that external observers conclude “he pays out sometimes”, and keep blackmailing you.
Actions like what?
Like, let’s say that you’re MIRI and you’re being blackmailed. You don’t know how much your blackmailer is paying his lawyers (why would you, after all?). What do you do?
And for all you know, the contract your blackmailer’s got with his lawyers might be as I described—lawyers get some percent of payout, and nothing if there’s no payout. What costs do you impose on the blackmailer?
In short, I think the policy you describe is usually impossible to implement in practice.
But note that this is all tangential. It’s only relevant to the original question (about MIRI) if you claim that MIRI were attempting to implement a policy such as you describe. Do you claim this? If so, have you any evidence?
I mean, the policy here really doesn’t seem very hard. If you do know how much your opposing party is paying their lawyers, you optimize that hard. If you don’t know, you make some conservative estimate. I’ve run policies like this in lots of different circumstances, and it’s also pretty close to common sense as a response to blackmail and threats.
I’ve asked some MIRI people this exact question and they gave me this answer, with pretty strong confidence and relatively large error margins.
I have to admit that I still haven’t the faintest clue what concrete behavior you’re actually suggesting. I repeat my questions: “What would this look like…?” and “Actions like what?” (Indeed, since—as I understand it—you say you’ve done this sort of thing, can you give concrete examples from those experiences?)
Alright, and what has this looked like in practice for MIRI…?
It means you sit down, you make some Fermi estimates of how much benefit the counterparty could be deriving from this threat/blackmail, then you figure out what you would need to do to roughly net out to zero, then you do those things. If someone asks you what your policy is, you give this summary.
In every specific instance this looks different. Sometimes this means you reach out to people they know and let them know about the blackmailing in a way that would damage their reputation. Sometimes it means you threaten to escalate to a legal battle where you are willing to burn resources to make the counterparty come out in the red.
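As an illustration only, here is a toy version of the kind of Fermi arithmetic being described, with entirely made-up numbers (the real exercise would of course involve messy, case-specific estimates):

```python
# Toy Fermi estimate for the policy "make sure the counterparty nets out to <= zero".
# Every figure is illustrative; none refers to any real case.

estimated_benefit = 100_000      # rough guess at what the threatener stands to gain
their_costs_so_far = 30_000      # e.g. estimated legal fees and time already spent

# Additional cost (reputational, legal, etc.) you would need to credibly impose
# for the threat to net out to roughly zero or worse for them.
additional_cost_needed = max(0, estimated_benefit - their_costs_so_far)
print(additional_cost_needed)    # 70000
```

Under this framing, the “conservative estimate” mentioned above would presumably mean erring on the high side for the estimated benefit.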
Why would you condition any of this on how much they’re spending?
And how exactly would you calibrate it to impose a specific amount of cost on the blackmailer? (How do you even map some of these things to monetary cost…?)
Er… is anyone actually claiming this? This is quite the accusation, and if it were being made, I’d want to see some serious evidence, but… is it, in fact, being made?
(It does seem like OP is saying this, but… in a weird way that doesn’t seem to acknowledge the magnitude of the accusation, and treats it as a reasonable characterization of other claims made earlier in the post. But that doesn’t actually seem to make sense. Am I misreading, or what?)
The second half (just live off donations?) is also my interpretation of OP. The first half (workable alignment plan?) is my own intuition based on MIRI mostly not accomplishing anything of note over the last decade, and...
MIRI & company spent a decade working on decision theory which seems irrelevant if deep learning is the path (aside: and how would you face Omega if you were the sort of agent that pays out blackmail?). Yudkowsky offers to bet Demis Hassabis that Go won’t be solved in the short term. They predict that AI will only come from GOFAI AIXI-likes with utility functions that will bootstrap recursively. They predict fast takeoff and FOOM.
Oops.
The answer was actually deep learning and not systems with utility functions. Go gets solved. Deep Learning systems don’t look like they FOOM. Stochastic Gradient Descent doesn’t look like it will treacherous turn. Yudkowsky’s dream of building the singleton Sysop is gone and was probably never achievable in the first place.
People double down with the “mesa-optimizer” frame instead of admitting that it looks like SGD does what it says on the tin. Yudkowsky goes on a doom media spree. They advocate for a regulatory regime that would make it very easy to empower private interests over public interests. Most enraging to me, there’s a pattern of engagement where it seems like AI Doomers will only interact with weak arguments instead of strong ones: Yud mostly argues with low-quality e/accs on Twitter, where it’s easy to score Ws; it was mildly surprising when he even responded with “This is kinda long.” to Quintin Pope’s objection thread.
What should MIRI have done, had they taken the good sliver of The Sequences to heart? They should have said oops. They should have halted, melted, and caught fire. They should have acknowledged that the sky was blue. They should have radically changed their minds when the facts changed. But that would have cut off their funding. If the world isn’t going to end from a FOOMing AI, why should MIRI get paid?
So what am I supposed to extract from this pattern of behaviour?
I think you’ve updated incorrectly, by failing to keep track of what the advance predictions were (or would have been) about when a FOOM or a treacherous turn will happen.
If FOOM happens, it happens no earlier than the point where AI systems can do software development on their own codebases, without relying on close collaboration with a skilled human programmer. This point has not yet been reached; current systems are idiot-savants with skill gaps that prevent them from working independently, and no AI system has passed the litmus test I use for identifying good (human) programmers. They’re advancing in that direction pretty rapidly, but they’re unambiguously not there yet.
Similarly, if a treacherous turn happens, it happens no earlier than the point where AI systems can do strategic reasoning with long chains of inference; this again has an idiot-savant dynamic going on, which can create the false impression that this landmark has been reached, when in fact it hasn’t.
Do you have a link for this prediction? (Or are you just referring to, e.g., Eliezer’s dismissive attitude toward neural networks, as expressed in the Sequences?)
It’s not clear that deep learning systems get us to AGI, either. There doesn’t seem to be any good reason to be sure, at this time, that we won’t get “fast takeoff and FOOM”, is there? (Indeed it’s my understanding that Eliezer still predicts this. Or is that false?)
It… doesn’t? What do you mean by this? I’ve seen no reason to be optimistic on this point—quite the opposite!
I think that at least some of the things you take to be obvious conclusions that Eliezer/MIRI should’ve drawn, are in fact not obvious, and some are even plausibly false.
You also make some good points. But there isn’t nearly so clear a pattern as you suggest.
As I understand the argument, it goes like the following:
For evolutionary methods, you can’t predict the outcome of changes before they’re made, and so you end up with ‘throw the spaghetti at the wall and see what sticks’. At some point, those changes accumulate to a mind that’s capable of figuring out what environment it’s in and then performing well at that task, so you get what looks like an aligned agent while you haven’t actually exerted any influence on its internal goals (i.e. what it’ll do once it’s out in the world).
For gradient-descent based methods, you can predict the outcome of changes before they’re made; that’s the gradient part. It’s overall less plausible that the system you’re building figures out generic reasoning and then applies that generic reasoning to a specific task, compared to figuring out the specific reasoning for the task that you’d like solved. Jumps in the loss look more like “a new cognitive capacity has emerged in the network” and less like “the system is now reasoning about its training environment”.
Of course, that “overall less plausible” is making a handwavy argument about what simplicity metric we should be using and which design is simpler according to that metric. Related, earlier research: Are minimal circuits deceptive?
IMO this should be somewhat persuasive but not conclusive. I’m much happier with a transformer shaped by a giant English text corpus than I am with whatever is spit out by a neural-architecture-search program pointed at itself! But for cognitive megaprojects, I think you probably have to have something-like-a-mind in there, even if you got to it by SGD.
It’s pretty easy to find reasons why everything will hopefully be fine, or AI hopefully won’t FOOM, or we otherwise needn’t do anything inconvenient to get good outcomes. It’s proving considerably harder (from my outside the field view) to prove alignment, or prove upper bounds on rate of improvement, or prove much of anything else that would be cause to stop ringing the alarm.
FWIW I’m considerably less worried than I was when the Sequences were originally written. The paradigms that have taken off since do seem a lot more compatible with straightforward training solutions that look much less alien than expected. There are plausible scenarios where we fail at solving alignment and still get something tolerably human-shaped, and none of those scenarios previously seemed plausible. That optimism just doesn’t take it under the stop-worrying threshold.
This doesn’t seem consistent to me with MIRI having run a research program with a machine learning focus. IIRC (I don’t have links handy, but I’m pretty sure there were announcements made), they wound up declaring failure on that research program, and it was only after that happened that they started talking about the world being doomed and there not being anything that seemed like it would work for aligning AGI in time.
Incidentally, I don’t think I’m willing to trust a hearsay report on this without confirmation.
Do you happen to have any links to Eliezer making such a claim in public? Or, at least, any confirmation that the cited comment was made as described?
Closest thing I’m aware of is that at the time of the AlphaGo matches he bet people at like 3:2 odds, favourable to him, that Lee Sedol would win. Link here
My interpretation of various things Michael and co. have said is “Effective altruism in general (and MIRI / AI-safety in particular) is a memeplex optimizing to extract resources from people in a fraudulent way, which does include some degree of ‘straightforward fraud the way most people would interpret it’”, but also, their worldview includes generally seeing a lot of things as fraudulent in ways/degrees that common parlance wouldn’t generally mean.
I predict they wouldn’t phrase things the specific way iceman phrased it (but, not confidently).
I think Jessicata’s The AI Timelines Scam is a pointer to the class of thing they might tend to mean. Some other relevant posts include Can crimes be discussed literally? and Approval Extraction Advertised as Production.
Yes, this is all reasonable, but as a description of Eliezer’s behavior as understood by him, and also as understood by, like, an ordinary person, “doesn’t have a workable alignment plan, so he decided to just live off our donations, running out the clock” is just… totally wrong… isn’t it?
That is, that characterization doesn’t match what Eliezer sees himself as doing, nor does it match how an ordinary person (and one who had no particular antipathy toward Eliezer, and thus was not inclined to describe his behavior uncharitably, only impartially), speaking in ordinary English, would describe Eliezer as doing—correct?
Yes, that is my belief. (Sorry, should have said that concretely). I’m not sure what an ‘ordinary person’ should think because ‘AI is dangerous’ has a lot of moving pieces and I think most people are (kinda reasonably?) epistemically helpless about the situation. But I do think iceman’s summary is basically obviously false, yes.
My own current belief is “Eliezer/MIRI probably had something-like-a-plan around 2017, probably didn’t have much of a plan by 2019 that Eliezer himself believed in, but, ‘take a break, and then come back to the problem after thinking about it’ feels like a totally reasonable thing to me to do”. (and meanwhile there were still people at MIRI working on various concrete projects that at least the people involved thought were worthwhile).
i.e. I don’t think MIRI “gave up”
I do think, if you don’t share Eliezer’s worldview, it’s a reasonable position to be suspicious and hypothesize that MIRI’s current activities are some sort of motivated-cognition-y cope, but confidently asserting that seems wrong to me. (I also think there’s a variety of worldviews that aren’t Eliezer’s exact worldview that make his actions still pretty coherent, and I think it’s a pretty sketchy position to assert all those nearby worldviews are so obviously wrong as to make ‘motivated cope/fraud’ your primary frame.)
(fwiw my overall take is that I think there is something to this line of thinking. My general experience is that when Michael/Benquo/Jessica say “something is fishy here”, there often turns out to be something I agree is fishy in some sense, but I find their claims overstated and running with some other assumptions I don’t believe that make the thing seem worse to them than it does to me)
For the first part, Yudkowsky has said that he doesn’t have a workable alignment plan, and nobody does, and we are all going to die. This is not blameworthy, I also do not have a workable alignment plan.
For the second part, he was recently on a sabbatical, presumably funded by prior income that was funded by charity, so one might say he was living off donations. Not blameworthy, I also take vacations.
For the third part, everyone who thinks that we are all going to die is in some sense running out the clock, be they disillusioned transhumanists or medieval serfs. Hopefully we make some meaning while we are alive. Not blameworthy, just the human condition.
Whether MIRI is a good place to donate is a very complicated question, but certainly “no” is a valid answer for many donors.
These are good points. But it does seem like what @iceman meant by the bit that I quoted at least has connotations that go beyond your interpretation, yes?
Sure. I haven’t donated to MIRI in many years, so I certainly wouldn’t tell anyone else to do so. (It’s not my understanding that MIRI is funding constrained at this time. Can anyone confirm or disconfirm this?)
What accusation do you see in the connotations of that quote? Genuine question, I could guess but I’d prefer to know. Mostly the subtext I see from iceman is disappointment and grief and anger and regret. Which are all valid emotions for them to feel.
I think a lot of what might have been serious accusations in 2019 is now common knowledge, e.g. after Bankless, Death with Dignity, etc.
From the Bankless interview:
(Edited to fix misquote)
So, just to clarify, “serious accusation” is not a phrase that I have written in this discussion prior to this comment, which is what the use of quotes in your comment suggests. I did write something which has more or less the same meaning! So you’re not mis-ascribing beliefs to me. But quotes mean that you’re… quoting… and that’s not the case here.
Anyway, on to the substance:
And the quote in question, again, is:
The connotations are that Eliezer has consciously chosen to stop working on alignment, while pretending to work on alignment, and receiving money to allegedly work on alignment but instead just not doing so, knowing that there won’t be any consequences for perpetrating this clear and obvious scam in the classic sense of the word, because the world’s going to end and he’ll never be held to account.
Needless to say, it just does not seem to me like Eliezer or MIRI are doing anything remotely like that. Indeed I don’t think anyone (serious) has even suggested that they’re doing anything like that. (The usual horde of haters on Twitter / Reddit / etc. notwithstanding.)
But of course this is largely nonsensical in the absence of any “serious accusations”. Grief over what, anger about what? Why should these things be “valid emotions … to feel”? (And it can’t just be “we’re all going to die”, because that’s not new; we didn’t just find that out from the OP—while iceman’s comment clearly implies that whatever is the cause of his reaction, it’s something that he just learned from Zack’s post.)
Which is precisely why iceman’s comment does not make sense as a reply to this post, now; nor is the characterization which I quoted an accurate one.
Yep, I would describe that state of affairs as “not funding constrained”.
I edited out my misquote, my apologies.
I think emotions are not blame assignment tools, and have other (evolutionary) purposes. A classic example is a relationship break-up, where two people can have strong emotions even though nobody did anything wrong. So I do not interpret emotions as accusations in general. It sounds like you have a different approach, and I don’t object to that.
For example, grief over the loss of the $100k+ donation. Donated with the hope that it would reduce extinction risk, but with the benefit of hindsight the donor now thinks that the marginal donation had no counterfactual impact. It’s not blameworthy because no researcher can possibly promise that a marginal donation will have a large counterfactual impact, and MIRI did not so promise. But a donor can still grieve the loss without someone being to blame.
For example, anger that Yudkowsky realized he had no workable alignment plan, in his estimation, in 2015 (Bankless), and didn’t share that until 2022 (Death with Dignity). This is not blameworthy because people are not morally obliged to share their extinction risk predictions, and MIRI has a clear policy against sharing information by default. But a donor can still be angry that they were disadvantaged by known unknowns.
I hope these examples illustrate that a non-accusatory interpretation is sensical, even if you don’t think it plausible.
There’s a later comment from iceman, which is probably the place to discuss what iceman is alleging:
You misunderstand. I’m not “interpret[ing] emotions as accusations”; I’m simply saying that emotions don’t generally arise for no reason at all (if they do, we consider that to be a pathology!).
So, in your break-up example, the two people involved of course have strong emotions—because of the break-up! On the other hand, it would be very strange indeed to wake up one day and have those same emotions, but without having broken up with anyone, or anything going wrong in your relationships at all.
And likewise, in this case:
Well, it’s a bit dramatic to talk of “grief” over the loss of money, but let’s let that pass. More to the point: why is it a “loss”, suddenly? What’s happened just now that would cause iceman to view it as a “loss”? It’s got to be something in Zack’s post, or else the comment is weirdly non-apropos, right? In other words, the implication here is that something in the OP has caused iceman to re-examine the facts, and gain a new “benefit of hindsight”. But that’s just what I’m questioning.
I do not read Eliezer’s statements in the Bankless interview as saying that he “realized he had no workable alignment plan” in 2015. As far as I know, at no time since starting to write the Sequences has Eliezer ever claimed to have, or thought that he had, a workable alignment plan. This has never been a secret, nor is it news, either to Eliezer in 2015 or to the rest of us in 2022.
They do not.
Well, you can see my response to that comment.
FWIW, my current understanding is that this inference isn’t correct. I think it’s common practice to pay settlements to people, even if their claims are fallacious, since having an extended court battle is sometimes way worse.