So Yudkowsky doesn’t have a workable alignment plan, so he decided to just live off our donations, running out the clock.
Er… is anyone actually claiming this? This is quite the accusation, and if it were being made, I’d want to see some serious evidence, but… is it, in fact, being made?
(It does seem like OP is saying this, but… in a weird way that doesn’t seem to acknowledge the magnitude of the accusation, and treats it as a reasonable characterization of other claims made earlier in the post. But that doesn’t actually seem to make sense. Am I misreading, or what?)
The second half (just live off donations?) is also my interpretation of OP. The first half (workable alignment plan?) is my own intuition based on MIRI mostly not accomplishing anything of note over the last decade, and...
MIRI & company spent a decade working on decision theory, which seems irrelevant if deep learning is the path (aside: how would you face Omega if you were the sort of agent that pays out blackmail?). Yudkowsky offers to bet Demis Hassabis that Go won’t be solved in the short term. They predict that AI will only come from GOFAI AIXI-likes with utility functions that will bootstrap recursively. They predict fast takeoff and FOOM.
Ooops.
The answer was actually deep learning and not systems with utility functions. Go gets solved. Deep Learning systems don’t look like they FOOM. Stochastic Gradient Descent doesn’t look like it will treacherous turn. Yudkowsky’s dream of building the singleton Sysop is gone and was probably never achievable in the first place.
People double down with the “mesaoptimizer” frame instead of admitting that it looks like SGD does what it says on the tin. Yudkowsky goes on a doom media spree. They advocate for a regulatory regime that would make it very easy to empower private interests over public interests. Enraging to me, there’s a pattern of engagement where it seems like AI Doomers will only interact with weak arguments instead of strong ones: Yud mostly argues with low-quality e/accs on Twitter, where it’s easy to score Ws; it was mildly surprising when he even responded with “This is kinda long.” to Quintin Pope’s objection thread.
What should MIRI have done, had they taken the good sliver of The Sequences to heart? They should have said oops. They should have halted, melted, and caught fire. They should have acknowledged that the sky was blue. They should have radically changed their minds when the facts changed. But that would have cut off their funding. If the world isn’t going to end from a FOOMing AI, why should MIRI get paid?
So what am I supposed to extract from this pattern of behaviour?
Deep Learning systems don’t look like they FOOM. Stochastic Gradient Descent doesn’t look like it will treacherous turn.
I think you’ve updated incorrectly, by failing to keep track of what the advance predictions were (or would have been) about when a FOOM or a treacherous turn would happen.
If FOOM happens, it happens no earlier than the point where AI systems can do software development on their own codebases, without relying on close collaboration with a skilled human programmer. This point has not yet been reached; they’re idiot-savants with skill gaps that prevent them from working independently, and no AI system has passed the litmus test I use for identifying good (human) programmers. They’re advancing in that direction pretty rapidly, but they’re unambiguously not there yet.
Similarly, if a treacherous turn happens, it happens no earlier than the point where AI systems can do strategic reasoning with long chains of inference; this again has an idiot-savant dynamic going on, which can create the false impression that this landmark has been reached, when in fact it hasn’t.
They predict that AI will only come from GOFAI AIXI-likes with utility functions that will bootstrap recursively.
Do you have a link for this prediction? (Or are you just referring to, e.g., Eliezer’s dismissive attitude toward neural networks, as expressed in the Sequences?)
They predict fast takeoff and FOOM. … Deep Learning systems don’t look like they FOOM.
It’s not clear that deep learning systems get us to AGI, either. There doesn’t seem to be any good reason to be sure, at this time, that we won’t get “fast takeoff and FOOM”, does there? (Indeed it’s my understanding that Eliezer still predicts this. Or is that false?)
Stochastic Gradient Descent doesn’t look like it will treacherous turn.
It… doesn’t? What do you mean by this? I’ve seen no reason to be optimistic on this point—quite the opposite!
So what am I supposed to extract from this pattern of behaviour?
I think that at least some of the things you take to be obvious conclusions that Eliezer/MIRI should’ve drawn, are in fact not obvious, and some are even plausibly false.
You also make some good points. But there isn’t nearly so clear a pattern as you suggest.
It… doesn’t? What do you mean by this? I’ve seen no reason to be optimistic on this point—quite the opposite!
As I understand the argument, it goes like the following:
For evolutionary methods, you can’t predict the outcome of changes before they’re made, and so you end up with ‘throw the spaghetti at the wall and see what sticks’. At some point, those changes accumulate to a mind that’s capable of figuring out what environment it’s in and then performing well at that task, so you get what looks like an aligned agent while you haven’t actually exerted any influence on its internal goals (i.e. what it’ll do once it’s out in the world).
For gradient-descent based methods, you can predict the outcome of changes before they’re made; that’s the gradient part. It’s overall less plausible that the system you’re building figures out generic reasoning and then applies that generic reasoning to a specific task, compared to figuring out the specific reasoning for the task that you’d like solved. Jumps in the loss look more like “a new cognitive capacity has emerged in the network” and less like “the system is now reasoning about its training environment”.
Of course, that “overall less plausible” is making a handwavy argument about what simplicity metric we should be using and which design is simpler according to that metric. Related, earlier research: Are minimal circuits deceptive?
IMO this should be somewhat persuasive but not conclusive. I’m much happier with a transformer shaped by a giant English text corpus than I am with whatever is spit out by a neural-architecture-search program pointed at itself! But for cognitive megaprojects, I think you probably have to have something-like-a-mind in there, even if you got to it by SGD.
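A minimal sketch of the “you can predict the outcome of changes before they’re made; that’s the gradient part” point above, using a toy NumPy least-squares problem (illustrative only, not from the original discussion): the gradient gives a first-order prediction of how the loss will change before the update is applied, whereas an evolutionary method has to actually try a change to find out what it does.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: loss(w) = mean((X @ w - y)**2)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

def loss(w):
    return np.mean((X @ w - y) ** 2)

def grad(w):
    # Analytic gradient of the mean squared error.
    return 2.0 * X.T @ (X @ w - y) / len(y)

w = np.zeros(3)   # current parameters
lr = 0.01         # learning rate (full-batch step for simplicity)

g = grad(w)
step = -lr * g    # the update we are *about* to apply

# Gradient-based prediction of the loss change (first-order Taylor term).
predicted_change = float(g @ step)

# Actual loss change once the update is applied.
actual_change = loss(w + step) - loss(w)

print(f"predicted change: {predicted_change:.4f}")
print(f"actual change:    {actual_change:.4f}")
# The two agree closely for a small step: the gradient told us, in advance,
# roughly what the change would do. An evolutionary method would have to try
# random perturbations and keep whichever ones happen to lower the loss.
```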
It’s pretty easy to find reasons why everything will hopefully be fine, or AI hopefully won’t FOOM, or we otherwise needn’t do anything inconvenient to get good outcomes. It’s proving considerably harder (from my outside-the-field view) to prove alignment, or prove upper bounds on rate of improvement, or prove much of anything else that would be cause to stop ringing the alarm.
FWIW I’m considerably less worried than I was when the Sequences were originally written. The paradigms that have taken off since do seem a lot more compatible with straightforward training solutions that look much less alien than expected. There are plausible scenarios where we fail at solving alignment and still get something tolerably human-shaped, and none of those scenarios previously seemed plausible. That optimism just doesn’t take it under the stop-worrying threshold.
This doesn’t seem consistent to me with MIRI having run a research program with a machine learning focus. IIRC (I don’t have links handy, but I’m pretty sure there were announcements made), they wound up declaring failure on that research program, and it was only after that happened that they started talking about the world being doomed and there not being anything that seemed like it would work for aligning AGI in time.
Incidentally, I don’t think I’m willing to trust a hearsay report on this without confirmation.
Do you happen to have any links to Eliezer making such a claim in public? Or, at least, any confirmation that the cited comment was made as described?
Closest thing I’m aware of is that at the time of the AlphaGo matches he bet people at like 3:2 odds, favourable to him, that Lee Sedol would win. Link here
My interpretation of various things Michael and co. have said is: “Effective altruism in general (and MIRI / AI-safety in particular) is a memeplex optimizing to extract resources from people in a fraudulent way.” That does include some degree of “straightforward fraud the way most people would interpret it”, but also, their worldview includes generally seeing a lot of things as fraudulent in ways/degrees that go beyond what common parlance would generally mean.
I predict they wouldn’t phrase things the specific way iceman phrased it (but, not confidently).
I think Jessicata’s The AI Timelines Scam is a pointer to the class of thing they might tend to mean. Some other relevant posts include Can crimes be discussed literally? and Approval Extraction Advertised as Production.
Yes, this is all reasonable, but as a description of Eliezer’s behavior as understood by him, and also as understood by, like, an ordinary person, “doesn’t have a workable alignment plan, so he decided to just live off our donations, running out the clock” is just… totally wrong… isn’t it?
That is, that characterization doesn’t match what Eliezer sees himself as doing, nor does it match how an ordinary person (and one who had no particular antipathy toward Eliezer, and thus was not inclined to describe his behavior uncharitably, only impartially), speaking in ordinary English, would describe Eliezer as doing—correct?
Yes, that is my belief. (Sorry, should have said that concretely). I’m not sure what an ‘ordinary person’ should think because ‘AI is dangerous’ has a lot of moving pieces and I think most people are (kinda reasonably?) epistemically helpless about the situation. But I do think iceman’s summary is basically obviously false, yes.
My own current belief is “Eliezer/MIRI probably had something-like-a-plan around 2017, probably didn’t have much of a plan by 2019 that Eliezer himself believed in, but, ‘take a break, and then come back to the problem after thinking about it’ feels like a totally reasonable thing to me to do”. (and meanwhile there were still people at MIRI working on various concrete projects that at least the people involved thought were worthwhile).
i.e. I don’t think MIRI “gave up”.
I do think, if you don’t share Eliezer’s worldview, it’s a reasonable position to be suspicious and hypothesize that MIRI’s current activities are some sort of motivated-cognition-y cope, but I think confidently asserting that seems wrong to me. (I also think there’s a variety of worldviews that aren’t Eliezer’s exact worldview that make his actions still pretty coherent, and I think it’s a pretty sketchy position to assert all those nearby-worldviews are so obviously wrong as to make ‘motivated cope/fraud’ your primary frame)
(fwiw my overall take is that I think there is something to this line of thinking. My general experience is that when Michael/Benquo/Jessica say “something is fishy here”, there often turns out to be something I agree is fishy in some sense, but I find their claims overstated and running with some other assumptions I don’t believe that make the thing seem worse to them than it does to me)
For the first part, Yudkowsky has said that he doesn’t have a workable alignment plan, and nobody does, and we are all going to die. This is not blameworthy, I also do not have a workable alignment plan.
For the second part, he was recently on a sabbatical, presumably funded by prior income that was funded by charity, so one might say he was living off donations. Not blameworthy, I also take vacations.
For the third part, everyone who thinks that we are all going to die is in some sense running out the clock, be they disillusioned transhumanists or medieval serfs. Hopefully we make some meaning while we are alive. Not blameworthy, just the human condition.
Whether MIRI is a good place to donate is a very complicated question, but certainly “no” is a valid answer for many donors.
These are good points. But it does seem like what @iceman meant by the bit that I quoted at least has connotations that go beyond your interpretation, yes?
Whether MIRI is a good place to donate is a very complicated question, but certainly “no” is a valid answer for many donors.
Sure. I haven’t donated to MIRI in many years, so I certainly wouldn’t tell anyone else to do so. (It’s not my understanding that MIRI is funding constrained at this time. Can anyone confirm or disconfirm this?)
What accusation do you see in the connotations of that quote? Genuine question, I could guess but I’d prefer to know. Mostly the subtext I see from iceman is disappointment and grief and anger and regret. Which are all valid emotions for them to feel.
I think a lot of what might have been serious accusations in 2019 are now common knowledge, eg after Bankless, Death with Dignity, etc.
(It’s not my understanding that MIRI is funding constrained at this time. Can anyone confirm or disconfirm this?)
From the Bankless interview:
How do I put it… The saner outfits do have uses for money. They don’t really have scalable uses for money, but they do burn any money literally at all. Like, if you gave MIRI a billion dollars, I would not know how to...
Well, at a billion dollars, I might try to bribe people to move out of AI development, that gets broadcast to the whole world, and move to the equivalent of an island somewhere—not even to make any kind of critical discovery, but just to remove them from the system. If I had a billion dollars.
If I just have another $50 million, I’m not quite sure what to do with that, but if you donate that to MIRI, then you at least have the assurance that we will not randomly spray money on looking like we’re doing stuff and we’ll reserve it, as we are doing with the last giant crypto donation somebody gave us until we can figure out something to do with it that is actually helpful. And MIRI has that property. I would say probably Redwood Research has that property.
So, just to clarify, “serious accusation” is not a phrase that I have written in this discussion prior to this comment, which is what the use of quotes in your comment suggests. I did write something which has more or less the same meaning! So you’re not mis-ascribing beliefs to me. But quotes mean that you’re… quoting… and that’s not the case here.
Anyway, on to the substance:
What “serious accusation” do you see in the connotations of that quote?
And the quote in question, again, is:
So Yudkowsky doesn’t have a workable alignment plan, so he decided to just live off our donations, running out the clock.
The connotations are that Eliezer has consciously chosen to stop working on alignment, while pretending to work on alignment, and receiving money to allegedly work on alignment but instead just not doing so, knowing that there won’t be any consequences for perpetrating this clear and obvious scam in the classic sense of the word, because the world’s going to end and he’ll never be held to account.
Needless to say, it just does not seem to me like Eliezer or MIRI are doing anything remotely like that. Indeed I don’t think anyone (serious) has even suggested that they’re doing anything like that. (The usual horde of haters on Twitter / Reddit / etc. notwithstanding.)
Mostly the subtext I see from iceman is disappointment and grief and anger and regret. Which are all valid emotions for them to feel.
But of course this is largely nonsensical in the absence of any “serious accusations”. Grief over what, anger about what? Why should these things be “valid emotions … to feel”? (And it can’t just be “we’re all going to die”, because that’s not new; we didn’t just find that out from the OP—while iceman’s comment clearly implies that whatever is the cause of his reaction, it’s something that he just learned from Zack’s post.)
I think a lot of what might have been “serious accusations” in 2019 are now common knowledge, eg after Bankless, Death with Dignity, etc.
Which is precisely why iceman’s comment does not make sense as a reply to this post, now; nor is the characterization which I quoted an accurate one.
(It’s not my understanding that MIRI is funding constrained at this time. Can anyone confirm or disconfirm this?)
From the Bankless interview:
Yep, I would describe that state of affairs as “not funding constrained”.
I edited out my misquote, my apologies.
I think emotions are not blame assignment tools, and have other (evolutionary) purposes. A classic example is a relationship break-up, where two people can have strong emotions even though nobody did anything wrong. So I do not interpret emotions as accusations in general. It sounds like you have a different approach, and I don’t object to that.
Grief over what, anger about what?
For example, grief over the loss of the $100k+ donation. Donated with the hope that it would reduce extinction risk, but with the benefit of hindsight the donor now thinks that the marginal donation had no counterfactual impact. It’s not blameworthy because no researcher can possibly promise that a marginal donation will have a large counterfactual impact, and MIRI did not so promise. But a donor can still grieve the loss without someone being to blame.
For example, anger that Yudkowsky realized he had no workable alignment plan, in his estimation, in 2015 (Bankless), and didn’t share that until 2022 (Death with Dignity). This is not blameworthy because people are not morally obliged to share their extinction risk predictions, and MIRI has a clear policy against sharing information by default. But a donor can still be angry that they were disadvantaged by known unknowns.
I hope these examples illustrate that a non-accusatory interpretation is sensical, even if you don’t think it plausible.
There’s a later comment from iceman, which is probably the place to discuss what iceman is alleging:
What should MIRI have done, had they taken the good sliver of The Sequences to heart? They should have said oops. They should have halted, melted, and caught fire. They should have acknowledged that the sky was blue. They should have radically changed their minds when the facts changed. But that would have cut off their funding. If the world isn’t going to end from a FOOMing AI, why should MIRI get paid?
I think emotions are not blame assignment tools, and have other (evolutionary) purposes. A classic example is a relationship break-up, where two people can have strong emotions even though nobody did anything wrong. So I do not interpret emotions as accusations in general. It sounds like you have a different approach, and I don’t object to that.
You misunderstand. I’m not “interpret[ing] emotions as accusations”; I’m simply saying that emotions don’t generally arise for no reason at all (if they do, we consider that to be a pathology!).
So, in your break-up example, the two people involved of course have strong emotions—because of the break-up! On the other hand, it would be very strange indeed to wake up one day and have those same emotions, but without having broken up with anyone, or anything going wrong in your relationships at all.
And likewise, in this case:
Grief over what, anger about what?
For example, grief over the loss of the $100k+ donation. Donated with the hope that it would reduce extinction risk, but with the benefit of hindsight the donor now thinks that the marginal donation had no counterfactual impact. It’s not blameworthy because no researcher can possibly promise that a marginal donation will have a large counterfactual impact, and MIRI did not so promise. But a donor can still grieve the loss without someone being to blame.
Well, it’s a bit dramatic to talk of “grief” over the loss of money, but let’s let that pass. More to the point: why is it a “loss”, suddenly? What’s happened just now that would cause iceman to view it as a “loss”? It’s got to be something in Zack’s post, or else the comment is weirdly non-apropos, right? In other words, the implication here is that something in the OP has caused iceman to re-examine the facts, and gain a new “benefit of hindsight”. But that’s just what I’m questioning.
For example, anger that Yudkowsky realized he had no workable alignment plan, in his estimation, in 2015 (Bankless), and didn’t share that until 2022 (Death with Dignity). This is not blameworthy because people are not morally obliged to share their extinction risk predictions, and MIRI has a clear policy against sharing information by default. But a donor can still be angry that they were disadvantaged by known unknowns.
I do not read Eliezer’s statements in the Bankless interview as saying that he “realized he had no workable alignment plan” in 2015. As far as I know, at no time since starting to write the Sequences has Eliezer ever claimed to have, or thought that he had, a workable alignment plan. This has never been a secret, nor is it news, either to Eliezer in 2015 or to the rest of us in 2022.
I hope these examples illustrate that a non-accusatory interpretation is sensical, even if you don’t think it plausible.
They do not.
There’s a later comment from iceman, which is probably the place to discuss what iceman is alleging:
Well, you can see my response to that comment.