I meant to say I’d be relatively good at it; I think it would be hard to find 20 people who are better than me at this sort of thing. The original ITT was about simulating “a libertarian” rather than “a particular libertarian”, so emulating Yudkowsky specifically is a difficulty increase that would have to be compensated for. I think replicating writing style isn’t the main issue; replicating the substance of the arguments is, which is unfortunately harder to test. This post wasn’t meant to do that, as I said.
I’m also not sure what in particular about the Yudkowskian AI risk models you think I don’t understand. I disagree in places, but that’s not evidence of not understanding them.
Yudkowsky has a well-known idiosyncratic writing style, idiosyncratic go-to examples, etc. So anyway, when OP writes “I think I would be pretty good at passing an ideological Turing test for Eliezer Yudkowsky on AGI alignment difficulty (but not AGI timelines)”, I don’t think we should interpret that as a claim that she can produce writing which appears to have been written by Yudkowsky specifically (as judged by a blinded third party). I think it’s a weaker claim than that. See the ITT wiki entry.
I know what an ITT is. I mean understanding Yudkowsky’s models, not reproducing his writing style. I was surprised to see this post in my mailbox, and I updated negatively about MIRI when I saw that OP was a research fellow there, as I didn’t previously expect that some people at MIRI misjudge how well they understand Yudkowsky’s models.
There’s one interesting thought in this post that I don’t remember actively having in a similar form before reading it: that predictive models might get agency from having to achieve results with their cognition. But generally, I think both this post and, e.g., a linked short story have a flaw I’d expect people who’ve read the metaethics sequence to notice, and I don’t expect someone who can write a post like this to pass the ITT.
Explaining my downvote:
This comment contains ~5 negative statements about the post and the poster without explaining what it is that the commenter disagrees with.
As such it seems to disparage without moving the conversation forward, and is not the sort of comment I’d like to see on LessWrong.
My comment was a reply to a comment about the ITT. I made it in the hope that someone would be up for the bet. I didn’t say I disagree with the OP’s claims on alignment; I said I don’t think they’d be able to pass an ITT. I didn’t want to go into the specifics of what the OP doesn’t seem to understand about Yudkowsky’s views, as the OP could then reread some of what Yudkowsky has written more carefully, potentially making it harder for me to distinguish them in an ITT.
I’m sorry if it seemed disparaging.
The comment explained what I disagree with in the post: the claim that the OP would be good at passing an ITT. It wasn’t intended to be negative about the OP; indeed, I think 20 people is the right order of magnitude for the number of people who’d be substantially better at it, which is the bar of being in the top 0.00000025% of Earth’s population at this specific thing. (I wouldn’t claim I’d pass that bar.)
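For reference, the arithmetic behind that percentage, assuming a world population of roughly 8 billion:

$$\frac{20}{8 \times 10^{9}} = 2.5 \times 10^{-9} = 2.5 \times 10^{-7}\,\% = 0.00000025\%$$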
If people don’t want to do any sort of betting, I’d be up for a dialogue on what I think Yudkowsky thinks that would contradict some of what’s written in the post, but I don’t want to spend >0.5h on a comment no one will read.