Mikhail Samin comments on A case for AI alignment being difficult

Mikhail Samin 2 Jan 2024 20:24 UTC
−1 points
0
I know what an ITT is. I mean understanding Yudkowsky’s models, not reproducing his writing style. I was surprised to see this post in my mailbox, and I updated negatively about MIRI when I saw that the OP was a research fellow there, as I didn’t previously expect that some people at MIRI misjudge how well they understand Yudkowsky’s models.

There’s one interesting thought in this post that I don’t remember actively having in a similar form before reading it: that predictive models might get agency from having to achieve results with their cognition. But generally, I think both this post and, for example, a linked short story have a flaw I’d expect people who’ve read the metaethics sequence to notice, and I don’t expect people to pass the ITT if they can write a post like this.
- bideup 3 Jan 2024 8:59 UTC
  2 points
  1
  Explaining my downvote:
  
  This comment contains ~5 negative statements about the post and the poster without explaining what it is that the commenter disagrees with.
  
  As such it seems to disparage without moving the conversation forward, and is not the sort of comment I’d like to see on LessWrong.
  - Mikhail Samin 3 Jan 2024 10:30 UTC
    5 points
    0
    My comment was a reply to a comment on ITT. I made it in the hope someone would be up for the bet. I didn’t say I disagree with the OP’s claims on alignment; I said I don’t think they’d be able to pass an ITT. I didn’t want to talk about specifics of what the OP doesn’t seem to understand about Yudkowsky’s views, as the OP could then reread some of what Yudkowsky’s written more carefully, which could make it harder for me to distinguish them in an ITT.
    
    I’m sorry if it seemed disparaging.
    
    The comment explained what I disagree with in the post: the claim that the OP would be good at passing an ITT. It wasn’t intended to be negative about the OP, as, indeed, I think 20 is the right order of magnitude for the number of people who’d be substantially better at it, which sets the bar at being in the top 0.00000025% of Earth’s population at this specific thing. (I wouldn’t claim I’d pass that bar.)
    
    If people don’t want to do any sort of betting, I’d be up for a dialogue on what I think Yudkowsky thinks that would contradict some of what’s written in the post, but I don’t want to spend >0.5h on a comment no one will read.