I strongly claim that the amount of terminal utility from ‘simulate a billion fruit flies each in a state of orgasmic bliss’ is zero. Not epsilon. Zero.
I was with you up to this. I don’t see how anyone can have high justifiable confidence that it’s zero instead of epsilon, given our general state of knowledge/confusion about philosophy of mind/consciousness, axiology, metaethics, metaphilosophy.
When I look ahead, what I see is AIs increasingly presenting adversarial examples to humans.
I wrote a post also pointing this out. It seems like an obvious and neglected issue.
What do we know about Dario Amodei (CEO of Anthropic)?
A few years ago I saw a Google doc in which he was quite optimistic about how hard AI alignment would be (IIRC, placing it at about level 2 on Sammy Martin’s helpful Ten Levels of AI Alignment Difficulty). I think he was intending to publish it but never did. This (excessive optimism, from my perspective) was one of the reasons that I declined to invest in Anthropic when invited to. (Although, interestingly, Anthropic subsequently worked on Constitutional AI, which is actually level 3 on Sammy’s scale. ETA: and also Debate, which is level 5.)
I think that before this announcement I’d have said that OpenAI was at around a 2.5 and Anthropic at around a 3 in terms of what they’ve actually applied to existing models (which imo is fine for now; I think that doing more safety work on models at GPT-4 capability levels is mostly wasted effort in terms of current safety impact). Prior to the superalignment announcement, though, I’d have said OpenAI and Anthropic were both aiming at a 5, i.e. oversight with research assistance, and DeepMind’s stated plan was the best at a 6.5 (involving lots of interpretability and some experiments). Now OpenAI is also aiming at a 6.5, and Anthropic seems to be the laggard, still at a 5, unless I’ve missed something.
However, the best currently feasible plan is still slightly better than either. For example, I think very close integration of the deception and capability evals from the ARC and Apollo Research teams into an experimental workflow isn’t in either plan and should be; adding it would bump either plan up to a 7.
I don’t see how anyone can have high justifiable confidence that it’s zero instead of epsilon, given our general state of knowledge/confusion about philosophy of mind/consciousness, axiology, metaethics, metaphilosophy.
I tend to agree with Zvi’s conclusion, although I also agree with you that I don’t know that it’s definitely zero. I think it’s unlikely (subjectively, something like under a percent) that the real truth about axiology says that insects in bliss are an absolute good, but I can’t rule it out the way I can rule out winning the lottery, because no one can trust reasoning in this domain that much.
What I’d say is that, in general, in ‘weird’ domains (AI Strategy, thinking about longtermist prioritization, metaethics), because the stakes are large and the questions so uncertain, you run into a really large number of “unlikely, but not really unlikely enough to honestly call it a Pascal’s mugging” considerations: things you’d subjectively say are under 1% likely but over one in a million or so. I think the correct response in these uncertain domains is to mostly just ignore them, the way you’d ignore things that are under one in a billion in a more certain and clear domain like construction engineering.
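To make the threshold heuristic above concrete, here is a minimal sketch in Python. The numbers, the example considerations, and the cutoff value are all made-up assumptions for illustration, not anything drawn from the comment: the point is just that a naive expected-value sum gets dominated by the most extreme low-probability entries, while a hard cutoff simply drops everything below the chosen threshold.

```python
# Illustrative sketch of the "ignore sub-threshold considerations" heuristic.
# All numbers below are hypothetical and chosen only to show the effect.

considerations = [
    # (subjective probability, stakes in arbitrary value units)
    (0.005, 1e12),   # e.g. "insect bliss is an absolute good"
    (0.001, 5e12),   # some other sub-1% axiological possibility
    (1e-5, 1e14),    # very unlikely, very high stakes
    (1e-9, 1e20),    # Pascal's-mugging territory: dominates a naive sum
]

def expected_value(items, cutoff=0.0):
    """Sum probability * stakes, ignoring anything below the probability cutoff."""
    return sum(p * v for p, v in items if p >= cutoff)

print(expected_value(considerations))               # naive sum, driven by the 1e-9 entry
print(expected_value(considerations, cutoff=0.01))  # with a ~1% cutoff, all are ignored
```

Where the cutoff sits is doing all the work here; the claim above is that in ‘weird’ domains the honest cutoff ends up closer to 1% than to one in a billion.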
I tend to agree with Zvi’s conclusion, although I also agree with you that I don’t know that it’s definitely zero. I think it’s unlikely (subjectively, something like under a percent) that the real truth about axiology says that insects in bliss are an absolute good, but I can’t rule it out the way I can rule out winning the lottery, because no one can trust reasoning in this domain that much.
I’m not aware of any reasoning that I’d trust enough to drive my subjective probability of insect bliss having positive value to <1%. Can you cite some works in this area that caused you to be that certain?
What I’d say is that, in general, in ‘weird’ domains (AI Strategy, thinking about longtermist prioritization, metaethics), because the stakes are large and the questions so uncertain, you run into a really large number of “unlikely, but not really unlikely enough to honestly call it a Pascal’s mugging” considerations: things you’d subjectively say are under 1% likely but over one in a million or so. I think the correct response in these uncertain domains is to mostly just ignore them, the way you’d ignore things that are under one in a billion in a more certain and clear domain like construction engineering.
What are some other specific examples of this, in your view?
I was also disappointed to read Zvi’s take on fruit fly simulations. “Figuring out how to produce a bunch of hedonium” is not an obviously stupid endeavor to me and seems completely neglected. Does anyone know if there are any organizations with this explicit goal? The closest ones I can think of are the Qualia Research Institute and the Sentience Institute, but I only know about them because they’re connected to the EA space, so I’m probably missing some.
I acknowledge that there are people who think this is an actually good idea rather than an indication of a conceptual error in need of fixing, and I’ve had some of them seek funding from places where I’m making decisions, which is super weird. It definitely increases my worry that we will end up in a universe with no value.
Climbing the ladder of human meaning, ability and accomplishment for some, miniature American flags for others!