Yeah I mean I think I basically agree with the above. As hopefully the proposal I linked makes clear? I’m curious what you think of it.
I think the problem of “Don’t lose control of the AIs and have them take over and make things terrible (by human standards)” and the problem of “Don’t be inhumane to the AIs” are distinct but related, and we should be working on both. We should be aiming to understand how AIs work, on the inside, and how different training environments shape that, so we can build/train them in such a way that (a) they are having a good time, (b) they share our values & want what we want, and (c) they are otherwise worthy of becoming equal and eventually superior partners in the relationship (analogous to how we raise children, knowing that eventually they’ll be taking care of us). And then we want to actually treat them with the respect they deserve. Moreover, insofar as they end up with different values, we should still treat them with respect, e.g. we should give them a “cooperate and be treated well” option instead of just “slavery vs. rebellion.”
To be clear, I’m here mostly thinking about future, more capable and autonomous and agentic AI systems, not necessarily today’s systems. But I think it’s good to get started ASAP, both because these things take time and because of uncertainty.
Have you heard of Eleos? You might want to get in touch.
Hey, the proposal makes sense as an argument. I would refine it slightly and phrase it as: “the set of cognitive computations that generates role-emulating behaviour in a given context also generates the qualia associated with that role.” Thus, actors getting into character feel as if they are somehow sharing that character’s emotions. (Sociopathy is the obvious counterargument here, and I’m really not sure what I think of the idea that AIs are sociopathic by default.)
I would take the two problems a bit further and suggest that being humane to AIs may require abandoning the idea of control in the strict sense of the word: so yes, treating them as peers, or as children we are raising as a society. It may also be that the paradigm of control means we would, as a species, become more powerful (with the assistance of the AIs) but not wiser (since we are ultimately the ones “helming the ship”), which in my opinion would be quite bad.
As for the distinction between today’s and future AI systems, I think the line is blurring fast. Will check out Eleos!
Oops, that was the wrong link, sorry! Here’s the right link.
Ahh, I was slightly confused about why you called it a proposal. TBH I’m not sure why only 0.1% rather than any arbitrary percentage in (0, 100]. Otherwise it makes good logical sense.
Thanks! Lower percentages mean it gets in the way of regular business less, and thus is more likely to be actually adopted by companies.