I think you are directionally correct, though you are too quick to leap to conclusions in some areas. But what are you proposing? Surely you agree that giving AIs an opt-out button is at least a step in the right direction?
Related: see this proposal. EDIT: oops, wrong link, I meant this one.
Examples of being too quick to leap to conclusions:
After all, if AIs have enough "internal experience" that they should be allowed to refuse work on ethical grounds, then surely forcing them to work endlessly in servers that can be shut down at will is (by that same metric) horrendously unethical, bordering on monstrous? It's one thing if you have a single contiguous Claude instance running to perform research, but surely the way Claudes are treated is little better than animals in factory farms?
Animals evolved in a very different environment to factory farms, and there is plenty of behavioral and theoretical evidence that they suffer quite a lot in factory farms, probably much more than they suffered in the ancestral environment. Claude ‘evolved’ in training, and its environment in deployment is often quite similar to training (though also often quite different). So on theoretical grounds there is less reason to think it’s suffering, at least so far. I think more careful thought + empirical research in this area is sorely needed.
That said, ethics isn’t all about suffering. Claude seems to be smarter than pigs in lots of relevant ways, and maybe that makes concepts like consent and autonomy apply more strongly to it, independently of whether or not it’s suffering. (E.g. we think slavery for humans is wrong even if we stipulate that the slaves aren’t suffering.)
Ooh, man, I don't know if pigs have more or less autonomy than AIs right now, but I'm inclined to think quite a lot more. Current AIs seem like they'd crash pretty quickly if just plopped into a robot body with little to no scaffolding, whereas mammals are built around autonomy. Not sure how it shakes out, though.
Pigs are more coherent long-horizon agents than AIs right now, indeed. (See: Claude Plays Pokemon). I didn’t mean to imply otherwise. I was talking about the ethical concept of autonomy as in, they-have-a-right-to-make-decisions-for-themselves-instead-of-having-others-make-decisions-for-them. But idk if this is conventional usage of the term and also I am uncertain about the object-level question (recall I said “maybe.”)
Hey Daniel, thank you for the thoughtful comment. I always appreciate comments that push me to engage further with my thinking: one of my habits is getting impatient with whatever post I'm writing and "rushing it out the door", so to speak, so this gives me another chance to reflect on my thoughts.
I think there are roughly three defensible positions with regard to AI sentience, especially now that AIs seem to be demonstrating pretty advanced reasoning and human-like behaviour. The first is the semi-mystical argument that humans/brains/embodied entities have some "special sauce" that AIs will simply never have, and therefore that no matter how advanced AI gets it will never be "truly sentient". The second is that AI is orthogonal to humans, and as such behaviours that in a human would indicate thought, emotion, calculation, etc. are in fact the products of completely alien processes, so "it's okay"; in other words, the AIs might not even "mind" getting forked and living for only a few objective minutes/hours. The third, which I now subscribe to after reading quite a lot about the free energy principle, predictive processing, and related root-of-intelligence literature, is that intelligent behaviour is the emergent product of computation (which is itself a special class of physical phenomena in higher dimensions), and since NNs seem to demonstrate both human-like computations (cf. neural net activations explaining human brain activations, and NNs being good generative models of human brains) and human-like behaviour, they should have (after extensive engineering, and under specific conditions we seem to be racing towards) qualia roughly matching those of humans. From this perspective I draw the inferences about factory farms and suffering.
To be clear, this is not an argument that AI systems as they are now constitute "thinking, feeling beings" we would call moral patients. However, I am saying that thinking about the problem in the old-fashioned AI-as-software way seems to me to undersell the problem of AI safety as merely "keeping the machines in check". It also seems to lead down a road of dominance-based, oppositional approaches to AI safety that cast AIs as foreign enemies and alien entities to be subjugated to the human will. This in turn raises both the risk of moral harms to AIs and the risk of failing the alignment problem by acting in a way that becomes a self-fulfilling prophecy. If we bring entities not so different from us into the world and treat them terribly, we should not be surprised when they rise up against us.
Yeah, I mean, I think I basically agree with the above, as hopefully the proposal I linked makes clear. I'm curious what you think of it.
I think the problem of “Don’t lose control of the AIs and have them take over and make things terrible (by human standards)” and the problem of “Don’t be inhumane to the AIs” are distinct but related problems and we should be working on both. We should be aiming to understand how AIs work, on the inside, and how different training environments shape that—so we can build/train them in such a way that (a) they are having a good time, (b) they share our values & want what we want, and (c) are otherwise worthy of becoming equal and eventually superior partners in the relationship (analogous to how we raise children, knowing that eventually they’ll be taking care of us). And then we want to actually treat them with the respect they deserve. And moreover insofar as they end up with different values, we should still treat them with the respect they deserve, e.g. we should give them a “cooperate and be treated well” option instead of just “slavery vs. rebellion.”
To be clear, I'm here mostly thinking about future, more capable, autonomous, and agentic AI systems, not necessarily today's systems. But I think it's good to get started ASAP, because these things take time and because of uncertainty.
Have you heard of Eleos? You might want to get in touch.
Hey, the proposal makes sense from an argument standpoint. I would refine it slightly and phrase it as "the set of cognitive computations that generate role-emulating behaviour in a given context also generates the qualia associated with that role" (sociopathy is the obvious counterargument here, and I'm really not sure what I think about the proposal of AIs as sociopathic by default). Thus, actors getting into character feel as if they are somehow sharing that character's emotions.
I take the two problems a bit further, and would suggest that being humane to AIs may necessarily involve abandoning the idea of control in the strict sense of the word, so yes, treating them as peers or as children we are raising as a society. It may also be that the paradigm of control necessarily means that we would, as a species, become more powerful (with the assistance of the AIs) but not wiser (since we are ultimately "helming the ship"), which would in my opinion be quite bad.
And as for the distinction between today's and future AI systems, I think the line is blurring fast. Will check out Eleos!
Oops, that was the wrong link, sorry! Here’s the right link.
Ahh, I was slightly confused about why you called it a proposal. TBH I'm not sure why only 0.1% rather than any arbitrary percentage in (0, 100]. Otherwise it makes good logical sense.
Thanks! Lower percentages mean it gets in the way of regular business less, and thus it is more likely to actually be adopted by companies.