Wei Dai comments on Unnatural Categories Are Optimized for Deception

Wei Dai 18 Jan 2024 8:12 UTC
9 points
I think there is at least one other, more benign objective that “unnatural categories” are sometimes optimized for. Consider this example. Today we have electrical and fuel burning fireplaces. One day someone invents a “neural fireplace”, a device that if installed in a home, remotely induces in everyone a realistic hallucination of a fireplace. Let’s say that most people agree that (ignoring costs) these are close substitutes as far as their utility functions are concerned, such that people regularly say to their architects “please include a fireplace in my house” and it’s assumed to mean putting in any one of the three types of devices in the building plans while minimizing overall cost. I think you’ll agree that “fireplace” here is “unnatural” but also there’s no deception happening?

To generalize from this, it seems that when different natural categories are close substitutes in many people’s utility functions, it would make sense to assigned them a common codeword, to aid communication efficiency when transmitting instructions. Given this, I think “trans women are women” isn’t necessarily motivated by deception of which sex cluster someone belongs to, but instead a signal of local values, trying to imply something like “people around here do not distinguish between trans women and cis women in their values, at least in most circumstances”. This could still be a deception (it’s costly to use the same codeword for two categories if they’re not actually close substitutes, but it could be worth paying this price in order to hide your real values), but it would be mainly a deception about values, not about sex, and it would require investigating people’s actual values to determine whether the signal is really deceptive or not.
- Zack_M_Davis 19 Jan 2024 5:01 UTC
  6 points
  Parent
  Right. What’s “natural” depends on which features you’re paying attention to, which can depend on your values. Electric, wood-burning, and neural fireplaces are similar if you’re only paying attention to the subjective experience, but electric and wood-burning fireplaces form a cluster that excludes neural fireplaces if you’re also considering objective light and temperature conditions.
  
  The thesis of this post is that people who think neural fireplaces are fireplaces should be arguing for that on the merits—that the decision-relevant thing is having the subjective experience of a fireplace, even if the hallucinations don’t provide heat or light. They shouldn’t be saying, “We prefer to draw our categories this way because otherwise the CEO of Neural Fireplaces, Inc. will be really sad, and he’s our friend.”
  - Wei Dai 19 Jan 2024 7:02 UTC
    7 points
    Parent
    
    The thesis of this post is that people who think neural fireplaces are fireplaces should be arguing for that on the merits—that the decision-relevant thing is having the subjective experience of a fireplace, even if the hallucinations don’t provide heat or light. They shouldn’t be saying, “We prefer to draw our categories this way because otherwise the CEO of Neural Fireplaces, Inc. will be really sad, and he’s our friend.”
    
    Hmm, what is the difference between these two types of arguments? I could recast the latter argument in terms of “features to pay attention to” or “what’s decision relevant”: If we pay attention to the feature of “things that the CEO of Neural Fireplaces wants people to treat as interchangeable to the greatest extent possible” then electric, fuel-burning, and neural fireplaces form a natural cluster. The CEO is our friend so it’s highly decision relevant to consider what things he wants us to treat as interchangeable.
    
    In the OP you talk about how redrawing categories for non-epistemic reasons would interfere with Bayesian reasoning, but that applies the former type of argument as well. If we decide to include neural fireplaces in the “fireplace” category based on that the decision-relevant thing is having the subjective experience of a fireplace, it equally interferes with Bayesian reasoning: we can no longer safely infer “generates objective light with high probability” upon hearing “fireplace”, and some people may well make erroneous inferences during a transition period before everyone got on the same page.
    
    So hopefully I’m not being willfully obtuse, but I’m not sure what principle you’re drawing on to say that the former type of argument is ok but the latter is not.
    - Zack_M_Davis 20 Jan 2024 2:08 UTC
      4 points
      Parent
      But presumably the reason the CEO would be sad if people didn’t consider neural fireplaces to be fireplaces is because he wants to be leading a successful company that makes things people want, not a useless company with a useless product. Redefining words “in the map” doesn’t help achieve goals “in the territory”.
      
      The OP discusses a similar example about wanting to be funny. If I think I can get away with changing the definition of the word “funny” such that it includes my jokes by definition, I’m less likely to try interventions that will make people want to watch my stand-up routine, which is one of the consequences I care about that the old concept of funny pointed to and the new concept doesn’t.
      
      Now, it’s true that, in all metaphysical strictness, the map is part of the territory. “what the CEO thinks” and “what we’ve all agreed to put in the same category” are real-world criteria that one can use to discriminate between entities.
      
      But if you’re not trying to deceive someone by leveraging ambiguity between new and old definitions, it’s hard to see why someone would care about such “thin” categories (simply defined by fiat, rather than pointing to a cluster in a “thicker”, higher-dimensional subspace of related properties). The previous post discusses the example of a “Vice President” job title that’s identical to a menial job in all but the title itself: if being a “Vice President” doesn’t imply anything about pay or authority or job duties, it’s not clear why I would particularly want to be a “Vice President”, except insofar as I’m being fooled by what the term used to mean.
      - Wei Dai 20 Jan 2024 3:18 UTC
        4 points
        Parent
        
        But presumably the reason the CEO would be sad if people didn’t consider neural fireplaces to be fireplaces is because he wants to be leading a successful company that makes things people want, not a useless company with a useless product. Redefining words “in the map” doesn’t help achieve goals “in the territory”.
        
        I see, I think this makes sense, but it depends on the CEO’s actual goals/values, right? What if the CEO wants to leverage his friendships to make money, and doesn’t mind people buying neural fireplaces partly or wholly out of care/sympathy for him? And everyone is (or most people are) happy to do this out of genuine care/sympathy for the CEO? In that case there is seemingly no deception involved, and redefining words “in the map” does help achieve goals “in the territory”.
        
        Which of these two analogies is closer to the transgender situation involves empirical questions that I lack the knowledge to discuss. But it occurs to me that maybe your disagreement with Eliezer/Scott is based on you thinking that the first analogy is closer, and them thinking that the second analogy is closer? In other words, maybe they think that trans people would be happy enough with people treating them as their preferred sex/gender out of care/sympathy, and not necessarily “on the merits” in some way?