We Are Less Wrong than E. T. Jaynes on Loss Functions in Human Society
These paragraphs from E. T. Jaynes’s Probability Theory: The Logic of Science (in §13.12.2, “Loss functions in human society”) are fascinating from the perspective of a regular reader of this website:
We note the sharp contrast between the roles of prior probabilities and loss functions in human relations. People with similar prior probabilities get along well together, because they have about the same general view of the world and philosophy of life. People with radically different prior probabilities cannot get along—this has been the root cause of all the religious wars and most of the political repressions throughout history.
Loss functions operate in just the opposite way. People with similar loss functions are after the same thing, and are in contention with each other. People with different loss functions get along well because each is willing to give something the other wants. Amicable trade or business transactions, advantageous to all, are possible only between parties with very different loss functions. We illustrated this by the example of insurance above.
(Jaynes writes in terms of loss functions for which lower values are better, whereas we more often speak of utility functions for which higher values are better, but the choice of convention doesn’t matter—as long as you’re extremely sure which one you’re using.)
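(As a minimal sketch of why the convention washes out, with L and U as notation introduced here rather than taken from the book: define the loss as the negated utility, and whatever minimizes the one maximizes the other.)

```latex
% Sketch; L and U are notation introduced here, not Jaynes's.
L(a) = -U(a)
\quad\Longrightarrow\quad
\operatorname*{arg\,min}_a L(a) = \operatorname*{arg\,max}_a U(a)
```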
The passage is fascinating because the conclusion looks so self-evidently wrong from our perspective. Agents with the same goals are in contention with each other? Agents with different goals get along? What!?
The disagreement stems from a clash of implicit assumptions. On this website, our prototypical agent is the superintelligent paperclip maximizer, with a utility function about the universe—specifically, the number of paperclips in it—not about itself. It doesn’t care who makes the paperclips. It probably doesn’t even need to trade with anyone.
In contrast, although Probability Theory speaks of programming a robot to reason as a rhetorical device[1], this passage seems to suggest that Jaynes hadn’t thought much about how ideal agents might differ from humans? Humans are built to be mostly selfish: we eat to satisfy our own hunger, not as part of some universe-spanning hunger-minimization scheme. Besides being favored by evolution, selfish goals do offer some conveniences of implementation: my own hunger can be computed as a much simpler function of my sense data than someone else’s. If one assumes that all goals are like that, then one reaches Jaynes’s conclusion: agents with similar goal specifications are in conflict, because the specified objective (for food, energy, status, whatever) binds to an agent’s own state, not a world-model.
But … the assumption isn’t true! Not even for humans, really—sometimes people have “similar loss functions” that point to goals outside of themselves, which benefit from more agents having those goals. Jaynes is being silly here.
That said—and no offense—the people who read this website are not E. T. Jaynes; if we can get this one right where he failed, it’s because our subculture happened to inherit an improved prior in at least this one area, not because of our innate brilliance or good sense. Which prompts the question: what other misconceptions might we be harboring, due to insufficiently general implicit assumptions?
[1] Starting from §1.4, “Introducing the Robot”:
In order to direct attention to constructive things and away from controversial irrelevancies, we shall invent an imaginary being. Its brain is to be designed by us, so that it reasons according to certain definite rules. These rules will be deduced from simple desiderata which, it appears to us, would be desirable in human brains; i.e. we think that a rational person, on discovering that they were violating one of these desiderata, would wish to revise their thinking.
While I agree with you that Jaynes’ description of how loss functions operate in people does not extend to agents in general, the specific passage you have quoted reads strongly to me as if it’s meant to be about humans, not generalized agents.
You claim that Jaynes’ conclusion is that “agents with similar goal specifications are in conflict, because the specified objective (for food, energy, status, whatever) binds to an agent’s own state, not a world-model.” But this isn’t true. His conclusion is specifically about humans.
I want to reinforce that I’m not disagreeing with you about your claims about generalized agents, or even about what Jaynes says elsewhere in the book. I’m only refuting the way you’ve interpreted the two paragraphs you quoted here. If you’re going to call a passage of ET Jaynes’ “silly,” you have to be right on the money to get away with it!
Thanks. We don’t seem to have a “That’s fair” or “Touché” react (which seems different and weaker than “Changed my mind”).
Here is a quote from the same text that I think is more apt to the point you are making about apparent shortcomings in ET Jaynes’ interpretation of more general agentic behavior:
a completion of what my brain spat out on seeing the title, adapted to context...
What’s more, even selfish agents with de dicto identical utility functions can trade: If I have two right shoes and you have two left shoes, we’d trade one shoe for another because of decreasing marginal utility.
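(To make the shoe example concrete, here's a sketch with toy numbers of my own invention: both agents have the same utility function over their own inventory, and both strictly gain from swapping one shoe.)

```python
# Toy numbers (mine, not the commenter's): two selfish agents with the *same*
# utility function over their own shoe inventory both gain from a trade.

def utility(left: int, right: int) -> int:
    """Value of owning `left` left shoes and `right` right shoes:
    a matched pair is worth 10, an unmatched spare shoe is worth 1."""
    pairs = min(left, right)
    spares = (left - pairs) + (right - pairs)
    return 10 * pairs + 1 * spares

# Before the trade: I hold two right shoes, you hold two left shoes.
print(utility(left=0, right=2), utility(left=2, right=0))  # 2 2

# After I swap one of my right shoes for one of your left shoes.
print(utility(left=1, right=1), utility(left=1, right=1))  # 10 10
```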
I don’t see why Jaynes is wrong. I guess it depends on the interpretation? If two humans are chasing the same thing and there is a limited amount of it, of course they are in conflict with each other. Isn’t that what Jaynes is pointing at?
The way Jaynes says it, it looks like it’s meant to be a more general property than something that applies only “if two humans are chasing the same thing and there is a limited amount of it”.
Even assuming perfect selfishness, sometimes the best way to get what you want (X) is to coordinate to change the world in a way that makes X plentiful, rather than fighting over the rare Xs that exist now, and in that way, your goals align with other people who want X.
Y’all, I’m actually sorta confused about the binary between epistemic and instrumental rationality. In my brain I have this labeling scheme like “PLOS is about epistemic rationality”. I think of epistemic and instrumental as a fairly clean binary, because a typecheckerish view of expected value theory separates utilities/values and probabilities very explicitly. A measure forms a coefficient for a valuation, or the other way around.
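(To spell out that typechecker picture, here's a sketch in which every name is my own invention: beliefs and values live in separate structures, and the only place they meet is the expected-value sum, where each probability acts as a coefficient on a valuation.)

```python
# Sketch of the "typechecker" view; all names here are illustrative.
from typing import Dict

Outcome = str
Probability = float  # epistemic side: anticipation, paying rent
Utility = float      # instrumental side: ordering states of the world

def expected_value(belief: Dict[Outcome, Probability],
                   values: Dict[Outcome, Utility]) -> float:
    # Each probability is a coefficient on the corresponding valuation.
    return sum(belief[o] * values[o] for o in belief)

belief = {"rain": 0.25, "sun": 0.75}
values = {"rain": -2.0, "sun": 6.0}
print(expected_value(belief, values))  # 4.0
```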
But I’ve really had baked in that I shouldn’t conflate believing true things (“epistemics”: prediction, anticipation constraint, paying rent) with modifying the world (“instrumentals”: valuing stuff, ordering states of the world, steering the future). This has seemed deeply important, because is and ought are perpendicular.
But what if that’s just not how it is? What if there’s a fuzzy boundary? I feel weird.
But in hindsight I should probably have been confused ever since “description length minimization = utility maximization”.
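(For anyone who wants that correspondence spelled out: as I remember the argument, you define a model whose probabilities are proportional to the exponentiated utility, and then expected description length under that model is negated expected utility plus a constant. Notation below is mine, not the original post's.)

```latex
% Sketch (my notation): define a model M with P[x|M] = e^{u(x)} / Z.
% Then -log P[x|M] = -u(x) + log Z, so
\mathbb{E}\big[-\log P[x \mid M]\big] = -\,\mathbb{E}\big[u(x)\big] + \log Z,
% and minimizing expected description length under M is the same problem
% as maximizing expected utility.
```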
Is this actually wrong? It seems to be a more math-flavored restatement of Girardian mimesis, and of how mimesis minimizes distinction, which causes rivalry and conflict.
(This post had “inline reacts” enabled by mistake, but we’re not rolling that out broadly yet, so I switched it to regular reacts.)
It wasn’t a mistake; I was curious to see what it did. (And since I didn’t see any comments between when I logged out on Sunday and came back to the site today to see this, I still don’t know what “inline” reacts are.) If the team made a mistake by exposing a menu option that you didn’t actually want people to use, that’s understandable, but you shouldn’t call it user error when you don’t know that it wasn’t completely intentional on the user’s part.
Sorry, I meant that it was a mistake on our part. Was not user error! Check out the latest Open Thread to see the experiment there.
I think it’s worth considering that Jaynes may actually be right here about general agents. His argument does seem to work in practice for humans: it’s standard economic theory that trade works between cultures with strong comparative advantages. On the other hand, probably the most persistent and long running conflict between humans that I can think of is warfare over occupancy of Jerusalem. Of course there is an indexical difference in utility function here—cultures disagree about who should control Jerusalem. But I would have to say that under many metrics of similarity this conflict arises from highly similar loss/utility functions. Certainly I am not fighting for control of Jerusalem, because I just don’t care at all about who has it—my interests are orthogonal in some high dimensional space.
The standard “instrumental convergence” argument holds that an unaligned AGI will have some bizarre utility function very different from ours, but that the first step toward most such utility functions will be seizing control of resources, and that this becomes more true the more powerful the AGI is. But what if the resources we are bottlenecked by are only bottlenecks for our objectives and at our level of ability? After all, we don’t go around exterminating ants; we aren’t competing with them over food; we used our excess abilities to play politics and build rockets instead (I think Marcus Hutter was the first to bring this point to my attention in a lasting way). I think the standard response is that we just aren’t optimizing for our values hard enough, and that if we didn’t intrinsically value ants/nature/cosmopolitanism, we would eventually tile the planet with solar panels and wipe them out. But why update on this hypothetical action that we probably will not in fact take? Is it not just as plausible that agents at a sufficiently high level of capability tunnel into some higher-dimensional space of possibilities where lower beings can’t follow or interfere, and never again have significant impact on the world we currently experience?
I can imagine a few ways this might happen (energy turns out not to be conserved and deep space is the best place to build a performant computer; it’s possible to build a “portal” of some kind to a more resource-rich environment, interpreted very widely; the most effective means of spreading through the stars turns out to be just skipping between stars and ignoring planets), but the point is that the actual mechanism would be something we can’t think of.