I think it’s worth considering that Jaynes may actually be right here about general agents. His argument does seem to hold in practice for humans: it’s standard economic theory that trade arises between cultures with strong comparative advantages. On the other hand, probably the most persistent and long-running conflict between humans that I can think of is warfare over occupancy of Jerusalem. Of course there is an indexical difference in utility functions here: the cultures involved disagree about who should control Jerusalem. But under many metrics of similarity, this conflict arises from highly similar loss/utility functions. I am certainly not fighting for control of Jerusalem, because I just don’t care at all who has it; my interests are orthogonal in some high-dimensional space.
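To make the comparative-advantage point concrete, here is a minimal sketch of the textbook Ricardian arithmetic. The countries, goods, and numbers (Portugal/England, wine/cloth) are the standard classroom illustration, not anything from this post; the point is only that total output of both goods rises under specialization even when one party is absolutely better at everything, which is the sense in which trade “works” between sufficiently different agents.

```python
# A minimal sketch of Ricardo's comparative-advantage arithmetic.
# The specific numbers are the standard textbook illustration, assumed here
# purely for concreteness.

# Labor hours needed to produce one unit of each good:
costs = {
    "Portugal": {"wine": 80, "cloth": 90},
    "England":  {"wine": 120, "cloth": 100},
}
# Total labor available (chosen so each country can make exactly one unit
# of each good on its own):
labor = {"Portugal": 170, "England": 220}

# Autarky: each country produces one unit of wine and one of cloth itself.
autarky = {"wine": 2.0, "cloth": 2.0}

# Specialization: each country produces only the good with the lower
# *relative* (opportunity) cost for it, even though Portugal is absolutely
# better at both goods.
specialized = {
    "wine":  labor["Portugal"] / costs["Portugal"]["wine"],   # 2.125
    "cloth": labor["England"]  / costs["England"]["cloth"],   # 2.2
}

print(autarky, specialized)
# Combined output of both goods rises, so a suitable trade can leave
# both parties strictly better off than fighting or going it alone.
```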
The standard “instrumental utility” argument holds that an unaligned AGI will have some bizarre utility function very different from ours, but that the first step toward most such utility functions will be seizing control of resources, and that this becomes more true the more powerful the AGI. But what if the resources we are bottlenecked by are only bottlenecks for our objectives, at our level of ability? After all, we don’t go around exterminating ants: we aren’t competing with them over food; we use our excess abilities to play politics and build rockets (I think Marcus Hutter was the first to bring this point to my attention in a lasting way). The standard response, I think, is that we just aren’t optimizing for our values hard enough, and that if we didn’t intrinsically value ants/nature/cosmopolitanism, we would eventually tile the planet with solar panels and wipe them out. But why update on a hypothetical action that we probably will not in fact take? Is it not just as plausible that agents at a sufficiently high level of capability tunnel into some higher-dimensional space of possibilities where lower beings can’t follow or interfere, and never again have significant impact on the world we currently experience?
I can imagine a few ways this might happen: energy turns out not to be conserved and deep space is the best place to build a performant computer; it turns out to be possible to build a “portal” of some kind to a more resource-rich environment (interpreted very widely); the most effective means of spreading through the stars turns out to be skipping between stars and ignoring planets. But the point is that the actual mechanism would be something we can’t think of.