In the soaking-up-extra-compute case? Yeah, for sure, I can only really picture it (a) on a very short-term basis, for example maybe while linking up tightly for important negotiations (but even here, not very likely). Or (b) in a situation with high power asymmetry. For example maybe there’s a story where ‘lords’ delegate work to their ‘vassals’, but the workload intensity is variable, so the vassals have leftover compute, and the lords demand that they spend it on something like blockchain mining. To compensate for the vulnerability this induces, the lords would also provide protection.
Yup, all that would certainly make it more complicated. In a regime where this kind of tightly-controlled delegation were really important, we might also demand our counterparties standardize their hardware so they can’t play tricks like this.
I was picturing a more power-asymmetric situation, more like a feudal lord giving his vassals lots of busywork so they don’t have time to plot anything.
We might develop schemes for auditable computation, where one party can come in at any time and check the other party’s logs. They should conform to the source code that the second party is supposed to be running; and also to any observable behavior that the second party has displayed. It’s probably possible to have logging and behavioral signalling be sufficiently rich that the first party can be convinced that that code is indeed being run (without it being too hard to check—maybe with some kind of probabilistically checkable proof).
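As a very rough sketch of the logging half (all the specifics here, the toy `step` function, the log format, and the number of spot checks, are invented for illustration): the audited party hash-chains its execution log and publishes the digest, and the auditor binds the log to that digest and then re-derives a few randomly chosen transitions from the agreed-upon source code.

```python
import hashlib
import random

def chain_hash(prev_digest: str, entry: str) -> str:
    """Fold the next log entry into a running hash chain."""
    return hashlib.sha256((prev_digest + entry).encode()).hexdigest()

def step(state: int, x: int) -> int:
    """Stand-in for the source code the second party is supposed to be running."""
    return state * 31 + x

def run_and_log(inputs):
    """Audited party: run the agreed code, logging every transition."""
    log, digest, state = [], "", 0
    for x in inputs:
        state = step(state, x)
        entry = f"{x}:{state}"
        digest = chain_hash(digest, entry)
        log.append(entry)
    return log, digest          # digest is published up front; log is shown on demand

def audit(log, claimed_digest, inputs, num_checks=3):
    """Auditor: bind the log to the published digest, then probabilistically
    re-check a few transitions against the reference code."""
    digest = ""
    for entry in log:
        digest = chain_hash(digest, entry)
    if digest != claimed_digest:
        return False
    for i in random.sample(range(len(log)), min(num_checks, len(log))):
        prev_state = 0 if i == 0 else int(log[i - 1].split(":")[1])
        x, state = (int(v) for v in log[i].split(":"))
        if x != inputs[i] or step(prev_state, x) != state:
            return False
    return True

inputs = [3, 1, 4, 1, 5]
log, digest = run_and_log(inputs)
print(audit(log, digest, inputs))   # True when the logged run matches the reference code
```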
However, this only provides a positive proof that certain code is being run, not a negative proof that no other code is being run at the same time. This part, I think, inherently requires knowing something about the other party’s computational resources. But if you can know about those, then
it might be possible. For a perhaps dystopian example, if you know your counterparty has compute A, and the program you want them to run takes compute B, then you could demand they do something (difficult but easily checkable) like inverting hash functions, that’ll soak up around A-B of their compute, so they have nothing left over to do anything secret with.
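And a toy version of the compute-soaking part (purely illustrative; the difficulty parameter is made up and would have to be tuned against the A-B estimate): the demanding party issues a partial hash-inversion puzzle that is expensive to solve but cheap to verify.

```python
import hashlib
import os

def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Brute-force a nonce whose SHA-256 hash falls below a target;
    expected work is ~2**difficulty_bits hashes. This is the compute sink."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a claimed solution costs a single hash."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))

challenge = os.urandom(16)
difficulty_bits = 18     # ~2**18 hashes expected; scale this to fill the A - B gap
nonce = solve(challenge, difficulty_bits)
print(verify(challenge, nonce, difficulty_bits))   # True
```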
Sorry, I guess I didn’t make the connection to your post clear. I substantially agree with you that utility functions over agent-states aren’t rich enough to model real behavior. (Except, maybe, at a very abstract level, a la predictive processing? (which I don’t understand well enough to make the connection precise)).
Utility functions over world-states—which is what I thought you meant by ‘states’ at first—are in some sense richer, but I still think inadequate.
And I agree that utility functions over agent histories are too flexible.
I was sort of jumping off to a different way to look at value, which might have some of the desirable coherence of the utility-function-over-states framing, but without its rigidity.
And this way is something like, viewing ‘what you value’ or ‘what is good’ as something abstract, something to be inferred, out of the many partial glimpses of it we have in the form of our extant values.
Oh, huh, this post was on the LW front page, and dated as posted today, so I assumed it was fresh, but the replies’ dates are actually from a month ago.
(A somewhat theologically inspired answer:)
Outside the dichotomy of values (in the shard-theory sense) vs. immutable goals, we could also talk about valuing something that is in some sense fixed, but “too big” to fit inside your mind. Maybe a very abstract thing. So your understanding of it is always partial, though you can keep learning more and more about it (and you might shift around, feeling out different parts of the elephant). And your acted-on values would appear mutable, but there would actually be a, perhaps non-obvious, coherence to them.
It’s possible this is already sort of a consequence of shard theory? In the way learned values would have coherences to accord with (perhaps very abstract or complex) invariant structure in the environment?
I still don’t know exactly what parts of my comment you’re responding to. Maybe talking about a concrete sub-agent coordination problem would help ground this more.
But as a general response: in your example it sounds like you already have the problem very well narrowed down, to 3 possibilities with precise probabilities. What if there were 10^100 possibilities instead? Or uncertainty where the full real thing is not contained in the hypothesis space?
This is for logical coordination? How does it help you with that?
IMO, coordination difficulties among sub-agents can’t be waved away so easily. The solutions named, side-channel trades and counterfactual coordination, are both limited.
I would frame the nature of their limits, loosely, like this. In real minds (or at least the human ones we are familiar with), the stuff we care about lives in a high-dimensional space. A mind could be said to be, roughly, a network spanning such a space. A trade between elements (~sub-agents) that are nearby in this space will not be too hard to do directly. But for long-distance trades, side-channel reward will need to flow through a series of intermediaries—this might involve several changes of local currencies (including traded favors or promises). Each local exchange needs to be worthwhile to its participants, and not overload the relationships that it’s piggybacking on.
These long-distance trades can be really difficult to set up sometimes, the same way it would have been hard for a random medieval villager in France to send $10 to another random villager in China.
The difficulty depends on things like the size / dimensionality of the space; how well-connected it is; and how much slack is available in the relevant places in the system (for the intermediate elements to wiggle around enough to make all the local trades possible). Note that the need for slack makes this a holistic constraint: if you just have one really important trade to make, then sure, you can probably make it happen, by using up lots of slack (locking a lot of intermediate elements into orientations optimized for that big trade). But you can’t do that for every possible trade. So these issues really show up when you have a lot of heterogeneous trades to make.
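To make the slack point concrete, here is a crude toy model (the line-graph topology and the slack numbers are invented, nothing more): sub-agents sit on a line, each adjacent pair has a fixed amount of slack, and a trade between two distant sub-agents consumes slack on every relationship along the path between them.

```python
# Sub-agents 0..N-1 sit on a line; edge i is the relationship between i and i+1.
# A trade between a and b consumes one unit of slack on every edge along the path.

def try_trade(slack, a, b):
    """Succeed only if every intermediate relationship still has slack to spare."""
    lo, hi = min(a, b), max(a, b)
    if all(slack[e] > 0 for e in range(lo, hi)):
        for e in range(lo, hi):
            slack[e] -= 1
        return True
    return False

N = 20
slack = [3] * (N - 1)      # each local relationship can absorb 3 trades

# Nearby trades are cheap: they only touch one relationship.
print(try_trade(slack, 3, 4))                            # True

# One really important long-distance trade can be made to happen,
# but it locks up slack along the whole chain of intermediaries.
print(try_trade(slack, 0, N - 1))                        # True

# Many heterogeneous long-distance trades quickly exhaust the shared intermediaries...
print([try_trade(slack, 0, N - 1) for _ in range(5)])    # [True, False, False, False, False]

# ...and they crowd out even the cheap local trades that rely on the same relationships.
print(try_trade(slack, 3, 4))                            # False: edge 3 has no slack left
```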
Counterfactual (“logical”) coordination has similar issues. If A and B want to counterfactually coordinate, but they’re far apart in this mind-space, then they can only communicate or understand one another in a limited way, via intermediaries (or via the small # of dimensions they do share). This just makes things harder: hard to get shared meaning, hard to agree on what’s fair, hard to find a solution together that will generalize well instead of being brittle.
BTW, I’m not denying that intelligence (whatever that might mean) helps with all this, but I am denying that it’s a panacea.
Probably some students will actually be quite bothered by this and be left with lingering, subtle confusion and discomfort. It is, in a sense, taking a shortcut past all the objections and alternatives that real humans historically had to these ideas. And IMO some students will be much better served by going the long way around, studying the ideas along with their history.
One response to frame-control-y situations, instead of making accusations that (as you say) can lead to a he-said-she-said situation, is to personally fall back to a more careful, defensive posture vis-à-vis framing: accepting that there seem to be strong framing differences among the people here, and communicating this posture to others. In other words, accepting when it seems too hard to directly create common knowledge about what is happening at the level of framing.
Random question, tangential to this post in particular (but not the series): should we expect genes to be doing something like geometric rationality in their propagation? When a new gene emerges and starts to spread, even if it greatly increases host fitness on average, its # of copies could easily drop to 0 by chance. So it “should want” to be cautious, like a Kelly bettor, and maximize its growth geometrically rather than arithmetically.
Not sure quite how that logic should cash out though. For one, genes that make their hosts more cautious (reduce fitness variance) should be systematically advantaged by this effect, at least during their early growth phase. More speculatively, to take advantage of this effect optimally, genes should somehow suss out how large their population (# of copies) is and push their host to be risk-taking vs. cautious in a way that’s calibrated to that. Which is maybe biologically plausible?
I don’t actually know much about population genetics though, and would be curious to hear from anyone who does.
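A quick branching-process simulation of the “cautious while rare” intuition (the offspring distributions are made-up toy numbers, not real population genetics): two variants pass on the same number of copies per copy on average (1.1), but the higher-variance one, starting from a single copy, dies out far more often.

```python
import random

def extinction_prob(counts, probs, trials=10_000, max_gens=80, cap=500):
    """Estimate the chance that a lineage starting from one copy dies out,
    given a distribution over the number of copies each copy passes on."""
    extinct = 0
    for _ in range(trials):
        n = 1
        for _ in range(max_gens):
            if n == 0 or n >= cap:    # reaching `cap` copies counts as established
                break
            n = sum(random.choices(counts, probs, k=n))
        if n == 0:
            extinct += 1
    return extinct / trials

# Same arithmetic mean (1.1 copies per copy), different variance.
low_var  = ([0, 1, 2], [0.05, 0.80, 0.15])   # analytic extinction probability = 1/3
high_var = ([0, 2],    [0.45, 0.55])         # analytic extinction probability = 9/11

print(extinction_prob(*low_var))    # roughly 0.33
print(extinction_prob(*high_var))   # roughly 0.82
```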
Is there an arithmetic vs. geometric rationality thing (a la Scott Garrabrant’s recent series) going on with genes?
Like, at equilibrium, the ratio of different genetic variants should be determined by the arithmetic expectation of the number of copies they pass on to the next generation. But for new variants just starting out, the population size (# of copies of that variant) could easily hit 0 and get wiped out, so it should be more cautious: the population should want to maximize the geometric expectation of its growth rate, like a Kelly bettor.
Does this make sense? I don’t know actual population genetics math.
PaulK’s Shortform
Wow, I came here to say literally the same thing about commensurability: that perhaps AM is for what’s commensurable, and GM is for what’s incommensurable.
Though, one note is that to me it actually seems fine to consider different epistemic viewpoints as incommensurate. These might be like different islands of low K-complexity, that each get some nice traction on the world but in very different ways, and where the path between them goes through inaccessibly-high K-complexity territory.
Another setting that seems natural and gives rise to multiplicative utility is if we are trying to cover as much of a space as possible, and we divide it dimension-wise into subspaces, each tracked by a subagent. To get the total size covered, we multiply together the sizes covered within each subspace.
We can kinda shoehorn unequal weighting in here if we have each sub-agent track not just the fractional or absolute coverage of their subspace, but the per-dimension geometric average of their coverage.
For example, say we’re trying to cover a 3D cube that’s 10x10x10, with subagent A minding dimension 1 and subagent B minding dimensions 2 and 3. A particular outcome might involve A having 4/10 coverage and B having 81/100 coverage, for a total coverage of (4/10)*(81/100), which we could also phrase as (4/10)*(9/10)^2.
I’m not sure how to make uncertainty work correctly within each factor though.
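For what it’s worth, here is the 10x10x10 example as a few lines of bookkeeping (just a sketch of the accounting; the function name is made up): each sub-agent reports the per-dimension geometric mean of its coverage plus how many dimensions it minds, and total coverage is the product of each agent’s share.

```python
from math import prod

def total_coverage(reports):
    """Each report is (per-dimension geometric mean of coverage, number of dimensions minded);
    total fractional coverage of the product space is the product of each agent's share."""
    return prod(geo_mean ** ndims for geo_mean, ndims in reports)

# 10x10x10 cube: A minds dimension 1 with 4/10 coverage;
# B minds dimensions 2 and 3 with 81/100 coverage, i.e. 9/10 per dimension.
reports = [(4 / 10, 1), (9 / 10, 2)]
print(total_coverage(reports))   # 0.4 * 0.81 = 0.324
```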
These are super interesting ideas, thanks for writing the sequence!
I’ve been trying to think of toy models where the geometric expectation pops out—here’s a partial one, which is about conjunctivity of values:
Say our ultimate goal is to put together a puzzle (U = 1 if we can, U = 0 if not), for which we need 2 pieces. We have sub-agents A and B who care about the two pieces respectively, each of whose utility for a state is its probability estimate for finding its piece there. Then our expected utility for a state is the product of their utilities (assuming this is a one-shot game, so we need to find both pieces at once, and that the two finds are independent), and so our decision-making will be geometrically rational.
This easily generalizes to an N-piece puzzle. But, I don’t know how to extend this interpretation to allow for unequal weighting of agents.
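Here is the puzzle model as a tiny worked example (the candidate states and probabilities are invented): each sub-agent’s utility for a state is its probability of finding its piece there, the parent’s expected utility is the product, and the product favors balanced states the way a geometric mean does.

```python
# P(find piece 1), P(find piece 2) for each candidate state to search.
states = {
    "attic":    (0.9, 0.2),
    "basement": (0.5, 0.5),
    "garage":   (0.3, 0.8),
}

def p_complete(probs):
    """P(assemble the puzzle) = product of per-piece find probabilities,
    assuming a one-shot search and independent finds."""
    p = 1.0
    for q in probs:
        p *= q
    return p

print({s: round(p_complete(ps), 2) for s, ps in states.items()})
# {'attic': 0.18, 'basement': 0.25, 'garage': 0.24}
# The arithmetic mean would slightly prefer attic or garage (0.55 vs. 0.5),
# but the product (the geometric criterion) picks the balanced basement.
print(max(states, key=lambda s: p_complete(states[s])))   # 'basement'
```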
I also think that the fact that AI safety thinking is so much driven by these fear + distraction patterns is what’s behind the general flail-y nature of so much AI safety work. There’s a lot of, “I have to do something! This is something! Therefore, I will do this!”
I think your diagnosis of the problem is right on the money, and I’m glad you wrote it.
As for your advice on what a person should do about this, it has a strong flavor of: quit doing what you’re doing and go in the opposite direction. I think this is going to be good for some people but not others. Sometimes it’s best to start where you are. Like, one can keep thinking about AI risk while also trying to become more aware of the distortions that are being introduced by these personal and collective fear patterns.
That’s the individual level though, and I don’t want that to deflect from the fact that there is this huge problem at the collective level. (I think rationalist discourse has a libertarian-derived tendency to focus on the former and ignore the latter.)
I wonder if you can recover Kelly from linear utility in money, plus a number of rounds unknown to you and chosen probabilistically from a distribution.
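One way to poke at this numerically (a sketch only; the p = 0.6 coin, the even-money payoff, and the geometric distribution over the number of rounds are all assumptions I’m making up): compare average final wealth (what linear utility cares about) with average log wealth across betting fractions, and see whether the arithmetic column ends up favoring anything Kelly-like.

```python
import math
import random

def final_wealth(f, p=0.6, mean_rounds=20):
    """One run: bet a fixed fraction f of wealth on an even-money coin with win
    probability p, for a geometrically distributed number of rounds (mean ~mean_rounds)."""
    rounds = 1
    while random.random() > 1 / mean_rounds:
        rounds += 1
    wealth = 1.0
    for _ in range(rounds):
        wealth *= 1 + f if random.random() < p else 1 - f
    return wealth

def summarize(f, trials=20_000):
    ws = [final_wealth(f) for _ in range(trials)]
    arithmetic_mean = sum(ws) / trials                        # what linear utility cares about
    geometric_mean = math.exp(sum(map(math.log, ws)) / trials)
    return round(arithmetic_mean, 3), round(geometric_mean, 3)

# Kelly for this bet would say f* = 2p - 1 = 0.2; the question is whether the
# arithmetic column ends up agreeing once the number of rounds is random.
for f in [0.1, 0.2, 0.4, 0.8]:
    print(f, summarize(f))
```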