ERA fellow researching technical AI safety, July-August 2023.
Interested in prediction markets and AI alignment.
Yep I see what you mean, I’ve changed the setup back to what you wrote with V_1 and V_0. My main concern is the part where we quotient V_1 by an equivalence relation to get V: I found this not super intuitive to follow, and I’d ideally love to have a simpler way to express it.
The main part I don’t get right now: I see that (1/c(v^+ + w^-))·(v^+ + w^-) and (1/c(v^+ + w^-))·(v^- + w^+) are convex combinations of elements of L and are therefore in L. However, it seems to me that these two things being the same corresponds to v^+ + w^- = v^- + w^+, which is equivalent to v + v^- + w^- = v^- + w + w^-, which is equivalent to v = w. So doesn’t that mean that v is equivalent to w iff v = w, so our equivalence relation isn’t doing anything? I’m sure I’m misunderstanding something here. Also just wanted to check: v and w should be in V_1, right, rather than V_0? Or is that the source of my confusion.
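For reference, here is the chain I have in mind written out explicitly; it assumes v = v^+ - v^- and w = w^+ - w^-, which is how I’m reading the setup (if that decomposition isn’t what’s intended, this is probably where I’m going wrong):

```latex
% Assuming v = v^+ - v^- and w = w^+ - w^-, i.e. v^+ = v + v^- and w^+ = w + w^-:
\begin{align*}
  v^+ + w^- = v^- + w^+
    &\iff (v + v^-) + w^- = v^- + (w + w^-) \\
    &\iff v = w \quad \text{(cancelling } v^- + w^- \text{ from both sides).}
\end{align*}
```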
Hopefully I can try and find a way to express it once I fully understand it, but if you think what you wrote on Overleaf is the simplest way to do it then we’ll use that.
You recognise this in the post and so set things up as follows: a non-myopic optimiser decides the preferences of a myopic agent. But this means your argument doesn’t vindicate coherence arguments as traditionally conceived. Per my understanding, the conclusion of coherence arguments was supposed to be: you can’t rely on advanced agents not to act like expected-utility-maximisers, because even if these agents start off not acting like EUMs, they’ll recognise that acting like an EUM is the only way to avoid pursuing dominated strategies. I think that’s false, for the reasons that I give in my coherence theorems post and in the paragraph above. But in any case, your argument doesn’t give us that conclusion. Instead, it gives us something like: a non-myopic optimiser of a myopic agent can shift probability mass from less-preferred to more-preferred outcomes by probabilistically precommitting the agent to take certain trades in a way that makes its preferences complete. That’s a cool result in its own right, and maybe your post isn’t trying to vindicate coherence arguments as traditionally conceived, but it seems worth saying that it doesn’t.
I might be totally wrong about this, but if you have a myopic agent with preferences A>B, B>C and C>A, it’s not totally clear to me why they would change those preferences to act like an EUM. Sure, if you keep offering them a trade where they can pay small amounts to move in these directions, they’ll go round and round the cycle and only lose money, but do they care? At each timestep, their preferences are being satisfied. To me, the reason you can expect a suitably advanced agent to not behave like this is that they’ve been subjected to a selection pressure / non-myopic optimiser that is penalising their losses.
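To make the “do they care?” point concrete, here’s a toy money-pump loop; the fee and starting wealth are made-up numbers, just to show that every individual trade satisfies the agent’s momentary preferences while its wealth only ever goes down:

```python
# Toy money pump: an agent with cyclic preferences A > B, B > C, C > A.
# At each step it is offered a swap from what it holds to the thing it
# prefers over that, for a small fee. Every individual trade satisfies its
# (myopic) preferences, yet its wealth strictly decreases.

prefers = {"B": "A", "C": "B", "A": "C"}  # current item -> item preferred over it

fee = 1        # assumed cost per trade
wealth = 100   # assumed starting wealth
item = "A"

for step in range(1, 10):
    item = prefers[item]   # the myopic agent happily accepts each swap
    wealth -= fee
    print(f"step {step}: holds {item}, wealth {wealth}")

# After each full cycle A -> C -> B -> A the agent is back where it started
# and strictly poorer, yet no single trade ever went against its preferences.
```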
If the non-myopic optimiser wants the probability of a dominated strategy lower than that, it has to make the agent non-myopic. And in cases where an agent with incomplete preferences is non-myopic, it can avoid pursuing dominated strategies by acting in accordance with the Caprice Rule.
This seems right to me. It feels weird to talk about an agent that has been sufficiently optimized for not pursuing dominated strategies but not for non-myopia. Doesn’t non-myopia dominate myopia in many reasonable setups?
Can you explain more how this might work?
Epistemic Status: Really unsure about a lot of this.
It’s not clear to me that the randomization method here is sufficient for the condition of not missing out on sure gains with probability 1.
Scenario: B is preferred to A, but preference gap between A & C and B & C, as in the post.
Suppose both your subagents agree that the only trades that will ever be offered are A->C and C->B. These trades occur according to Poisson distributions, with λ = 1 for the first trade and λ = 3 for the second. Any trade that is offered must be immediately declined or accepted. If I understand your logic correctly, this would mean randomizing the preferences such that
P(accept C->B) = 1/3,
P(accept A->C) = 1.
In the world where one of each trade is offered, the agent always accepts A->C but will only accept C->B 1/3 of the time, thus the whole move from A->B only happens with probability 1/3. So the agent misses out on sure gains with probability 2/3.
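Just to double-check my arithmetic, here’s a quick Monte Carlo sketch of the “one of each trade is offered” world (the acceptance probabilities are the ones from my reading above, so this only checks my interpretation, not necessarily your intended setup):

```python
import random

# One run = one world in which the agent, starting at A, is offered A->C once
# and then C->B once. Under my reading it accepts A->C with probability 1 and
# C->B with probability 1/3.

def run():
    holding = "C"                 # A->C is accepted with probability 1
    if random.random() < 1/3:     # C->B is accepted with probability 1/3
        holding = "B"
    return holding

trials = 100_000
ends_at_b = sum(run() == "B" for _ in range(trials)) / trials
print(ends_at_b)  # ~1/3: the sure gain A->B is missed with probability ~2/3
```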
In other words, I think you’ve sufficiently shown that this kind of contract can take a strongly-incomplete agent and make them not-strongly-incomplete with probability >0, but this is not the same as making them not-strongly-incomplete with probability 1, which seems to me to be necessary for expected utility maximization.
Something I have a vague inkling about, based on what you and Scott have written, is that the same method by which we can rescue the Completeness axiom (i.e. via contracts/commitments) may also doom the Independence axiom. As in, you can have one of them (under certain premises) but not both?
This may follow rather trivially from the post I linked above so it may just come back to whether that post is ‘correct’, but it might also be a question of trying to marry/reconcile these two frameworks by some means. I’m hoping to do some research on this area in the next few weeks, let me know if you think it’s a dead end I guess!
Really enjoyed this post. My question is: how does this intersect with issues stemming from other VNM axioms, e.g. Independence, as referenced by Scott Garrabrant?
https://www.lesswrong.com/s/4hmf7rdfuXDJkxhfg/p/Xht9swezkGZLAxBrd
It seems to me that you don’t get expected utility maximizers solely from not-strong-Incompleteness, as there are other conditions that are necessary to support that conclusion.
Hi EJT, I’m starting research on incomplete preferences / subagents and would love to see this entry too if possible!
Furthermore, human values are over the “true” values of the latents, not our estimates—e.g. I want other people to actually be happy, not just to look-to-me like they’re happy.
I’m not sure that I’m convinced of this. I think when we say we value reality over our perception it’s because we have no faith in our perception to stay optimistically detached from reality. If I think about how I want my friends to be happy, not just appear happy to me, it’s because of a built-in assumption that if they appear happy to me but are actually depressed, the illusion will inevitably break. So in this sense I care not just about my estimate of a latent variable, but what my future retroactive estimates will be. I’d rather my friend actually be happy than be perfectly faking it for the same reason I save money and eat healthy—I care about future me.
What about this scenario: my friend is unhappy for a year while I think they’re perfectly happy, then at the end of the year they are actually happy but they reveal to me they’ve been depressed for the last year. Why is future me upset in this scenario, why does current me want to avoid this? Well because latent variables aren’t time-specific, I care about the value of latent variables in the future and the past, albeit less so. To summarize: I care about my own happiness across time and future me cares about my friend’s happiness across time, so I end up caring about the true value of the latent variable (my friend’s happiness). But this is an instrumental value, I care about the true value because it affects my estimates, which I care about intrinsically.
Would it perhaps be helpful to think of agent-like behavior as that which takes abstractions as inputs, rather than only raw physical inputs? For example, an inanimate object such as a rock only interacts with the world on the level of matter, not on the level of abstraction. A rock is affected by wind currents according to the same laws regardless of the type of wind (breeze, tornado, hurricane), while an agent may take different actions or assume different states depending on the abstractions the wind has been reduced to in its world model.
Example One: Good Cop / Bad Cop
The classic interrogation trope involves a suspect who is being interrogated by two police officers—one who is friendly and offers to help the suspect, and one who is aggressive and threatens them with the consequences of not cooperating. We can think of the two cops as components of the Police Agent, and they are pursuing two goals: Trust and Fear. Both Trust and Fear are sub-goals for the ultimate goal, which is likely a confession, plea deal or something of that nature, but the officers have uncertainty about how much Trust or Fear might motivate the suspect towards this ultimate goal, so they want to maximize both. Here, Trust is built by the “Good Cop” while Fear is built by the “Bad Cop”. Imagine both cops start out pretty evenly split between pursuing Trust and Fear, but the first cop is naturally a bit more aggressive (this is the “Bad Cop”) and so he is following a slightly more aggressive strategy (i.e. geared towards Fear). Vice versa for the second cop (the “Good Cop”). Now, because the suspect is already afraid of the Bad Cop, the Bad Cop gets more Fear from increasing their aggression than the Good Cop would, as it’s taken more seriously. Similarly, because the suspect is already slightly friendly towards the Good Cop, the Good Cop gets more Trust from increasing their friendliness than the Bad Cop would, because they’ve built rapport and it’s seen as more genuine. Thus, in order to maximize the total amount of Trust and Fear, the Good Cop should completely specialize in friendliness and the Bad Cop should completely specialize in aggression, due to their comparative advantages.
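A toy numerical version of this, with entirely made-up payoff functions (the quadratic terms are just one way of encoding “effort counts for more from the cop the suspect is already inclined to believe”):

```python
# Toy model: each cop splits one unit of effort between friendliness (f)
# and aggression (a = 1 - f). Assumed payoffs: Trust and Fear grow
# quadratically in the effort of the cop the suspect already takes seriously
# on that dimension, which encodes the increasing returns described above.

def total_payoff(f_good, f_bad):
    a_good, a_bad = 1 - f_good, 1 - f_bad
    trust = f_good**2 + 0.5 * f_bad**2   # Good Cop's friendliness counts for more
    fear = a_bad**2 + 0.5 * a_good**2    # Bad Cop's aggression counts for more
    return trust + fear

print(total_payoff(0.5, 0.5))  # both cops split their effort evenly -> 0.75
print(total_payoff(1.0, 0.0))  # full specialization -> 2.0
```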
Example Two: Multi-Party Systems
In a multi-party system, you might have three political parties that are all broadly on the same side of the spectrum, with little doubt they’d form a governing coalition if they could. Let’s say there are n groups that these three parties can possibly draw votes from, with each group representing a certain policy goal that all three parties would be happy or indifferent about implementing (examples for the left: unions, immigrants, students, renters). So we have three subsystems (each party) and n goals they are trying to maximize. Now note that rhetoric/messaging is generally tied to what goals you focus on, and that a given voter may find different rhetoric more or less persuasive for a given political goal. If all three parties attempt to solve all n goals equally, they will likely have very similar rhetoric/messaging, due to similar goal focus and similar composition of their political coalitions. This means any given voter will likely be hearing only one version of a message, but hearing it from three different sources, which minimizes the chance of the message landing and maximizes confusion & conflict. If instead each of the three parties focuses on a different subset of groups and policy goals (based on whatever they are currently, perhaps randomly, inclined towards), any given voter will hear three distinct messages on similar goals and so will be likelier to be persuaded by one of them—furthermore, the messaging is likelier to be more direct and less confused. So to maximize the likelihood of achieving any of the n goals—by maximizing the likelihood of these three parties winning a majority of seats—these parties should specialize in different types of rhetoric. The key aspect of comparative advantage here is that there’s a negative interaction term between all the different groups/goals (representing confusion of rhetoric and wasted messaging), so parties can achieve Pareto-optimal gains by focusing on a specific group/goal and leaving another to a different party. This may seem like a poor explanation for party specialization given the historical ways in which parties have formed, but parties are a revolving door of new members, and a system where each party pulls in members from a subset of the n groups, and those members then push that party towards advocating for that subset, is stable and maximizes any one group’s chances of success. Friendly party leaders can also talk to each other to achieve some version of this coordination—I’ve seen this in practice in my own country. Note: this idea is not novel to me, but the framing of it as comparative advantage is.
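And a toy version of the negative-interaction-term point, again with made-up numbers (the overlap penalty coefficient is purely an assumption):

```python
# Toy model: three parties each allocate one unit of campaigning effort across
# three voter groups. A group's persuasion payoff is the total effort it
# receives minus a penalty for overlap (several parties sending near-identical
# messages to the same group), which plays the role of the negative interaction term.

def group_payoff(efforts, overlap_penalty=1.5):
    total = sum(efforts)
    # Penalize every pair of parties that both target this group.
    overlap = sum(e1 * e2 for i, e1 in enumerate(efforts)
                  for e2 in efforts[i + 1:])
    return total - overlap_penalty * overlap

# Every party spreads its effort evenly over all three groups:
spread = [group_payoff([1/3, 1/3, 1/3]) for _ in range(3)]
# Each party specializes in one group:
specialized = [group_payoff([1, 0, 0]), group_payoff([0, 1, 0]), group_payoff([0, 0, 1])]

print(sum(spread))       # 1.5  (same total effort, eaten into by overlap penalties)
print(sum(specialized))  # 3.0  (same total effort, no overlap penalty)
```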
Example Three: Student Debt
Here the two sub-systems are you and future you. The multiple goals are money and time. As a student you might be able to get a $15/hr part-time job, whereas in the future you could get a job where you expect to be paid twice that. Clearly, to maximize money and time in a Pareto-optimal way, you’ll refrain from working while you’re a student and instead work slightly more when you’re older. This would generally entail taking on a lot of student debt. In this example there are costs, since money is not fungible through time and taking on debt incurs interest, but if the comparative advantage is big enough, those can be overcome.
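With made-up numbers (the hours, the wage gap, and the interest factor below are all assumptions, just to show the shape of the trade-off):

```python
# Toy comparison: work 10 hrs/week for a year as a student at $15/hr,
# versus borrowing the same amount and repaying it later out of a $30/hr job.

hours_per_week, weeks = 10, 52
student_wage, future_wage = 15.0, 30.0
interest = 1.25  # assumed total interest factor by repayment time

amount = student_wage * hours_per_week * weeks        # $7,800 earned (or borrowed)
debt_to_repay = amount * interest                     # $9,750 owed later
hours_as_student = hours_per_week * weeks             # 520 hours of student time
hours_to_repay_later = debt_to_repay / future_wage    # 325 hours of future time

print(hours_as_student, hours_to_repay_later)
# Repaying later costs ~325 hours instead of 520, so both selves can be better
# off despite the interest, as long as the wage gap is large enough.
```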
Sorry if this is a stupid question, but is it true that ρ(X,Y)^2 has the degrees of freedom you described? If X = Y is a uniform variable on [0,1], then ρ(X,Y)^2 = 1, but ρ(f(X), g(Y))^2 ≠ 1 for (most) non-linear f and g.
In other words, I thought Pearson correlation is specifically for linear relationships, so it isn’t invariant under non-linear transformations.
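A quick numerical check of the X = Y case, using exp as an arbitrary non-linear transformation (any sufficiently non-linear f would show the same thing):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=100_000)
y = x.copy()  # X = Y exactly

rho_linear = np.corrcoef(x, y)[0, 1]
rho_nonlinear = np.corrcoef(np.exp(3 * x), y)[0, 1]  # non-linear transform of X

print(rho_linear**2)     # 1.0: perfectly linearly related
print(rho_nonlinear**2)  # clearly below 1, even though y is a deterministic function of x
```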
I’ve tried to give (see it on the post) a different description of an equivalence relation that I find intuitive and that I think gives the space V as we want it, but it may not be fully correct.