Lol. Somehow that made it clearer that it was meant as hyperbole than the original did.
You might want to consider cross-posting this to the EA Forum to reach a larger audience.
I’ve been thinking about Eliezer’s take on the Second Law of Thermodynamics, and while I can’t think of a succinct comment to drop with it, I think it could bring value to this discussion.
Well, I’d say that the difference between your expectations of the future having lived a variant of it or not is only one of degree, not of kind. Therefore I think there are situations where the needs of the many can outweigh the needs of the one, even under uncertainty. But I understand that not everyone would agree.
I agree with that as a sufficient criterion for only summing over those terms; the other steps I’ll have to think about before I get them.
I found this newer paper https://personal.lse.ac.uk/bradleyr/pdf/Unification.pdf and, having skimmed it, it seemed to have similar premises, but they defined the expression (instead of deriving it).
GovAI is probably one of the densest places to find that. You could also check out FHI’s AI Governance group.
There is no consensus about what constitutes a moral patient and I have seen nothing convincing to rule out that an AGI could be a moral patient.
However, when it comes to AGI, some extreme measures are needed.
I’ll try with an analogy. Suppose that you traveled back in time to Berlin in 1933. Hitler has yet to do anything significantly bad, but you still expect his actions to have some really bad consequences.
Now I guess that most wouldn’t feel terribly conflicted about removing Hitler’s right to privacy, or even his life, to prevent the Holocaust.
For a longtermist, the risks we expect from AGI are orders of magnitude worse than the Holocaust.
Have these issues been discussed somewhere in the canon?
The closest thing to this being discussed that I can think of is when it comes to Suffering Risks from AGI. The most clear-cut example (not necessarily a probable one) is if an AGI would spin up sub-processes that simulate humans experiencing immense suffering. You might find something if you search for that.
Didn’t you use that . I can see how to extend the derivation for more steps but only if . The sums
and
for arbitrary are equal if and only if .
The other alternative I see is if (and I’m unsure about this) we assume that and for .
What I would think that would mean is after we’ve updated probabilities and utilities from the fact that is certain.
I think that would be the first one but I’m not sure. I can’t tell which one that would be.
The general case (even if mutually exclusive) is tricky; I’m not sure the expression is as nice then.
that was one of the premises, no? You expect utility from your prior.
Some first reflections on the results before I go into examining all the steps.
Hmm, yes my expression seems wrong when I look at it a second time. I think I still confused the timesteps and should have written
The extra negation comes from a reflex from when not using Jeffrey’s decision theory. With Jeffrey’s decision theory it reduces to your expression, as the negated terms sum to zero. Still, I probably should learn not to guess at theorems and to properly do all the steps in the future. I suppose it is a point in favor of Jeffrey’s decision theory that the expressions usually are cleaner.
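To spell out the step I’m leaning on (assuming the standard Jeffrey convention of normalizing the tautology to zero utility):

$$P(X)\,U(X) + P(\neg X)\,U(\neg X) = U(X \vee \neg X) = U(\top) = 0,$$

so the negated term is just minus the un-negated one and the extra negation cancels.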
As for your derivation you used that in the derivation but that is not the case for general . This is a note to self to check whether this still holds for .
Edit: My writing is confused here; disregard it. My conclusion is still
Your expression for is nice
and what I would have expected. The problem I had was that I didn’t realize that (which should have been obvious). Furthermore, your expression checks out with my toy example (if I remove the false expectation I had before).
Consider a lottery where you guess a sequence of 3 numbers; the corresponding propositions are that you guessed each number correctly. You only have preferences over whether you win or not.
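To make the toy example concrete, here is a worked version with details filled in by me (my assumptions: call the propositions X_1, X_2, X_3 for “guess i is correct”, each number is a digit guessed correctly with probability 1/10 independently, u(win) = 1 and u(not win) = 0, and read U(A) as the conditional expectation of u given A):

$$P(\text{win}) = (1/10)^3 = 1/1000,$$

$$U(X_1) = P(\text{win} \mid X_1)\,u(\text{win}) + P(\neg\text{win} \mid X_1)\,u(\neg\text{win}) = 1/100,$$

$$U(X_1 \wedge X_2) = 1/10, \qquad U(X_1 \wedge X_2 \wedge X_3) = 1.$$

Each correct guess multiplies the conditional expected utility by 10, which is the behaviour the expression should reproduce.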
Ah, those timestep subscripts are just what I was missing. I hadn’t realised how much I needed that grounding until I noticed how good it felt when I saw them.
So to summarise (below, all sets have mutually exclusive members): in Jeffrey-ish notation we have the axiom
and normally you would want to indicate in the left-hand side what distribution you have over the propositions. However, we always renormalize such that the distribution is our current prior. We can indicate this by labeling the utilities with the timestep they come from (the agent should probably be included as well, but let’s skip this for now).
That way we don’t have to worry about the utilities being shifted during the sum on the right-hand side or something. (I mean, notationally that would just be absurd, but if I were to sit down and estimate the consequences of possible actions I wouldn’t be able to not let this shift my expectation for what action I should take before I was done.)
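Concretely, the version of the axiom I have in mind (my reconstruction of the standard Jeffrey form, with the timestep label made explicit) is, for mutually exclusive X and Y,

$$U_t(X \vee Y) = \frac{P_t(X)\,U_t(X) + P_t(Y)\,U_t(Y)}{P_t(X) + P_t(Y)},$$

and for a whole set \{X_i\} of mutually exclusive propositions

$$U_t\Big(\bigvee_i X_i\Big) = \frac{\sum_i P_t(X_i)\,U_t(X_i)}{\sum_i P_t(X_i)},$$

where P_t is whatever our prior is at timestep t after all renormalization up to that point.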
We can also bring up the utility of an action to be
Furthermore, for most actions it is quite clear that we can drop the subscript, as we know that we are considering the same timestep consistently within the same calculation.
Now I’m fine with this, because I will have those timestep subscripts in the back of my mind.
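For the action utility, the expression I have in mind (my guess at the intended form, treating the action a as a proposition we can condition on and \{X_i\} as a partition into mutually exclusive outcomes) is

$$U_t(a) = \sum_i P_t(X_i \mid a)\,U_t(X_i \wedge a),$$

dropping the timestep subscript when the whole calculation sits at one timestep, as above.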
I still haven’t commented on in general or . My intuition is that they should be able to be described from , and , but it isn’t immediately obvious to me how to do that while keeping .
I tried considering a toy case where and () and then
but I couldn’t see how it would be possible without assuming some things about how , and relate to each other which I can’t in general.
Well, deciding to do the action would also make its utility 0 (edit: or close enough, considering remaining uncertainties) even before it is done. At least if you’re committed to the action; then you could just as well consider the decision to be the same as the action.
It would mean that a “perfect” utility maximizer always does the action with utility 0 (edit: but the decision can have positive utility(?)), which isn’t a problem in any way except that it is alien to how I usually think about utility.
Put another way: while I’m thinking about which possible action I should take, the utilities fluctuate until I’ve decided on an action, and then that action has utility 0. I can see the appeal of just considering changes to the status quo, but the part where everything jumps around makes it an extra thing for me to keep track of.
Oh, I think I see what confuses me. In the subjective utility framework the expected utilities are shifted to 0 after each Bayesian update?
So then the utility of doing an action to prevent a Doom is positive. But once the action has been done, the utility scale is shifted again.
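A toy illustration of how I now understand the rescaling (all numbers are made up, and I assume we only care about Doom vs. not-Doom): suppose at time t we have P_t(\text{Doom}) = 0.1 and, on the current scale, U_t(\text{Doom}) = -100. For the status quo to sit at zero we need

$$0.1 \cdot (-100) + 0.9 \cdot U_t(\neg\text{Doom}) = 0 \quad\Rightarrow\quad U_t(\neg\text{Doom}) \approx 11.1,$$

so an action a that guarantees \neg\text{Doom} has U_t(a) \approx 11.1 > 0 before it is taken. Once a is done and we update on it, the scale is re-zeroed and the utility of a drops back to 0.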
Ok, so this is a lot to take in, but I’ll give you my first takes as a start.
My only disagreement prior to your previous comment seems to be about the legibility of the desirability axiom, which I think should contain some reference to the actual probabilities of the propositions involved.
Now, I gather that this disagreement probably originates from the fact that I defined while in your framework .
Something that appears problematic to me is if we consider the tautology (in Jeffrey notation). This would mean that reducing the risk of has net utility. In particular, certain and certain are equally preferable (), which I don’t think either of us agrees with. Perhaps I’ve missed something.
What I found confusing with was that to me this reads as which should always(?) depend on but with this notation it is hidden to me. (Here I picked as the mutually exclusive event , but I don’t think it should remove much from the point).
That is also why I want some way of expressing that in the notation. I could imagine writing as that is the cleanest way I can come up with to satisfy both of us. Then with expected utility .
When we accept the expected utility hypothesis, we can always write it as an expectation/sum over its parts, and then there is no confusion either.
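The kind of decomposition I mean (my own gloss, for any partition \{Y_i\} into mutually exclusive events):

$$U(X) = \sum_i P(Y_i \mid X)\,U(X \wedge Y_i),$$

so the probabilities show up explicitly on the right-hand side even when the left-hand side hides them.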
Hmm, I usually don’t think too deeply about the theory, so I had to refresh some things to answer this.
First off, the expected utility hypothesis is apparently implied by the VNM axioms, so that is not something that needs to be added on. To be honest, I usually only think of a coherent preference ordering and expected utilities as two separate things and hadn’t realized that VNM combines them.
About notation: with I mean the utility of getting with certainty, and with I mean the utility of getting with probability . If you don’t have the expected utility hypothesis, I don’t think you can separate an event from its probability. I tried to look around for the usual notation but didn’t find anything great.
Wikipedia used something like
where is a random variable over the set of states . Then I’d say that the expected utility hypothesis is the step .
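For reference, what I have in mind is something like (my paraphrase of the standard statement):

$$E[u(X)] = \sum_{s \in S} P(s)\,u(s),$$

with X a random variable over the set of states S, and the expected utility hypothesis being the identification of the preference-level utility of the lottery with this expectation, U(X) = E[u(X)].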
Having read some of your other comments, I expect you to ask whether the top preference of a thermostat is its goal temperature. And to this I have no good answer.
For things like a thermostat or a toy robot you can obviously see that there is a behavioral objective which we could use to infer preferences. But is the reason that thermostats are not included in utility calculations that the behavioral objective does not actually map to a preference ordering, or that their weight when aggregated is 0?
Perhaps most don’t have this in the back of their mind when they think of utility, but for me this is what I’m thinking about. The aggregation is still confusing to me, but as a simple example: if I want to maximise total utility and am in a situation that only impacts a single entity, then increasing utility is the same to me as getting this entity into states that are more preferable for them.
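A toy way to write down what I mean (my own formalization): let total utility be the sum over entities,

$$U_{\text{total}}(s) = \sum_i U_i(s),$$

and suppose an action a only changes the state as far as it matters to entity j, so U_i(s(a)) is constant in a for every i \neq j. Then

$$\arg\max_a U_{\text{total}}(s(a)) = \arg\max_a U_j(s(a)),$$

and maximizing total utility reduces to moving that one entity to states it prefers.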
Sure, I’ve found it to be an interesting framework to think in so I suppose someone else might too. You’re the one who’s done the heavy lifting so far so I’ll let you have an executive role.
If you want me to write up a first draft I can probably do it end of next week. I’m a bit busy for at least the next few days.