A rather condensed “clarification”: “Objective morality” is equivalent to the objectively optimal decision policy/theory, which my intuition says might warrant the label “objectively optimal” due to reasons hinted at in this thread, though it’s possible that “optimal” is the wrong word to use here and “justified” is a more natural choice. An oracle can be constructed from Chaitin’s omega, which allows for hypercomputation. A decision policy that didn’t make use of knowledge of ALL the bits of Chaitin’s omega is less optimal/justified than a decision policy that did make use of that knowledge. Such an omniscient (at least within the standard models of computation) decision policy can serve as an objective standard against which we can compare approximations in the form of imperfect human-like computational processes with highly ambiguous “belief”-”preference” mixtures. By hypothesis the implications of the “existence” of such an objective standard would seem to be subtle and far-reaching.
The decisions produced by any decision theory are not objectively optimal; at best they might be objectively optimal for a specific utility function. A different utility function will produce different “optimal” behavior, such as tiling the universe with paperclips. (Why do you think Eliezer et al are spending so much effort trying to figure out how to design a utility function for an AI?)
I see the connection between omega and decision theories related to Solomonoff induction, but as the choice of utility function is more-or-less arbitrary, it doesn’t give you an objective morality.
His point is that if I fix your goals (say, narrow self-interest) the defensible policies still don’t look much like short-sighted goal pursuit (in some environments, for some defensible notions of “defensible”). It may be that all sufficiently wise agents pursue the same goals because of decision theoretic considerations, by implicitly bargaining with each other and together pursuing some mixture of all of their values. Perhaps if you were wiser, you too would pursue this “overgoal,” and in return your self-interest would be served by other agents in the mixture.
While plausible, this doesn’t look super likely right now. Will would get a few Bayes points if it pans out, though the idea isn’t due to him. (A continuum of degrees of altruism has been conjectured to be justified from a self-interested perspective, if you are sufficiently wise. This is the most extreme; Drescher has proposed a narrower view which still captures many intuitions about morality, and weaker forms that still capture at least a few important moral intuitions, like cooperation on PD, seem well supported.)
The connection to omega isn’t so clear. It looks like it could just be concealing some basic intuitions about computability and approximation. It seems like a way of smuggling in mysticism, which is misleading by being superfluous rather than incoherent.
It may be that all sufficiently wise agents pursue the same goals because of decision theoretic considerations, by implicitly bargaining with each other and together pursuing some mixture of all of their values.
But how does an agent introduce its values in the mixture? The agent is the way it decides, so at least in one interpretation its values must be reflected in its decisions (reasons for its decisions), seen in them, even if in a different interpretation its decisions reflect the mixed values of all things (for that is one thing the agent might want to take into account, as it becomes more capable of doing so).
Why do I write this comment? I decided to do so, which tells something about the way I decide. Why do I write this comment? According to the laws of physics. There seems to be no interesting connection between such explanations, even though both of them hold, and there is peril in confusing them (for example, nihilist ethical ideas following from physical determinism).
Your comment did clarify for me what Will was talking about. This is an important confusion (to untangle).
Agent’s counterfactual actions feel like a wrong joint to me. I expect agent’s assertion of its own values has more to do with the interval between what’s known about the reasons for its decisions (including to itself, where introspection and mutual introspection is deep) and the decisions themselves, the same principle that doesn’t let it know its decisions in advance of whenever the decisions “actually” happen (as opposed to being enacted on precommitments). In particular, counterfactual behavior can also be taken as decided upon at some point visible to those taking that property (expression of values) into account.
But how does an agent introduce its values in the mixture?
I don’t accept Will’s overall position or reasoning but this particular part is relatively straightforward. It’s just the same as how anyone negotiates. In this case the negotiation is just a little… indirect. (Expanded below.)
The agent is the way it decides, so at least in one interpretation its values must be reflected in its decisions (reasons for its decisions), seen in them, even if in a different interpretation its decisions reflect the mixed values of all things (for that is one thing the agent might want to take into account, as it becomes more capable of doing so).
An agent’s decisions are determined by its values, but this relationship is many-to-one. For any given circumstances that an agent could be in, all sorts of preferences will end up resolving to the same decision. If you decide to throw away all that information by only considering what can be inferred from the resultant decision, then you will end up wrong. More importantly, this isn’t what the other agents will be doing, so you will be wrong about them too.
Consider the coordination game as described by paulfchristiano and adopted as a metaphysics of morality by Will, which I’ll consider as a counterfactual:
There are a bunch of agents located beyond the range at which they can physically or causally interact. (This premise sometimes includes altogether esoteric degrees of non-interaction.)
The values of each agent include things that can be influenced by the other agents.
The agents have full awareness of both the values and decision procedures of all the other agents.
It is trivial* to see that this game reduces to (is equivalent to) a simple two-party prisoner’s dilemma with full mutual information. Each agent calculates the most efficient self-interested bargains that could be made between them all and chooses either to act as if those bargains have been made or not, depending on whether it (reliably) predicts the other agents will do likewise.
For all the agents, when we look at their behavior we see them all acting equivalently to whatever the negotiated outcome comes out to. That tells us little about their individual values: we’ve thrown that information away and just elected to keep “Cooperate with negotiated preferences”. But the individual preferences have been ‘thrown into the mix’ already, back at the point where each of the agents considers the expected behavior of the others. (And there is no way that one of the agents will cooperate without its values in the mix, and all the agents like to win, etc., etc., and a lot more ‘trivial’.)
I don’t accept the premises here and so definitely don’t accept any ‘universal morality’, but the “But how does an agent introduce its values in the mixture?” question just isn’t the weak point of the reasoning. It’s tangential, and to the extent that it is presented as an objection it is a red herring.
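A rough sketch of the point about values entering at the bargaining stage, under loud assumptions: the equal-weight averaging in `bargain`, the identity check standing in for "reliably predicts the other agents do likewise", and all names here are hypothetical illustrations, not anything specified in the thread.

```python
def bargain(values):
    """Hypothetical mixture rule: equal-weight average of every agent's value vector."""
    n = len(values)
    goods = values[0].keys()
    return {g: sum(v[g] for v in values) / n for g in goods}


def decide(own_index, values, procedures):
    """Act on the bargained mixture only if every agent runs this same procedure.

    Checking `p is decide` is a crude stand-in for full mutual knowledge of
    decision procedures; it is an assumption of this sketch.
    """
    if all(p is decide for p in procedures):
        return bargain(values)
    # Otherwise fall back to pursuing one's own values alone.
    return values[own_index]


def stubborn(own_index, values, procedures):
    """A non-cooperating procedure that always pursues its own values."""
    return values[own_index]


values = [
    {"paperclips": 1.0, "staples": 0.0},
    {"paperclips": 0.0, "staples": 1.0},
]

# Both agents cooperate: their observable behavior is identical.
cooperative = [p(i, values, [decide, decide]) for i, p in enumerate([decide, decide])]

# One agent does not cooperate: the other reverts to its own values.
mixed = [p(i, values, [decide, stubborn]) for i, p in enumerate([decide, stubborn])]
```

In the cooperative case both agents output the same mixture, so looking only at behavior tells you little about either agent's individual values; yet changing one agent's entry in `values` changes the mixture, which is the sense in which each agent's values are already "in the mix".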
It is trivial* to see that this game reduces to (is equivalent to) a simple two-party prisoner’s dilemma with full mutual information.
It only reduces to/is equivalent to a prisoner’s dilemma for certain utility functions (what you’re calling “values”). The prisoners’ dilemma is characterized by the fact that there is a dominant strategy equilibrium which is not Pareto optimal. But if the utility functions of the agents are such that the game is zero-sum, then this can’t be the case, as every outcome is Pareto optimal in a zero-sum game.
Furthermore, in a zero-sum game, no cooperation between all of the agents is possible. So it’s crazy to believe that an arbitrary set of sufficiently intelligent agents will cooperate to achieve a single “overgoal”. Collaboration is only possible if the agents’ preferences are such that collaboration can be mutually beneficial.
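The two claims above (a prisoner’s dilemma has a dominant-strategy equilibrium that is not Pareto optimal, while every outcome of a two-player zero-sum game is Pareto optimal) can be checked on small examples. This is only a sketch: the payoff numbers and the function names are illustrative assumptions, not taken from the thread.

```python
def pareto_optimal(payoffs):
    """Return the set of outcomes not Pareto-dominated by any other outcome."""
    def dominates(a, b):
        ua, ub = payoffs[a], payoffs[b]
        no_worse = all(x >= y for x, y in zip(ua, ub))
        strictly_better = any(x > y for x, y in zip(ua, ub))
        return no_worse and strictly_better

    return {o for o in payoffs if not any(dominates(p, o) for p in payoffs)}


def dominant_strategy(payoffs, player, actions=("C", "D")):
    """Return the action strictly best for `player` against every opponent action, or None."""
    def u(mine, theirs):
        profile = (mine, theirs) if player == 0 else (theirs, mine)
        return payoffs[profile][player]

    for a in actions:
        others = [b for b in actions if b != a]
        if all(u(a, t) > u(b, t) for b in others for t in actions):
            return a
    return None


# Illustrative prisoner's dilemma: (row player payoff, column player payoff).
pd = {("C", "C"): (3, 3), ("C", "D"): (0, 5), ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

# Illustrative zero-sum game: payoffs in every cell sum to zero.
zero_sum = {("C", "C"): (0, 0), ("C", "D"): (-2, 2), ("D", "C"): (2, -2), ("D", "D"): (0, 0)}

pd_equilibrium = (dominant_strategy(pd, 0), dominant_strategy(pd, 1))
pd_pareto = pareto_optimal(pd)
zero_sum_pareto = pareto_optimal(zero_sum)
```

With these payoffs, mutual defection is the dominant-strategy outcome of `pd` and is absent from `pd_pareto`, whereas `zero_sum_pareto` contains all four outcomes, so there is no outcome every agent would agree to move to.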
Yes, this entire scenario is based around scenarios where there is benefit to cooperation. In the edge case where such benefit is ‘0 expected utilons’ the behavior of the agents will, unsurprisingly, not be changed at all by the considerations we are talking about.
So I should interpret Will’s “Omega = objective morality” comment as meaning “sufficiently wise agents sometimes cooperate, when cooperation is the best way to achieve their ends”? I don’t think so.
So I should interpret Will’s “Omega = objective morality” comment as meaning “sufficiently wise agents sometimes cooperate, when cooperation is the best way to achieve their ends”?
No. Will thinks thoughts along these lines, then goes ahead and bites imaginary bullets.
I didn’t intend to suggest throwing out information: a “public” decision, the action, is not the only decision that happens, and there is also “a priori” of agent’s whole initial construction. Rather, my point was that there is more to agent’s values than just the agent as it’s initially presented, with its future decisions marking the points where additional information (for the purposes of other decisions that coordinate) is being revealed, even if those decisions follow deterministically from agent’s initial construction.
I’m not sure if I’d get many Bayes points for my beliefs, rather than just my intuitions; after taking into account others’ intuitions I don’t think I think it’s that much more plausible than others think it is.
I wish I could respond to the rest of your comment but am too flustered; hopefully I’ll be able to later. What stands out as a possible misconstrual-with-a-different-idea is that I’m not sure if this idea of selfness as in narrow self-interest even makes sense. If it does make sense then my intuition is probably wrong for the same reason various universal instrumental value hypotheses are probably wrong.
I’m not sure if this idea of selfness as in narrow self interest even makes sense.
Well, one can certainly talk about agents who have what we might describe as “narrow self-interest,” though I don’t really care about the distinction between self-interest and paperclipping and so on, which do seem to be well-defined.
E.g., whenever I experience something I add it to a list of experiences. I get a distribution over infinite lists of experiences by applying Solomonoff induction. At each moment I define my values in terms of that, and then try and maximize them (this is reflectively inconsistent—I’ll quickly modify to have copy-altruistic values, but still to something that looks pretty self-interested).
Are you claiming that this sort of definition is incoherent, or just that such agents appear to act in service of universal values once they are wise enough?
Are you claiming that this sort of definition is incoherent, or just that such agents appear to act in service of universal values once they are wise enough?
If “wise enough” is taken to mean “not instantaneously implosive/self-defeating” and “universal values” is taken to mean “decision problem representations robust against instantaneous implosion/self-defeat”, then the latter option, but in practice that amounts to a claim of incoherence; in other words the described agent is incoherent/inconsistent and thus its description is implicitly incoherent upon naive interpretation. Or to put it differently, I’m still not convinced it’s possible to build an AGI with a naive “prove the Goldbach conjecture”-style utility function, and so I’m hesitant to accept the validity of admittedly common sense reasoning that takes Goedel machine or AIXI-style architectures at face value as premises.
This carving up of the problem in such a way that “universal values” stands out as a thing seems wrong to me; the most obvious way of interpreting “universal values” is in some object-level way that connotes deluding one’s “self” into seeing Pareto improvements that don’t exist or deluding one’s “self” into locating/defining one’s “self” via some normatively unjustified process.
I can write down TDT agents who have preferences about abstract worlds, e.g. in which the agent is instantiated on an ideal Turing machine and utility is just defined in terms of mathematical properties of the output (say, whether it is a proof of the Goldbach conjecture) and the running time.
Is the objection before or after this point?
I can write down TDT agents who care about the number of 1s in universally distributed sequences agreeing with their observations so far (as I remarked above). Do you think this agent definition implodes, or that the resulting agents just don’t act as self-interested as they look like they would? (Particularly I’m talking about the ones who actually are in simple universes, so who can quickly rule out concerns about simulators, and who don’t rely on others’ generosity).
(I’m trying to repeat things in many different ways so as to increase the chance that I’m understood; apologies if the repetition is needless.)
Is the objection before or after this point?
Before, but again my objection is sort of orthogonal to the way you’ve set up the scenario. When you say you can write down TDT “agents” I don’t believe you. I believe you can write down specifications of syntax-manipulating algorithms that will solve tic tac toe or other narrow problems just fine, and I of course believe that it’s physically possible to call such algorithms “agents” if such a fancy appeals to you, but I don’t confidently believe that they are or could ever be real agents in the way that word is commonly interpreted. (“Intelligence, to be useful, must be used for something other than defeating itself.”) You can interpret such a syntax manipulator as an agent to the extent that you can interpret the planet Saturn as an agent, but this is qualitatively different from talking about real agentic things like humans or gods, and I’m worried about pivoting on this word “agent” as if conclusions drawn in one domain can be routinely expected to work for the other. There is some math about an abstract thing called expected utility, and we can use roughly that conceptual scheme to conveniently label certain syntax-manipulating algorithms or to roughly carve up the world as we see it, but this doesn’t mean that things like “beliefs” or “preferences” actually exist out there in the world in any reliable metaphysical sense such that we can be confident of our application of them beyond their intended purview. So when you say:
Do you think this agent definition implodes, or that the resulting agents just don’t act as self-interested as they look like they would?
I don’t know how to interpret this question in a way that I’m confident makes sense. I certainly want to know how to interpret it but would have to think about it a lot longer. Perhaps if I were more familiar with the relevant arguments from both the formal epistemology literature and the philosophy of mind literature, I would be able to confidently interpret it.
So I can write down these formal symbol-manipulating algorithms, that look to a naive onlooker like they will do things like keep to themselves and prove the Goldbach conjecture. We can talk about the question of fact: if we run such an algorithm on a Turing machine (made of math), would it in fact output a proof of the Goldbach conjecture? And then we can talk about the other question of fact, which seems to be equivalent unless you dispute some very fundamental claims: if we simulate that computation on a real computer, will it in fact output a proof of the Goldbach conjecture?
It seems like one could try and cut this sort of reasoning at three points, if you accept it so far: either it breaks down when the goals get complicated, it breaks down when the reasoning gets hard, or it breaks down when the algorithm’s embedding in the environment is too complicated.
If you accept that these algorithms systematically do things that lead to their apparent “goals” being satisfied (so that we can predict outcomes using this sort of reasoning), then I don’t know what exactly you are arguing.
The connection to omega isn’t so clear. It looks like it could just be concealing some basic intuitions about computability and approximation. It seems like a way of smuggling in mysticism, which is misleading by being superfluous rather than incoherent.
I thought about it some more and remembered one connection. I’ll post it to the discussion section if it makes sense upon reflection. The basic idea is that Agent X can manipulate the prior of Agent Y but not its preferences, so Agent X gives Agent Y a perverse prior that forces it to optimize for the preferences of Agent X. Running this in reverse gives us a notion of an objectively false preference.
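A toy illustration of the "perverse prior" idea, assuming a plain expected-utility maximizer; the states, actions, numbers, and names are hypothetical, and this is not claimed to be the construction Will has in mind. Agent Y’s utility function is held fixed, and only its prior over world states is changed.

```python
def best_action(prior, utility):
    """Return the action with the highest expected utility under `prior`."""
    def expected_utility(action):
        return sum(prior[s] * utility[action][s] for s in prior)

    return max(utility, key=expected_utility)


# Agent Y's fixed preferences, as utility[action][state].
# In state "s2", the action that happens to serve X is also what pays off for Y.
u_y = {
    "build_for_y": {"s1": 10, "s2": 0},
    "build_for_x": {"s1": 0, "s2": 10},
}

# The prior Y would otherwise hold, and the one X hands it.
honest_prior = {"s1": 0.9, "s2": 0.1}
perverse_prior = {"s1": 0.1, "s2": 0.9}

choice_with_honest_prior = best_action(honest_prior, u_y)
choice_with_perverse_prior = best_action(perverse_prior, u_y)
```

Y’s preferences never change, yet under `perverse_prior` its best action is the one X wants; the sketch only shows that beliefs and preferences can trade off against each other in determining behavior, and says nothing about the proposed "reverse" direction.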
Unfortunately I think you’ll have to familiarize yourself more with the existing decision theory literature here on LW or on the decision theory mailing list in order to understand what I’m getting at. I’m already rather familiar with the standard arguments for FAI. If you’re already a member of the decision theory list then the most relevant thing to read would be Nesov’s discussion of decision processes splitting off into coordinated subagents upon making observations. That at least hints in the right direction.
(I am mildly surprised that you have no idea what I’m talking about even after having read the thread I linked to that hints at the intuitions behind a creatorless decision theory. It’s not a very complicated idea, even if it might look uncomfortably like some hidden agenda promoting values deathism.)
(I still don’t see how that note could be recognized from the information you provided. Thank you for some clarity, I only wish you’d respect it more. I also remain ignorant about how the note relates to what you were discussing, but here’s an excuse to revisit that construction.)
The note in question mostly talks about a way in which an observation can shift agent’s focus of attention without changing its decision problem or potential state of knowledge. Agent’s preference stays the same.
The decision in question is where an agent focuses on seeing the implications of a particular observation (that is, gets to infer more in a particular direction, using the original premises), while mostly ignoring the implications of alternative observations (that is, inferring less from the same premises in other directions), thus mostly losing track of the counterfactual worlds where the observation turns out differently, leaving those worlds to its alternative versions. In doing so, the agent loses coordination with its versions in those alternative worlds, so its decisions will now be more about its own individual actions and not (or less) about the strategy coordinating it with the counterfactual versions of itself. In return, it gains more computational resources to devote to its particular subproblem.
This is one sense in which observations can act like knowledge (something to update on, focus on implications of) without getting more directly involved in agent’s reasoning algorithm, so that we can keep an agent updateless, in principle able to take counterfactuals into account. In this case, an agent is rather more computationally restricted than what UDT plays with, and it’s this restriction that motivates using observations in an updating-like manner, which is possible to do in this way without actually updating away the counterfactuals.
It is a compulsion of mine that given a choice between giving zero information and giving a small amount of information I must give a small amount or feel guilty for not even having tried to do the right thing. Likely leads to Goodhartian problems. I don’t have introspective access to the utility calculus that resulted in this compulsion.
E.g. in this case: Bla bla additive utility versus multiplicativeish “belief” self-coordination versus coordination with others computational complexity bla. Philosophy PSR and CFAI causal validity blah, Markovian causality includes formal/final causes. Extracting bits of Chaitin’s constant from “environment” bla. Bla don’t know if at equilibrium with respect to optimization after infinite time, unclear whether to act as if stars are twenty dollar bill on busy street or not.
Re friendliness, Loebian problems might cause collapse of recursive Bayes AI architectures via wireheading and so on, Goedel machine limits with strength of axioms, stronger axiom sets have self-reference problems. If true this would change singularity strategy, don’t have to worry as much about scary AIs unless they can solve Loebian problems indirectly.
ETA: Accidentally hit comment before editing/finishing but I’ll accept that as a sign from God.
It is a compulsion of mine that given a choice between giving zero information and giving a small amount of information I must give a small amount or feel guilty for not even having tried to do the right thing.
False dichotomy. In the same number of words you could be communicating much more clearly.
I’m very confused* about the alleged relationship between objective morality and Chaitin’s omega. Could you please clarify?
*Or rather, if I’m to be honest, I suspect that you may be confused.
A rather condensed “clarification”: “Objective morality” is equivalent to the objectively optimal decision policy/theory, which my intuition says might warrant the label “objectively optimal” due to reasons hinted at in this thread, though it’s possible that “optimal” is the wrong word to use here and “justified” is a more natural choice. An oracle can be constructed from Chaitin’s omega, which allows for hypercomputation. A decision policy that didn’t make use of knowledge of ALL the bits of Chaitin’s omega is less optimal/justified than a decision policy that did make use of that knowledge. Such an omniscient (at least within the standard models of computation) decision policy can serve as an objective standard against which we can compare approximations in the form of imperfect human-like computational processes with highly ambiguous “belief”-”preference” mixtures. By hypothesis the implications of the “existence” of such an objective standard would seem to be subtle and far-reaching.
The decisions produced by any decision theory are not objectively optimal; at best they might be objectively optimal for a specific utility function. A different utility function will produce different “optimal” behavior, such as tiling the universe with paperclips. (Why do you think Eliezer et al are spending so much effort trying to figure out how to design a utility function for an AI?)
I see the connection between omega and decision theories related to Solomonoff induction, but as the choice of utility function is more-or-less arbitrary, it doesn’t give you an objective morality.
His point is that if I fix your goals (say, narrow self-interest) the defensible policies still don’t look much like short-sighted goal pursuit (in some environments, for some defensible notions of “defensible”). It may be that all sufficiently wise agents pursue the same goals because of decision theoretic considerations, by implicitly bargaining with each other and together pursuing some mixture of all of their values. Perhaps if you were wiser, you too would pursue this “overgoal,” and in return your self-interest would be served by other agents in the mixture.
While plausible, this doesn’t look super likely right now. Will would get a few Bayes points if it pans out, though the idea isn’t due to him. (A continuum of degrees of altruism has been conjectured to be justified from a self-interested perspective, if you are sufficiently wise. This is the most extreme; Drescher has proposed a narrower view which still captures many intuitions about morality, and weaker forms that still capture at least a few important moral intuitions, like cooperation on PD, seem well supported.)
The connection to omega isn’t so clear. It looks like it could just be concealing some basic intuitions about computability and approximation. It seems like a way of smuggling in mysticism, which is misleading by being superfluous rather than incoherent.
But how does an agent introduce its values in the mixture? The agent is the way it decides, so at least in one interpretation its values must be reflected in its decisions (reasons for its decisions), seen in them, even if in a different interpretation its decisions reflect the mixed values of all things (for that is one thing the agent might want to take into account, as it becomes more capable of doing so).
Why do I write this comment? I decided to do so, which tells something about the way I decide. Why do I write this comment? According to the laws of physics. There seems to be no interesting connection between such explanations, even though both of them hold, and there is peril in confusing them (for example, nihilist ethical ideas following from physical determinism).
Presumably by what its action would have been, if not for the relationship between its actions and the actions of the other agents in the mixture.
I agree that the situation is confused at best, but it seems like this is a coherent picture of behavior, if the mechanics remain muddy.
Your comment did clarify for me what Will was talking about. This is an important confusion (to untangle).
Agent’s counterfactual actions feel like a wrong joint to me. I expect agent’s assertion of its own values has more to do with the interval between what’s known about the reasons for its decisions (including to itself, where introspection and mutual introspection is deep) and the decisions themselves, the same principle that doesn’t let it know its decisions in advance of whenever the decisions “actually” happen (as opposed to being enacted on precommitments). In particular, counterfactual behavior can also be taken as decided upon at some point visible to those taking that property (expression of values) into account.
I don’t accept Will’s overall position or reasoning but this particular part is relatively straightforward. It’s just the same as how anyone negotiates. In this case the negotiation is just a little… indirect. (Expanded below.)
An agent’s decisions are determined by its values, but this relationship is many-to-one. For any given circumstances that an agent could be in, all sorts of preferences will end up resolving to the same decision. If you decide to throw away all that information by only considering what can be inferred from the resultant decision, then you will end up wrong. More importantly, this isn’t what the other agents will be doing, so you will be wrong about them too.
Consider the coordination game as described by paulfchristiano and adopted as a metaphysics of morality by Will, which I’ll consider as a counterfactual:
There are a bunch of agents located beyond the range at which they can physically or causally interact. (This premise sometimes includes altogether esoteric degrees of non-interaction.)
The values of each agent include things that can be influenced by the other agents.
The agents have full awareness of both the values and decision procedures of all the other agents.
It is trivial* to see that this game reduces to (is equivalent to) a simple two-party prisoner’s dilemma with full mutual information. Each agent calculates the most efficient self-interested bargains that could be made between them all and chooses either to act as if those bargains have been made or not, depending on whether it (reliably) predicts the other agents will do likewise.
For all the agents, when we look at their behavior we see them all acting equivalently to whatever the negotiated outcome comes out to. That tells us little about their individual values: we’ve thrown that information away and just elected to keep “Cooperate with negotiated preferences”. But the individual preferences have been ‘thrown into the mix’ already, back at the point where each of the agents considers the expected behavior of the others. (And there is no way that one of the agents will cooperate without its values in the mix, and all the agents like to win, etc., etc., and a lot more ‘trivial’.)
I don’t accept the premises here and so definitely don’t accept any ‘universal morality’, but the “But how does an agent introduce its values in the mixture?” question just isn’t the weak point of the reasoning. It’s tangential, and to the extent that it is presented as an objection it is a red herring.
* In the come back 20 minutes later and say “Oh, it’s trivial” sense.
It only reduces to/is equivalent to a prisoner’s dilemma for certain utility functions (what you’re calling “values”). The prisoners’ dilemma is characterized by the fact that there is a dominant strategy equilibrium which is not Pareto optimal. But if the utility functions of the agents are such that the game is zero-sum, then this can’t be the case, as every outcome is Pareto optimal in a zero-sum game.
Furthermore, in a zero-sum game, no cooperation between all of the agents is possible. So it’s crazy to believe that an arbitrary set of sufficiently intelligent agents will cooperate to achieve a single “overgoal”. Collaboration is only possible if the agents’ preferences are such that collaboration can be mutually beneficial.
Yes, this entire scenario is based around scenarios where there is benefit to cooperation. In the edge case where such benefit is ‘0 expected utilons’ the behavior of the agents will, unsurprisingly, not be changed at all by the considerations we are talking about.
So I should interpret Will’s “Omega = objective morality” comment as meaning “sufficiently wise agents sometimes cooperate, when cooperation is the best way to achieve their ends”? I don’t think so.
No. Will thinks thoughts along these lines, then goes ahead and bites imaginary bullets.
I don’t think that’s a very good model. Also, I’m curious: what’s your impression of this quote?
Worse than useless.
I didn’t intend to suggest throwing out information: a “public” decision, the action, is not the only decision that happens, and there is also “a priori” of agent’s whole initial construction. Rather, my point was that there is more to agent’s values than just the agent as it’s initially presented, with its future decisions marking the points where additional information (for the purposes of other decisions that coordinate) is being revealed, even if those decisions follow deterministically from agent’s initial construction.
I’m not sure if I’d get many Bayes points for my beliefs, rather than just my intuitions; after taking into account others’ intuitions I don’t think I think it’s that much more plausible than others think it is.
I wish I could respond to the rest of your comment but am too flustered; hopefully I’ll be able to later. What stands out as a possible miscontrual-with-a-different-idea is that I’m not sure if this idea of selfness as in narrow self interest even makes sense. If it does make sense then my intuition is probably wrong for the same reason various universal instrumental value hypotheses are probably wrong.
Well, one can certainly talk about agents who have what we might describe as “narrow self-interest,” though I don’t really care about the distinction between self-interest and paperclipping and so on, which do seem to be well-defined.
E.g., whenever I experience something I add it to a list of experiences. I get a distribution over infinite lists of experiences by applying Solomonoff induction. At each moment I define my values in terms of that, and then try and maximize them (this is reflectively inconsistent—I’ll quickly modify to have copy-altruistic values, but still to something that looks pretty self-interested).
Are you claiming that this sort of definition is incoherent, or just that such agents appear to act in service of universal values once they are wise enough?
If “wise enough” is taken to mean “not instantaneously implosive/self-defeating” and “universal values” is taken to mean “decision problem representations robust against instantaneous implosion/self-defeat”, then the latter option, but in practice that amounts to a claim of incoherence; in other words the described agent is incoherent/inconsistent and thus its description is implicitly incoherent upon naive interpretation. Or to put it differently, I’m still not convinced it’s possible to build an AGI with a naive “prove the Goldbach conjecture”-style utility function, and so I’m hesitant to accept the validity of admittedly common sense reasoning that takes Goedel machine or AIXI-style architectures at face value as premises.
This carving up of the problem in such a way that “universal values” stands out as a thing seems wrong to me; the most obvious way of interpreting “universal values” is in some object level way that connotes deluding one’s “self” into seeing Pareto improvements that don’t exist or deluding one’s “self” into locating/defining one’s “self” via some normatively unjustified process.
I can write down TDT agents who have preferences about abstract worlds, e.g. in which the agent is instantiated on an ideal Turing machine and utility is just defined in terms of mathematical properties of the output (say, whether it is a proof of the Goldbach conjecture) and the running time.
Is the objection before or after this point?
I can write down TDT agents who care about the number of 1s in universally distributed sequences agreeing with their observations so far (as I remarked above). Do you think this agent definition implodes, or that the resulting agents just don’t act as self-interested as they look like they would? (Particularly I’m talking about the ones who actually are in simple universes, so who can quickly rule out concerns about simulators, and who don’t rely on others’ generosity).
(I’m trying to repeat things in many different ways so as to increase the chance that I’m understood; apologies if the repetition is needless.)
Before, but again my objection is sort of orthogonal to the way you’ve set up the scenario. When you say you can write down TDT “agents” I don’t believe you. I believe you can write down specifications of syntax-manipulating algorithms that will solve tic tac toe or other narrow problems just fine, and I of course believe that it’s physically possible to call such algorithms “agents” if such a fancy appeals to you, but I don’t confidently believe that they are or could ever be real agents in the way that word is commonly interpreted. (“Intelligence, to be useful, must be used for something other than defeating itself.”) You can interpret such a syntax manipulator as an agent to the extent that you can interpret the planet Saturn as an agent, but this is qualitatively different from talking about real agentic things like humans or gods, and I’m worried about pivoting on this word “agent” as if conclusions drawn in one domain can be routinely expected to work for the other. There is some math about an abstract thing called expected utility, and we can use roughly that conceptual scheme to conveniently label certain syntax-manipulating algorithms or to roughly carve up the world as we see it, but this doesn’t mean that things like “beliefs” or “preferences” actually exist out there in the world in any reliable metaphysical sense such that we can be confident of our application of them beyond their intended purview. So when you say:
I don’t know how to interpret this question in a way that I’m confident makes sense. I certainly want to know how to interpret it but would have to think about it a lot longer. Perhaps if I were more familiar with both the relevant arguments from the formal epistemology literature and the philosophy of mind literature then I would be able to confidently interpret it.
This does help with clarity.
So I can write down these formal symbol-manipulating algorithms, that look to a naive onlooker like they will do things like keep to themselves and prove the Goldbach conjecture. We can talk about the question of fact: if we run such an algorithm on a Turing machine (made of math), would it in fact output a proof of the Goldbach conjecture? And then we can talk about the other question of fact, which seems to be equivalent unless you dispute some very fundamental claims: if we simulate that computation on a real computer, will it in fact output a proof of the Goldbach conjecture?
It seems like one could try and cut this sort of reasoning at three points, if you accept it so far: either it breaks down when the goals get complicated, it breaks down when the reasoning gets hard, or it breaks down when the algorithm’s embedding in the environment is too complicated.
If you accept that these algorithms systematically do things that lead to their apparent “goals” being satisfied (so that we can predict outcomes using this sort of reasoning), then I don’t know what exactly you are arguing.
I thought about it some more and remembered one connection. I’ll post it to the discussion section if it makes sense upon reflection. The basic idea is that Agent X can manipulate the prior of Agent Y but not its preferences, so Agent X gives Agent Y a perverse prior that forces it to optimize for the preferences of Agent X. Running this in reverse gives us a notion of an objectively false preference.
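A toy sketch of the perverse-prior idea, assuming a plain expected-utility maximizer over two possible worlds. Everything here (the world names, the action names, the numbers, the helper functions) is made up for illustration and is not taken from any post or decision theory write-up. Agent Y’s utility table never changes; only the prior that Agent X hands to it does.

```python
from typing import Dict

Prior = Dict[str, float]                 # world -> probability
Utility = Dict[str, Dict[str, float]]    # action -> world -> utility


def expected_utility(action: str, prior: Prior, utility: Utility) -> float:
    """Expected utility of one action under the given prior."""
    return sum(prior[world] * utility[action][world] for world in prior)


def best_action(prior: Prior, utility: Utility) -> str:
    """Action that maximizes expected utility under the given prior."""
    return max(utility, key=lambda a: expected_utility(a, prior, utility))


# Hypothetical preferences of Agent Y. Agent X cannot touch these.
Y_UTILITY: Utility = {
    "serve_y": {"world_1": 1.0, "world_2": 0.0},
    "serve_x": {"world_1": 0.0, "world_2": 1.0},
}

# Hypothetical priors: the one Y would otherwise hold, and the one X installs.
HONEST_PRIOR: Prior = {"world_1": 0.9, "world_2": 0.1}
PERVERSE_PRIOR: Prior = {"world_1": 0.1, "world_2": 0.9}

if __name__ == "__main__":
    print(best_action(HONEST_PRIOR, Y_UTILITY))    # serve_y
    print(best_action(PERVERSE_PRIOR, Y_UTILITY))  # serve_x
```

With the same utility table, swapping the prior is enough to flip Y’s chosen action to the one X wants. “Running this in reverse” would presumably mean asking which preferences could only be explained by such a manipulated prior, but this sketch does not attempt that.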
Unfortunately I think you’ll have to familiarize yourself more with the existing decision theory literature here on LW or on the decision theory mailing list in order to understand what I’m getting at. I’m already rather familiar with the standard arguments for FAI. If you’re already a member of the decision theory list then the most relevant thing to read would be Nesov’s discussion of decision processes splitting off into coordinated subagents upon making observations. That at least hints in the right direction.
(I have no idea what Will is talking about; I don’t even see which things I wrote on the list he is referring to.)
Edit: Both issues now resolved, with Paul clarifying Will’s point and Will explicitly linking to the decision theory list post.
(“A note on observation and logical uncertainty”, January 20, 2011.)
(I am mildly surprised that you have no idea what I’m talking about even after having read the thread I linked to that hints at the intuitions behind a creatorless decision theory. It’s not a very complicated idea, even if it might look uncomfortably like some hidden agenda promoting values deathism.)
(I still don’t see how that note could be recognized from the information you provided. Thank you for some clarity, I only wish you’d respect it more. I also remain ignorant about how the note relates to what you were discussing, but here’s an excuse to revisit that construction.)
The note in question mostly talks about a way in which an observation can shift an agent’s focus of attention without changing its decision problem or potential state of knowledge. The agent’s preference stays the same.
The decision in question is where an agent focuses on seeing the implications of a particular observation (that is, gets to infer more in a particular direction, using the original premises), while mostly ignoring the implications of alternative observations (that is, inferring less from the same premises in other directions), thus mostly losing track of the counterfactual worlds where the observation turns out differently, leaving those worlds to its alternative versions. In doing so, the agent loses coordination with its versions in those alternative worlds, so its decisions will now be more about its own individual actions and not (or less) about the strategy coordinating it with the counterfactual versions of itself. In return, it gains more computational resources to devote to its particular subproblem.
This is one sense in which observations can act like knowledge (something to update on, focus on implications of) without getting more directly involved in the agent’s reasoning algorithm, so that we can keep an agent updateless, in principle able to take counterfactuals into account. In this case, an agent is rather more computationally restricted than what UDT plays with, and it’s this restriction that motivates using observations in an updating-like manner, which is possible to do in this way without actually updating away the counterfactuals.
It is a compulsion of mine that given a choice between giving zero information and giving a small amount of information I must give a small amount or feel guilty for not even having tried to do the right thing. Likely leads to Goodhartian problems. I don’t have introspective access to the utility calculus that resulted in this compulsion.
E.g. in this case: Bla bla additive utility versus multiplicativeish “belief” self-coordination versus coordination with others computational complexity bla. Philosophy PSR and CFAI causal validity blah, Markovian causality includes formal/final causes. Extracting bits of Chaitin’s constant from “environment” bla. Bla don’t know if at equilibrium with respect to optimization after infinite time, unclear whether to act as if stars are twenty dollar bill on busy street or not.
Re friendliness, Loebian problems might cause collapse of recursive Bayes AI architectures via wireheading and so on, Goedel machine limits with strength of axioms, stronger axiom sets have self-reference problems. If true this would change singularity strategy, don’t have to worry as much about scary AIs unless they can solve Loebian problems indirectly.
ETA: Accidentally hit comment before editing/finishing but I’ll accept that as a sign from God.
False dichotomy. In the same number of words you could be communicating much more clearly.