From “Coulda” and “Woulda” to “Shoulda”: Predicting Decisions to Minimize Regret for Partially Rational Agents
TRIGGER WARNING: PHILOSOPHY. All those who believe in truly rigorous, scientifically-grounded reasoning should RUN AWAY VERY QUICKLY.
Abstract: Human beings want to make rational decisions, but their decision-making processes are often inefficient, and they don’t possess direct knowledge of anything we could call their utility functions. Since it is much easier to detect a bad world state than a good one (bad states vastly outnumber good ones, so less information is needed to classify them accurately), humans tend to have an easy time detecting bad states. But this emotional regret is not, by itself, useful for formal reasoning about human rationality, since we don’t possess a causal model of it in terms of decision histories and outcomes. We tackle this problem head-on, assuming only that humans can reason over a set of beliefs and a perceived state of the world to generate a probability distribution over actions.
Consider rationality: optimizing the world to better and better match a utility function, one whose underlying preferences are complete, transitive, continuous, and independent of irrelevant alternatives. Now consider actually existing human beings: creatures who can often and easily be tricked into taking Dutch Book bets through exploitation of their cognitive structure, without anyone even having to go to the trouble of deceiving them about specific information.
Consider that being one of those poor sods must totally suck. We believe this provides sufficient motivation for wanting to help them out a bit. Unfortunately, doing so is not very simple: since they didn’t evolve as rational creatures, it’s very easy to propose an alternate set of values that captures absolutely nothing of what they actually want out of life. In fact, since they didn’t even evolve as 100% self-aware creatures, their emotional qualia are not even reliable indicators of anything we would call a proper utility function. They know there’s something they want out of life, and they know they don’t know what it is, but that doesn’t help because they still don’t know what it is, and knowledge of ignorance does not magically reduce the ignorance.
So! How can we help them without just overriding them or enslaving them to strange and alien cares? Well, one barest rudiment of rationality with which evolution did manage to bless them is that they don’t always end up “losing”, or suffering. Sometimes, even if only seemingly by luck or by elaborate and informed self-analysis, they do seem to end up pretty happy with themselves, sometimes even over the long term. We believe that with the door to generating Good Ideas For Humans left open even just this tiny crack, we can construct models of what they ought to be doing.
Let’s begin by assuming away the thing we wish we could construct: the human utility function. We are going to reason as if we have no valid grounds to believe there is any such thing, and make absolutely no reference to anything like one. This will ensure that our reasoning doesn’t get circular. Instead of modelling humans as utility maximizers, even flawed ones, we will model them simply as generating a probability distribution over potential actions (from which they would choose their real action) given a set of beliefs and a state of the real world. We will not claim to know or care what causes the probability distribution of potential choices: we just want to construct an algorithm for helping humans know which ones are good.
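For concreteness, here is a minimal Python sketch of that bare-bones model. The names Policy, Beliefs, and WorldState are our own illustrative assumptions, not anything the humans handed us; all the argument actually requires is some function from beliefs and a perceived world state to a probability distribution over actions, with no utility function anywhere in sight.

```python
from typing import Callable, Dict, Hashable

Action = Hashable      # any label for an action
Beliefs = Hashable     # opaque summary of everything the human believes
WorldState = Hashable  # opaque summary of the perceived world

# The only assumption we make: a policy maps (beliefs, world state)
# to a probability distribution over actions. No utility function anywhere.
Policy = Callable[[Beliefs, WorldState], Dict[Action, float]]

def most_likely_action(policy: Policy, beliefs: Beliefs, world: WorldState) -> Action:
    """The action we will later assume the human actually executes."""
    dist = policy(beliefs, world)
    assert abs(sum(dist.values()) - 1.0) < 1e-9, "probabilities must sum to 1"
    return max(dist, key=dist.get)
```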
We can then model human decision making as a two-player game: the human makes a move, and Nature responds with one of its own. Lots of rational agents work this way, so it gives us a more-or-less reasonable way of talking algorithmically about how humans live. For any given human at any given time, we could take a decent-sized Maximegalor Ubercomputer and just run the simulation forward, yielding a full description of how that human’s life unfolds.
The only step where we need to do anything “weird” is in abstracting the human’s mind and knowledge of the world from the particular state and location of its body at any given timestep in the simulation. This doesn’t mean taking it out of the body, but instead considering what the same state of the mind might do if placed in multiple place-times and situations, given everything they’ve experienced previously. We need this in order to let our simulated humans be genuinely affected and genuinely learn from the consequences of their own actions.
Our game between the simulated human and simulated Nature thus generates a perfectly ordinary game tree up to some planning horizon H, though it is a probabilistic game tree. Each edge carries the conditional probability of the human or Nature making that move given their current state. The product of the probabilities of all edges along a path from the root node to a leaf node is the conditional probability of that leaf node given the root node, and the probabilities attached to all edges leaving an inner node must sum to 1.0, though there might be a hell of a lot of child nodes. We assume that an actual human would execute the most likely action-edge.
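Here is a hedged sketch of that tree, reusing the toy names from the previous snippet. The field names and the consistency check are our own assumptions about how one might store it, not a spec for the Ubercomputer.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Node:
    """One position in the human-versus-Nature game tree."""
    mover: str                                # "human" or "nature"
    beliefs: object                           # the human's mind-state at this node
    world: object                             # the state of the world at this node
    children: Dict[object, "Node"] = field(default_factory=dict)  # move -> child node
    edge_prob: Dict[object, float] = field(default_factory=dict)  # move -> P(move | this node)

    def check(self) -> None:
        """Edge probabilities leaving an inner node must sum to 1.0."""
        if self.children:
            assert abs(sum(self.edge_prob.values()) - 1.0) < 1e-9

def path_probability(root: Node, moves: List[object]) -> float:
    """P(leaf | root): the product of the edge probabilities along the path."""
    p, node = 1.0, root
    for move in moves:
        p *= node.edge_prob[move]
        node = node.children[move]
    return p
```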
Here is where we manage a neat trick for defying basic human irrationality. We mentioned earlier that while humans are usually pretty bummed out about their past decisions, sometimes they’re not. If we can separate bummed-out from not-bummed-out in some formal way, we’ll have a rigorous way of talking about what it would mean for a given action or history to be good for the human in question.
Our proposal is to consider what a human would say if taken back in time and given the opportunity to advise their past self. Or, in simpler simulation terms, we consider how a human’s choices would change if they found out the leaf-node consequences of their root- or inner-node actions, simply by transferring the relevant beliefs and knowledge directly into our model of their minds. If, upon being given this leaf-node knowledge, the action yielded as most likely changes, or if the version of the human at the leaf node, were they taken back in time, would themselves select another action as most likely, then we take a big black meta-magic marker and scribble over that leaf node as suffering from regret. After all, the human in question could have done something their later self would agree with.
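As a sketch, the regret test might look like the following. Here transfer_knowledge is a purely hypothetical helper standing in for “graft what the leaf-self knows onto the root-self’s mind”; we deliberately say nothing about how it would actually work.

```python
def suffers_regret(policy, root, leaf, root_action, transfer_knowledge) -> bool:
    """Color a leaf as regret if root-node knowledge of its consequences
    would change the most likely action at the root."""
    # Hypothetical helper: merge the leaf-self's beliefs into the root-self's.
    informed_beliefs = transfer_knowledge(root.beliefs, leaf.beliefs)
    informed_dist = policy(informed_beliefs, root.world)
    informed_choice = max(informed_dist, key=informed_dist.get)
    return informed_choice != root_action
```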
The magic is thus done: coulda + woulda = shoulda. By coloring some (inevitably: most) leaf nodes as suffering from regret, we can then measure a probability of regret in any human-versus-Nature game tree up to any planning horizon H: it’s just the sum of the conditional probabilities of all paths from the root node that arrive at a regret-colored leaf node at or before time H.
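In terms of the tree sketched above, the regret probability is just a weighted sum over leaves, roughly as follows. We assume, for simplicity, that each level of the tree counts as one time step, and that is_regret wraps the test from the previous snippet.

```python
def regret_probability(node, is_regret, horizon: int, depth: int = 0) -> float:
    """Sum of path probabilities from this node to regret-colored leaves
    reached at or before time step `horizon`."""
    if not node.children or depth >= horizon:
        return 1.0 if is_regret(node) else 0.0
    total = 0.0
    for move, p in node.edge_prob.items():
        total += p * regret_probability(node.children[move], is_regret, horizon, depth + 1)
    return total
```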
We should thus advise the humans to treat the probability of arriving at a regret-colored leaf node as a loss function and minimize it. By construction, this yields a rational optimization criterion guaranteed not to make the humans run screaming from their own choices, at least not at or before time step H.
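The advice itself then reduces to an argmin over the human’s available first moves, roughly:

```python
def advise(root, is_regret, horizon: int):
    """Recommend the first move whose subtree carries the lowest regret probability."""
    return min(
        root.children,  # candidate first moves available to the human
        key=lambda move: regret_probability(root.children[move], is_regret, horizon, 1),
    )
```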
The further out in time we extend H, the better our advice becomes, as it incorporates a deeper and wider sample of the states a human life can occupy, bringing different motivational adaptations to conscious execution and allowing their reconciliation via reflection. Over sufficient amounts of time, this reflection might even quiet down to a stable state, resulting in the humans selecting their actions in a way that’s more like a rational agent and less like a pre-evolved meat-ape. This would hopefully make their lives much, much nicer, though we cannot formally prove that the human regret probability converges to any limit as the planning horizon H goes to infinity (not even to 1.0!).
We can also note a couple of interesting properties of our loss function for humans, particularly its degenerate values and how they relate to the psychology of the underlying semi-rational agent, i.e., humans. When the probability of regret equals 1.0 no matter how far out we extend the planning horizon H, we are simply dealing with a totally, utterly irrational mind-design: there literally does not exist a best possible world for that agent in which they would never wish to change their former choices. They always regret their decisions, which means they’ve probably got a circular preference or some other internal contradiction somewhere. Yikes. Though they could just figure out which particular aspect of their own mind-design causes that and eliminate it, leading to an agent design that can at least potentially come to like its life. The other degenerate probability is also interesting: a chance of regret equalling 0.0 means that the agent is either a completely unreflective idiot, or is God. Even an optimal superintelligence can suffer loss due to not knowing about its environment; it just rids itself of that ignorance optimally as the data come in!
The interesting thing about these degenerate probabilities is that they show our theory to be generally applicable to an entire class of semi-rational agents, not just humans. Any agent whose regret probability does not converge to a degenerate value in the limit can be labelled semi-rational, and can make productive use of the regret probabilities our construction calculates for it to make better decisions, or at least decisions it will still endorse when asked later on.
Dropping the sense of humor: This might be semi-useful. Have similar ideas been published in the literature before? And yes, of course I’m human, but it was funnier that way in what would otherwise have been a very dull, dry philosophy post.