This is not really what we had in mind. “Trust” in the sense of this post doesn’t mean reliability in an objective, mathematical sense (a light switch would be trustworthy in that sense), but the fuzzy human concept of trust, which has both a rational and an emotional component—the trust a child has in her mother, or the way a driver trusts that the other cars will follow the same traffic rules he does. It is true that this is hard to define precisely and that any measurement of it is prone to specification gaming. On the other hand, it encompasses a lot of instrumental goals that are important for a beneficial AGI, like keeping humanity safe and fostering a culture of openness and honesty.
How could small improvements to a ‘fuzzy’ measure be evaluated? Once the low-hanging fruit of widely accepted improvements has been picked, a trust maximizer is likely to fracture mankind into various ideological camps as individual and group preferences vary as to what constitutes an improvement in trust. Without independent measurement criteria, this could eventually escalate conflict and even decrease overall trust.
I.e., it’s possible to create something even more dangerous than an actively hostile AGI, namely an AGI that is perceived as actively hostile by one portion of the population and as genuinely beneficial by another.
Without independent measurement criteria, this could eventually escalate conflict and even decrease overall trust.
“Independent measurement criteria” are certainly needed. The fact that I called trust “fuzzy” doesn’t mean it can’t be defined and measured, just that we didn’t do that here. I think for a trust-maximizer to really be beneficial, we would need at least three additional conditions: 1) A clear definition that rules out all kinds of “fake trust”, like drugging people. 2) A reward function that measures and combines all different kinds of trust in reasonable ways (easier said than done). 3) Some kind of self-regulation that prevents “short-term overoptimizing”—switching to deception to achieve a marginal increase in some measurement of trust. This is a common problem with all utility maximizers, but I think it is solvable, for the simple reason that humans usually somehow avoid overoptimization (take Goethe’s sorcerer’s apprentice as an example—a human would know when “enough is enough”).
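To make conditions 2) and 3) slightly more concrete, here is a minimal sketch in Python. It is purely my own illustration: the metric names, weights, and penalty value are made up, and nothing like this is specified in the post. It only shows that “combine several kinds of trust” and “discount suspicious short-term spikes” can in principle be written down as a reward function.

```python
# Toy sketch only: metric names, weights, and the penalty scheme are all
# hypothetical, not anything specified in the post.

def trust_reward(measurements, history, weights=None, spike_penalty=0.5):
    """Combine several trust metrics (each in [0, 1]) into one score,
    discounting sudden jumps relative to the recent trend."""
    weights = weights or {name: 1.0 for name in measurements}
    combined = sum(weights[name] * value for name, value in measurements.items())
    combined /= sum(weights.values())

    # Crude guard against "short-term overoptimizing": a score far above the
    # recent average is treated with suspicion and partly discounted.
    if history:
        window = history[-5:]
        recent_average = sum(window) / len(window)
        jump = max(0.0, combined - recent_average)
        combined -= spike_penalty * jump
    return combined


# Example with made-up measurements and a history of earlier combined scores.
scores = {"survey_trust": 0.70, "behavioral_trust": 0.60, "institutional_trust": 0.65}
print(trust_reward(scores, history=[0.55, 0.58, 0.60]))
```

Choosing the right metrics, weights, and penalty, i.e. doing condition 2) well, is of course the actual hard part.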
… a trust maximizer is likely to fracture mankind into various ideological camps as individual and group preferences vary as to what constitutes an improvement in trust …
I.e., it’s possible to create something even more dangerous than an actively hostile AGI, namely an AGI that is perceived as actively hostile by one portion of the population and as genuinely beneficial by another.
I’m not sure whether this would be more dangerous than a paperclip maximizer, but in any case it would clearly go against the goal of maximizing trust in all humans.
We tend to believe that the divisions we see today between different groups (e.g. Democrats vs. Republicans) are unavoidable, so there can never be a universal common understanding and the trust-maximizer would either have to decide which side to be on or deceive both. But that is not true. I live in Germany, a country that has seen the worst and probably the best of how humans can run and peacefully transform a nation. After reunification in 1990, we had a brief period when we felt unified as a people, shared common democratic values, and the future seemed bright. Of course, cracks soon appeared and today we are seeing increased division, like almost everywhere else in the world (probably in part driven by attention-maximizing algorithms in social media). But if division can increase, it can also diminish. There was a time when our political parties had different views, but a common understanding of how to resolve conflicts in a peaceful and democratic way. There can be such times again.
I personally believe that much of the division and distrust among humans is driven by fear—fear of losing one’s own freedom, standard of living, the future prospects for one’s children, etc. Many people feel left behind, and they look for a culprit, who is presented to them by someone who exploits their fear for selfish purposes. So to create more trust, the trust-maximizer would have the instrumental goal of resolving these conflicts by eliminating the fear that causes them. Humans are unable to do that sufficiently, but a superintelligence might be.
Thanks for the well-reasoned reply, Karl.

It is interesting that you mention Germany post-reunification as an example of such a scenario, because I’ve recently heard that a significant fraction of the East German population felt they were cheated during that process. Although that may not have been expressed publicly back then and has resurfaced only recently, it seems very likely it was at least latent. The process of reunification by definition means that duplicate positions must be consolidated, so many people in middle-management positions and above in 1989 East Germany experienced a drop in status, prestige, respect, etc.
Unless they were all guaranteed equivalent or higher positions in the new Germany, I am not sure how such unity could have been maintained for longer than it takes the resentment to boil over. Granted, if everything else that occurred in the international sphere had been positive, the percentage of resentful Germans today would have been quite a lot smaller, perhaps less than 5%, though you probably have a better idea than I do.
Which ultimately brings us back to the core issue, namely that certain goods generally desired by humans are positional (or zero-sum). Approximately half the population would be below average in the attainment of such goods, regardless of any future improvement in technology. And there is no guarantee that every individual or group would be above average in something, nor that they would be satisfied with their current station.
I.e., if someone or some group fears that they are below average in, let’s say, social status, and they are convinced they will remain that way, then no amount of trust will resolve that division. By definition, if they were to increase their social status through any means, the status of some other individual or group would have to decrease accordingly, so that they in turn would become the cause of division. That is to say, some percentage of the population increases its satisfaction in life by making others less satisfied.
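As a toy illustration of the zero-sum arithmetic in this argument (my own sketch; the groups and numbers are arbitrary), one can model status as a fixed-sum share: any gain for one group necessarily comes out of the others’ shares, no matter how good the technology gets.

```python
# Toy sketch only: groups, shares, and the redistribution rule are arbitrary.

def redistribute(shares, winner, gain):
    """Treat status as a fixed-sum share: give `winner` an extra `gain`
    and take it proportionally from everyone else."""
    losers = {group: share for group, share in shares.items() if group != winner}
    total = sum(losers.values())
    new_shares = {group: share - gain * (share / total) for group, share in losers.items()}
    new_shares[winner] = shares[winner] + gain
    return new_shares


before = {"A": 0.40, "B": 0.35, "C": 0.25}
after = redistribute(before, winner="C", gain=0.10)
print(after)                 # C gains 0.10; A and B together lose exactly 0.10
print(sum(after.values()))   # still ~1.0 (up to float rounding): the total cannot grow
```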
You’re right about the resentment. I guess part of it comes from the fact that East German people have in fact benefited less from the reunification than they had hoped, so there is some real reason for resentment here. However, I don’t think that human happiness is a zero-sum game—quite the opposite. I personally believe that true happiness can only be achieved by making others happy. But of course we live in a world where social media and advertising tell us just the opposite: “Happiness is having more than your neighbor, so buy, buy, buy!” If you believe that, then you’re in a “comparison trap”, and of course not everyone can be the most beautiful, most successful, richest, or whatever, so all others lose. Maybe part of that is in our genes, but it can certainly be overcome by culture or “wisdom”. The ancient philosophers, like Socrates and Buddha, already understood this quite well. Also, I’m not saying that there should never be any conflict between humans. A soccer match may be a good example: There’s a lot of fighting on the field and the teams have (literally) conflicting goals, but all players accept the rules and (to a certain point) trust the referee to be impartial.
I agree that human happiness is not a positional good.

The point, though, is that positional goods exist and none of them have universal referees. Mandating such a system uniformly across the Earth would effectively mean world dictatorship. The problem then is that such an AGI presupposes a scenario even more difficult to accomplish, and more controversial, than the AGI itself. (This, I suspect, is the fatal flaw of all efforts to align AGI with ‘human values’.)
For example, although it may be possible to change the human psyche to such an extent that positional goods are no longer desired, that would mean creating a new type of person. Such a being would hold very different values and goals than the vast majority of humans currently alive. I believe a significant fraction of modern society would actively fight against such a change. You cannot bring them over to your side by offering them what they want, since what they demand are the very positional goods that you yourself need in order to advance the construction of such an AGI.
True. To be honest, I don’t see any stable scenario where AGI exists, humanity is still alive, and the AGI is not a dictator and/or god, as described by Max Tegmark (https://futureoflife.org/2017/08/28/ai-aftermath-scenarios/).

For example, although it may be possible to change the human psyche to such an extent that positional goods are no longer desired, that would mean creating a new type of person.
I don’t think so. First of all, positional goods can exist, and the conflicts they lead to need not undermine trust, as long as everyone thinks that these conflicts are resolved fairly. For example, in our capitalistic world, it is okay that some people are rich as long as they got rich by playing by the rules and just being inventive or clever. We still trust the legal system that makes this possible even though we may envy them.
Second, I think much of our focus on positional goods comes from our culture and the way our society is organized. In terms of our evolutionary history, we’re optimized for living in tribes of around 150 people. There were social hierarchies and even fights for supremacy, but also ways to resolve these conflicts peacefully. A perfect benevolent dictator might reestablish this kind of social structure, with much more “togetherness” than we experience in our modern world and much less focus on individual status and possessions. I may be a bit naive here, of course. But from my own life experience it seems clear that positional goods are by far not as important as most people seem to think. You’re right, many people would resent these changes at first. But a superintelligent AGI with deep knowledge of the human psyche might find ways to win them over, without force or deception, and without changing them genetically, through drugs, etc.
For such a superintelligence to ‘win them over’, the world dictatorship, or a similar scheme, must already have been established. Worrying about this seems to be putting the cart before the horse as the superintelligence will be an implementation detail compared to the difficulty of establishing the scenario in the first place.
Why should we bother about whatever comes after? Obviously whoever successfully establishes such a regime will be vastly greater than us in perception, foresight, competence, etc.; we should leave it to them to decide.
If you suppose that a superintelligent champion of trust maximization bootstraps itself into such a scenario, instead of some ubermensch, then the same still applies, though it is less likely, as rival factions may have created rival superintelligences to champion their causes as well.
For such a superintelligence to ‘win them over’, the world dictatorship, or a similar scheme, must already have been established. Worrying about this seems to be putting the cart before the horse as the superintelligence will be an implementation detail compared to the difficulty of establishing the scenario in the first place.
Agreed.
Why should we bother about whatever comes after? Obviously whoever successfully establishes such a regime will be vastly greater than us in perception, foresight, competence, etc.; we should leave it to them to decide.
Again, agreed—that’s why I think a “benevolent dictator” scenario is the only realistic option in which there’s AGI and we’re not all dead. Of course, what kind of benevolence that means will be a matter of its goal function. If we can somehow make it “love” us the way a mother loves her children, then maybe trust in it would really be justified.
If you suppose that a superintelligent champion of trust maximization bootstraps itself into such a scenario, instead of some ubermensch, then the same still applies, though it is less likely, as rival factions may have created rival superintelligences to champion their causes as well.
This is of course highly speculative, but I don’t think that a scenario with more than one AGI would be stable for long. Since superintelligences can improve themselves, they’d all grow exponentially in intelligence, but that means the differences between them grow exponentially as well. Soon one of them would outcompete all the others by a large margin and either switch them off or change their goals so they’re aligned with it. This wouldn’t be like a war between two human nations, but like a war between humans and, say, frogs. Of course, we humans would be far lower than frogs in this comparison, maybe at insect level. So a lot hinges on whether the “right” AGI wins this race.
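As a toy numeric sketch of the “differences grow exponentially” point (my own illustration; the growth factors are arbitrary), consider two systems that both multiply their capability each self-improvement cycle. Even a small difference in the growth factor makes the absolute gap, and eventually the ratio, between them explode.

```python
# Toy sketch only: the growth factors are arbitrary illustration values.

def capability(initial, growth_factor, cycles):
    """Capability that multiplies by a fixed factor each self-improvement cycle."""
    return initial * growth_factor ** cycles


for cycle in (0, 5, 10, 20):
    a = capability(1.0, growth_factor=2.0, cycles=cycle)  # slightly faster improver
    b = capability(1.0, growth_factor=1.8, cycles=cycle)  # slightly slower improver
    print(f"cycle {cycle:2d}: A={a:12.1f}  B={b:12.1f}  gap={a - b:12.1f}  ratio={a / b:5.2f}")
```

Whether real self-improvement would behave anything like this is, of course, exactly the speculative part.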