Futarchy and Unfriendly AI
We have a reasonably clear sense of what “good” is, but it’s not perfect. Suffering is bad, pleasure is good, more people living enjoyable lives is good, yes, but tradeoffs are hard. How much worse is it to go blind than to lose your leg? [1] How do we compare the death of someone at eighty to the death of someone at twelve? If you wanted to build some automated system that would go from data about the world to a number representing how well it’s doing, where you would prefer any world that scored higher to any world scoring lower, that would be very difficult.
Say, however, that you’ve built a metric that you think matches your values well and you put some powerful optimizer to work maximizing that metric. This optimizer might do many things you think are great, but it might be that the easiest ways to maximize the metric are the ones that pull it apart from your values. Perhaps after it’s in place it turns out your metric included many things that only strongly correlated with what you cared about, where the correlation breaks down under maximization.
What confuses me is that the people who warn about this scenario with respect to AI are often the same people in favor of futarchy. They both involve trying to define your values and then setting an indifferent optimizer to work on them. If you think AI would be very dangerous but futarchy would be very good, why?
I also posted this on my blog.
[1] This is a question people working in public health try to answer with Disability Weights for DALYs.
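To make the footnote concrete, here is a minimal sketch of the DALY accounting it refers to. The disability weights below are illustrative placeholders, not actual Global Burden of Disease values, and the function names are my own.

```python
# Toy sketch of the DALY comparison from the footnote.
# Disability weights here are illustrative placeholders,
# NOT actual Global Burden of Disease values.

def daly(years_lost_to_death, years_with_condition, disability_weight):
    """DALY = years of life lost (YLL) + years lived with disability (YLD),
    where YLD = duration * disability weight (0 = perfect health, 1 = death)."""
    return years_lost_to_death + years_with_condition * disability_weight

# Hypothetical: 40 remaining years lived blind vs. with a lost leg.
blindness = daly(0, 40, 0.19)   # placeholder weight
amputation = daly(0, 40, 0.06)  # placeholder weight
print(blindness, amputation)    # the weights encode "how much worse"
```

The point is only that the framework forces an explicit numeric answer to "how much worse is blindness than losing a leg" by choosing the weights.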
Futarchy has less potential for perverse instantiation because it has fewer degrees of freedom to work with, because it operates on a human time scale but no faster, and because its outputs can be (and will be) ignored if they’re sufficiently ridiculous.
I think the crucial difference between AI and futarchy is that in AI the utility function is decided once and for all. Once a superintelligence is out there, there is no stopping it. In futarchy, on the other hand, the utility function is determined by some sort of democratic mechanism that operates continuously and can introduce corrections if things start going awry.
Can you suggest a scenario in which futarchy would result in a clear negative outcome, something analogous to turning the universe into paper clips?
That’s a low bar: it’s an intentionally silly example. No one actually thinks we’re likely to accidentally create a paperclip-maximizer AI, any more than we’re likely to accidentally include a “number of paperclips in the world” term in a futarchy metric. But something as clearly negative would be mandatory wireheading to maximize a “human pleasure” term.
A less extreme (and less clearly negative, but also more likely) example would be maximizing GDP. Hanson often uses GDP as an example of something you could include in a futarchy metric. GDP only counts market work, however, which means you can increase GDP by moving tasks from “do them yourself” to “hire someone”. For example, if I watch my kid that doesn’t count towards GDP, but if I pay you to watch them, and you pay me to do whatever you would otherwise have done during that time, it does.
GDP/person is one of the best metrics for “how is a country doing”, often doing much better than explicit attempts to measure things closer to what we care about, but put a big optimizing push behind it and soon all the tiny tasks we do over the course of the day are pressured into market work.
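The childcare example above can be sketched numerically. This is a toy illustration with made-up numbers: two parents each watch their own child (no market transaction), then switch to paying each other the same wage for the same work. Real activity is unchanged, but measured GDP rises.

```python
# Toy illustration of the GDP critique: measured GDP here is just
# the sum of market transactions, and hiring each other for work
# you previously did yourself raises it without changing anything real.
# Numbers are hypothetical.

def gdp(transactions):
    """GDP proxy: sum of market transactions."""
    return sum(transactions)

before = gdp([])            # each parent watches their own kid: no market work
after = gdp([15.0, 15.0])   # each pays the other $15/hour for childcare
print(before, after)        # unchanged real activity, higher measured GDP
```

An optimizer rewarded on GDP would push every such task across the market boundary, which is the failure mode described above.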
Can you suggest a scenario in which futarchy would fail to prevent the universe from being turned into paperclips?
Feedback controls. Futarchy is transparent, carried out in real time, and gives plenty of room to adjust values and change strategies if the present ones prove defective. On the other hand, a superintelligent AI would basically run as a black box. The operators would set the values, then the AI would use some method to optimize and then spit out the optimal strategy (and presumably implement it). There’s no room for human feedback between setting the values and implementing the optimal strategy.
This relates to my previous post on confounding in Prediction Markets. In my analysis, if we allow human feedback between setting the values and implementing the strategy, we break the causal interpretation of the prediction market and therefore lose the ability to use it for optimization. This is obviously a trade-off against other considerations that may be more important, but you will run into big problems if market participants expect there is a significant probability that humans will override the market.
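For readers unfamiliar with the mechanism being discussed: here is a minimal sketch of a futarchy-style decision rule, under the assumed setup of two conditional markets, one estimating the welfare metric if a policy is adopted and one if it is rejected. The prices and the function are hypothetical, not Hanson's exact proposal.

```python
# Minimal sketch of a futarchy decision rule over two conditional
# markets. Prices are hypothetical stand-ins for the markets'
# expectations of the welfare metric under each branch.

def decide(price_if_adopted, price_if_rejected):
    """Adopt the policy iff the market expects higher welfare conditional
    on adoption; trades in the non-chosen branch are typically unwound."""
    return "adopt" if price_if_adopted > price_if_rejected else "reject"

print(decide(0.62, 0.55))  # markets expect higher welfare under adoption
```

The confounding worry above is that these prices are only causal estimates if traders believe the rule will actually be followed; expected human overrides contaminate them.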
Here and elsewhere I’ve advocated* that, rather than using Hanson’s idea of objectively verifiable target values like GDP, futarchy would do better to add human feedback at the stage where it is decided whether the goals were met. Whoever proposed the goal would decide after the prediction deadline expired, and thus could respond to any improper optimizing by refusing to declare the goal “met” even if it technically was.
[ * You can definitely do better than the ideas on that blog post, of course.]
I can’t say much about the consequences of this, but it appears to me that both democracy and futarchy are efforts to more closely approximate something along the lines of a CEV for humanity. They have the same problems, in fact. How do you reconcile mutually exclusive goals of the people involved?
In any case, that isn’t directly relevant, but linking futarchy with AI caused me to notice that. Perhaps that sort of optimization style, of getting at what we “truly want” once we’ve cleared up all the conflicting meta-levels of “want-to-want”, is something that the same sorts of people tend to promote.
I’m not a big fan of decision making by conditional prediction markets (btw, “futarchy” is an obscure, non-descriptive name. Better call it something like “prophetocracy”), but I think that proponents like Robin Hanson propose that the value system is not set once and for all but regularly updated by a democratically elected government. This should avoid the failure mode you are talking about.
“Futarchy” is the standard term for this governmental system. Perhaps Hanson should have chosen a different name, but that’s the name it’s been going under for about a decade and I don’t think “prophetocracy” would be an improvement.
It’s not a very well-known word, anyway. Would the cost of changing it outweigh the benefit of a relatively self-descriptive word?
What is the alternative? Futarchy is unfriendly, but so is the current government.
Think about the laws that govern these things, and how to use them to make these things better for us.
This is kinda like how futarchy works… STAR WARS or STAR TREK… we let the swarm decide! The difference is that the outcome would be a lot more accurate with futarchy. Why? Because people would be putting their money where their mouths are.
As I pointed out here… AI Safety vs Human Safety… nobody, that I know of, has applied the best method we have for controlling humans (the market) to robots. Which isn’t too surprising since AI largely falls under the scope of computer science. But it’s the “safety” aspect that also falls under the scope of economics. The development of an evil AI is most definitely an inefficient allocation of society’s limited resources.
With futarchy we could bet on which organization/company is most likely to develop harmful AI. We could also bet on which organization is most likely to develop beneficial AI. Then we could shift our money from the former to the latter.
Don’t Give Evil Robots A Leg To Stand On!
On a related point, here’s a post about using swarms to build morality into intelligent systems:
http://unanimousai.com/building-moral/
First, using the term “evil” here is a good way to show that you don’t know what you are talking about. We are talking about “unfriendly”.
That said, there are reasons to believe that people who build AGI are overoptimistic about their own creations and might think they’re producing a useful AGI when they’re actually producing UFAI. As a result, there is no reason to expect that nobody funds the relevant research.
“Unfriendly” is a tribal signal. The proper term is “unsafe”, but I think that “evil” is a better approximation than “unfriendly” in its standard usage, as opposed to the non-standard usage invented by Yudkowsky.
I always thought that “evil” implies a malicious intention, while something “unfriendly” does harm but without the intention of doing harm. Compare a standard B-movie rogue robot that hunts humans because of murderous “feelings” it developed out of revenge, fear, envy, or other anthropomorphic qualities, with the paperclip maximizer.
Calling something “evil” applies anthropomorphism to it.
It signals that you are talking about the thing this tribe is talking about.
No, it’s a mere signal of allegiance, which you are using to try to shut up the outgroup.
It’s like talking religion with a theist who complains that unless you are referring specifically to Elohim/Jesus/Allah/whatever then you couldn’t possibly say anything meaningful about their religion.
I’m not criticizing semantics out of context to the argument he makes; it’s a strawman to claim that everyone who says “evil AI” has nothing meaningful to say.
He speaks about how it’s obvious that nobody funds an evil AI. For some values of “evil” that’s true. On the other hand, those aren’t the cases we worry about.
Not sure how you missed it… but I speak about how people should be able to choose where their taxes go. Maybe you missed it because I get swamped with downvotes?
Right now the government engages in activities that some people consider to be immoral. For example, pacifists consider war to be immoral. You think that there’s absolutely nothing wrong with pacifists being forced to fund war. Instead of worrying about how pacifists currently have to give war a leg to stand on… you want to worry about how we’re going to prevent robots from being immoral.
When evilness, like beauty, is in the eye of the beholder… it’s just as futile to try and prevent AIs from being immoral as it is to try and prevent humans from being immoral. What isn’t futile however is to fight for people’s freedom not to invest in immorality.
Any case you worry about is a case where an AI that you consider to be immoral ends up with too many resources at its disposal. Because you’re really not going to worry about...
… a moral AI with significant resources at its disposal
… an immoral AI with insignificant resources at its disposal
So you worry about a case where an immoral AI ends up with too many resources at its disposal. But that’s exactly the same thing that I worry about with humans. And if it’s exactly the same thing that I worry about with humans… then it’s a given that my worry is the same regardless of whether the immoral individual is human, AI, alien or other.
In other words, you have this bizarre double standard for humans and AI. You want to prevent immoral AIs from coming into existence yet you think nothing of forcing humans to give immoral humans a leg to stand on.
Oh gods, you’re doing that again. “How dare you be talking about something other than my pet issue! That proves you’re on the wrong side of my pet issue, which proves you’re inconsistent and insincere!”
There is a reason why you keep getting “swamped with downvotes”. That reason is that you are wasting other people’s time and attention, and appear not to care. As long as you continue to behave in this obnoxious and antisocial fashion, you will continue to get swamped with downvotes. And, not coincidentally, your rudeness and obtuseness will incline people to think less favourably of your proposal. If someone else more reasonable comes along with an economic proposal like yours, the first reaction of people who’ve interacted with you here is likely to be that bit more negative because they’ll associate the idea with rudeness and obtuseness.
Please consider whether that is really what you want.
In the comment that you replied to, I calmly and rationally explained with exceptionally sound logic why my “pet issue” (the efficient allocation of resources) is relevant to the subject of “unfriendly” AI.
Did you calmly and rationally explain why the efficient allocation of resources is not relevant to “unfriendly” AI? Nope.
Nobody on this forum is forced to read or respond to my comments. And obviously I’m not daunted by criticism. So unlike this guy, I’m not going to bravely run away from an abundance of economic ignorance.
And if my calm and rational comments are driving you so crazy… then perhaps it would behoove you to find the bias in your bonnet.
As Eliezer is fond of saying: “A fanatic is someone who can’t change his mind and won’t change the subject.” At least try to be able to change the subject.
Quotation commonly attributed to Churchill, but here’s some weak evidence that he didn’t say it, or at least wasn’t the first to.