Compared to its competition in the AGI race, MIRI was always going to be disadvantaged by both lack of resources and the need to choose an AI design that can predictably be made Friendly as opposed to optimizing mainly for capability. For this reason, I was against MIRI (or rather the Singularity Institute as it was known back then) going into AI research at all, as opposed to pursuing some other way of pushing for a positive Singularity.
In any case, what other approaches to Friendliness would you like MIRI to consider? The only other approach that I’m aware of that’s somewhat developed is Paul Christiano’s current approach (see for example https://medium.com/ai-control/alba-an-explicit-proposal-for-aligned-ai-17a55f60bbcf), which I understand is meant to be largely agnostic about the underlying AI technology. Personally I’m pretty skeptical but then I may be overly skeptical about everything. What are your thoughts? I don’t recall seeing you having commented on them much.
Are you aware of any other ideas that MIRI should be considering?
Do you have a concise explanation of skepticism about the overall approach, e.g. a statement of the difficulty or difficulties you think will be hardest to overcome by this route?
Or is your view more like “most things don’t work, and there isn’t much reason to think this would work”?
In discussion you most often push on the difficulty of doing reflection / philosophy. Would you say this is your main concern?
My take has been that we just need to meet the lower bar of “wants to defer to human views about philosophy, and has a rough understanding of how humans want to reflect and want to manage their uncertainty in the interim.”
Regarding philosophy/metaphilosophy, is it fair to describe your concern as one of:
1. The approach I am pursuing can’t realistically meet even my lower bar,
2. Meeting my lower bar won’t suffice for converging to correct philosophical views,
3. Our lack of philosophical understanding will cause problems soon in subjective time (we seem to have some disagreement here, but I don’t feel like adopting your view would change my outlook substantially), or
4. AI systems will be much better at helping humans solve technical than philosophical problems, driving a potentially long-lasting (in subjective time) wedge between our technical and philosophical capability, even if ultimately we would end up at the right place?
My hope is that thinking and talking more about bootstrapping procedures would go a long way to resolving the disagreements between us (either leaving you more optimistic or me more pessimistic). I think this is most plausible if #1 is the main disagreement. If our disagreement is somewhere else, it may be worth also spending some time focusing somewhere else. Or it may be necessary to better define my lower bar in order to tell where the disagreement is.
1. Training an AI to defer to one’s eventual philosophical judgments and interim method of managing uncertainty (and not falling prey to marketing worlds and incorrect but persuasive philosophical arguments etc) seems really hard, and made harder by the recursive structure in ALBA and the fact that the first level AI is sub-human in capacity which then has to handle being bootstrapped and training the next level AI. What percent of humans can accomplish this task, do you think? (I’d argue that the answer is likely zero, but certainly very small.) How do the rest use your AI?
2. Assuming that deferring to humans on philosophy and managing uncertainty is feasible but costly, how many people could resist dropping this feature and the associated cost, in favor of adopting some sort of straightforward utility maximization framework with a fixed utility function that they think captures most or all of their values, if that came as a suggestion from the AI with an apparently persuasive argument? If most people do this and only a few don’t (and those few are also disadvantaged in the competition to capture the cosmic commons due to deciding to carry these costs), that doesn’t seem like much of a win.
3. This is tied in with 1 and 2, in that correct meta-philosophical understanding is needed to accomplish 1, and unreasonable philosophical certainty would cause people to fail step 2.
4. Even if the AIs keep deferring to their human users and don’t end up short-circuiting their philosophical judgments, if the AI/human systems become very powerful while still having incorrect and strongly held philosophical views, that seems likely to cause disaster. Nor do we have much reason to think that, if we put people in such positions of power (for example, being able to act as a god in some simulation or domain of their choosing), most will eventually realize their philosophical errors and converge to correct views, or that the power itself wouldn’t further distort their already error-prone reasoning processes.
For a working scheme, I would expect it to be usable by a significant fraction of humans (say, comparable to the fraction that can learn to write a compiler).
That said, I would not expect almost anyone to actually play the role of the overseer, even if a scheme like this one ended up being used widely. An existing analogy would be the human trainers who drive Facebook’s M (at least in theory; I don’t know how that actually plays out). The trainers are responsible for getting M to do what the trainers want, and the user trusts the trainers to do what the user wants. From the user’s perspective, this is no different from delegating to the trainers directly, and allowing them to use whatever tools they like.
I don’t yet see why “defer to human judgments and handle uncertainty in a way that they would endorse” requires evaluating complex philosophical arguments or having a correct understanding of metaphilosophy. If the case is unclear, you can punt it to the actual humans.
If I imagine an employee who sucks at philosophy but thinks 100x faster than me, I don’t feel like they are going to fail to understand how to defer to me on philosophical questions. I might run into trouble because now it is comparatively much harder to answer philosophical questions, so to save costs I will often have to do things based on rough guesses about my philosophical views. But the damage from using such guesses depends on the importance of having answers to philosophical questions in the short-term.
It really feels to me like there are two distinct issues:
1. Philosophical understanding may help us make good decisions in the short term, for example about how to trade off extinction risk vs faster development, or how to prioritize the suffering of non-human animals. So having better philosophical understanding (and machines that can help us build more understanding) is good.
2. Handing off control of civilization to AI systems might permanently distort society’s values. Understanding how to avoid this problem is good.
These seem like separate issues to me. I am convinced that #2 is very important, since it seems like the largest existential risk by a fair margin and also relatively tractable. I think that #1 does add some value, but am not at all convinced that it is a maximally important problem to work on. As I see it, the value of #1 depends on the importance of the ethical questions we face in the short term (and on how long-lasting are the effects of differential technological progress that accelerates our philosophical ability).
Moreover, it seems like we should evaluate solutions to these two problems separately. You seem to be making an implicit argument that they are linked, such that a solution to #2 should only be considered satisfactory if it also substantially addresses #1. But from my perspective, that seems like a relatively minor consideration when evaluating the goodness of a solution to #2. In my view, solving both problems at once would be at most 2x as good as solving the more important of the two problems. (Neither of them is necessarily a crisp problem rather than an axis along which to measure differential technological development.)
I can see several ways in which #1 and #2 are linked, but none of them seem very compelling to me. Do you have something in particular in mind? Does my position seem somehow more fundamentally mistaken to you?
(This comment was in response to point 1, but it feels like the same underlying disagreement is central to points 2 and 3. Point 4 seems like a different concern, about how the availability of AI would itself change philosophical deliberation. I don’t really see much reason to think that the availability of powerful AI would make the endpoint of deliberation worse rather than better, but probably this is a separate discussion.)
The trainers are responsible for getting M to do what the trainers want, and the user trusts the trainers to do what the user wants.
In that case, there would be severe principal-agent problems, given the disparity between power/intelligence of the trainer/AI systems and the users. If I were someone who couldn’t directly control an AI using your scheme, I’d be very concerned about getting uneven trades or having my property expropriated outright by individual AIs or AI conspiracies, or just being ignored and left behind in the race to capture the cosmic commons. I would be really tempted to try another AI design that does purport to have the AI serve my interests directly, even if that scheme is not as “safe”.
If I imagine an employee who sucks at philosophy but thinks 100x faster than me, I don’t feel like they are going to fail to understand how to defer to me on philosophical questions.
If an employee sucks at philosophy, how does he even recognize philosophical problems as problems that he needs to consult you for? Most people have little idea that they should feel confused and uncertain about things like epistemology, decision theory, and ethics. I suppose it might be relatively easy to teach an AI to recognize the specific problems that we currently consider to be philosophical, but what about new problems that we don’t yet recognize as problems today?
Aside from that, a bigger concern for me is that if I was supervising your AI, I would be constantly bombarded with philosophical questions that I’d have to answer under time pressure, and afraid that one wrong move would cause me to lose control, or lock in some wrong idea.
Consider this scenario. Your AI prompts you for guidance because it has received a message from a trading partner with a proposal to merge your AI systems and share resources for greater efficiency and economy of scale. The proposal contains a new AI design and control scheme and arguments that the new design is safer, more efficient, and divides control of the joint AI fairly between the human owners according to your current bargaining power. The message also claims that every second you take to consider the issue has large costs to you because your AI is falling behind the state of the art in both technology and scale, becoming uncompetitive, so your bargaining power for joining the merger is dropping (slowly in the AI’s time-frame, but quickly in yours). Your AI says it can’t find any obvious flaws in the proposal, but it’s not sure that you’d consider the proposal to really be fair under reflective equilibrium or that the new design would preserve your real values in the long run. There are several arguments in the proposal that it doesn’t know how to evaluate, hence the request for guidance. But it also reminds you not to read those arguments directly since they were written by a superintelligent AI and you risk getting mind-hacked if you do.
What do you do? This story ignores the recursive structure in ALBA. I think that would only make the problem even harder, but I could be wrong. If you don’t think it would go like this, let me know how you think this kind of scenario would go.
In terms of your #1, I would divide the decisions requiring philosophical understanding into two main categories. One is decisions involved in designing/improving AI systems, like in the scenario above. The other, which I talked about in an earlier comment, is ethical disasters directly caused by people who are not uncertain, but just wrong. You didn’t reply to that comment, so I’m not sure why you’re unconcerned about this category either.
A general note: I’m not really taking a stand on the importance of a singleton, and I’m open to the possibility that the only way to achieve a good outcome even in the medium-term is to have very good coordination.
A would-be singleton will also need to solve the AI control problem, and I am just as happy to help with that problem as with the version of the AI control problem faced by a whole economy of actors each using their own AI systems.
The main way in which this affects my work is that I don’t want to count on the formation of a singleton to solve the control problem itself.
You could try to work on AI in a way that helps facilitate the formation of a singleton. I don’t think that is really helpful, but moreover it again seems like a separate problem from AI control. (I also don’t think that e.g. MIRI is doing this with their current research, although they are open to solving AI control in a way that only works if there is a singleton.)
every second you take to consider the issue has large costs to you because your AI is falling behind the state of the art in both technology and scale, becoming uncompetitive, so your bargaining power for joining the merger is dropping
If your most powerful learners are strong enough to learn good-enough answers to these kinds of philosophical questions, then you only need to provide philosophical input during training and so synthesizing training data can take off time pressure. If your most powerful AI is not able to learn how to answer these philosophical questions, then the time pressure seems harder to avoid. In that case though, it seems quite hard to avoid the time pressure by any mechanism. (Especially if we are better at learning than we would be at hand-coding an algorithm for philosophical deliberation—if we are better at learning and our learner can’t handle philosophy, then we simply aren’t going to be able to build an AI that can handle philosophy.)
One is decisions involved in designing/improving AI systems, like in the scenario above. The other, which I talked about in an earlier comment, is ethical disasters directly caused by people who are not uncertain, but just wrong. You didn’t reply to that comment, so I’m not sure why you’re unconcerned about this category either.
I replied to your earlier comment.
My overall feeling is still that these are separate problems. We can evaluate a solution to AI control, and we can evaluate philosophical work that improves our understanding of potentially-relevant issues (or metaphilosophical work to automate philosophy).
I am both less pessimistic about philosophical errors doing damage, and more optimistic about my scheme’s ability to do philosophy, but it’s not clear to me that either of those is the real disagreement (since if I imagine caring a lot about philosophy and thinking this scheme didn’t help automate philosophy, I would still feel like we were facing two distinct problems).
If an employee sucks at philosophy, how does he even recognize philosophical problems as problems that he needs to consult you for? Most people have little idea that they should feel confused and uncertain about things like epistemology, decision theory, and ethics. I suppose it might be relatively easy to teach an AI to recognize the specific problems that we currently consider to be philosophical, but what about new problems that we don’t yet recognize as problems today?
Is this your reaction if you imagine delegating your affairs to an employee today? Are you making some claim about the projected increase in the importance of these philosophical decisions? Or do you think that a brilliant employee’s lack of metaphilosophical understanding would in fact cause great damage right now?
I would divide the decisions requiring philosophical understanding into two main categories. One is decisions involved in designing/improving AI systems, like in the scenario above...
I agree that AI may increase the stakes for philosophical decisions. One of my points is that a natural argument that it might increase the stakes—by forcing us to lock in an answer to philosophical questions—doesn’t seem to go through if you pursue this approach to AI control. There might be other arguments that building AI systems forces us to lock in important philosophical views, but I am not familiar with those arguments.
I agree there may be other ways in which AI systems increase the stakes for philosophical decisions.
I like the bargaining example. I hadn’t thought about bargaining as competitive advantage before, and instead had just been thinking about the possible upside (so that the cost of philosophical error was bounded by the damage of using a weaker bargaining scheme). I still don’t feel like this is a big cost, but it’s something I want to think about somewhat more.
If you think there are other examples like this, they might help move my view. On my current model, though, these are just facts that increase my estimate of the importance of philosophical work; I don’t really see them as relevant to AI control per se. (See the sibling, which is the better place to discuss that.)
one wrong move would cause me to lose control
I don’t see cases where a philosophical error causes you to lose control, unless you would have some reason to cede control based on philosophical arguments (e.g. in the bargaining case). Failing that, it seems like there is a philosophically simple, apparently adequate notion of “remaining in control” and I would expect to remain in control at least in that sense.
In that case, there would be severe principal-agent problems, given the disparity between power/intelligence of the trainer/AI systems and the users. If I was someone who couldn’t directly control an AI using your scheme, I’d be very concerned about getting uneven trades or having my property expropriated outright by individual AIs or AI conspiracies, or just ignored and left behind in the race to capture the cosmic commons. I would be really tempted to try another AI design that does purport to have the AI serve my interests directly, even if that scheme is not as “safe”.
Are these worse than the principal-agent problems that exist in any industrialized society? Most humans lack effective control over many important technologies, both in terms of economic productivity and especially military might. (They can’t understand the design of a car they use, they can’t understand the programs they use, they don’t understand what is actually going on with their investments...) It seems like the situation is quite analogous.
Moreover, even if we could build AI in a different way, it doesn’t seem to do anything to address the problem, since it is equally opaque to an end user who isn’t involved in the AI development process. In any case, they are in some sense at the mercy of the AI developer. I guess this is probably the key point—I don’t understand the qualitative difference between being at the mercy of the software developer on the one hand, and being at the mercy of the software developer + the engineers who help the software run day-to-day on the other. There is a slightly different set of issues for monitoring/law enforcement/compliance/etc., but it doesn’t seem like a huge change.
(Probably the rest of this comment is irrelevant.)
To talk more concretely about mechanisms in a simple example, you might imagine a handful of companies who provide AI software. The people who use this software are essentially at the mercy of the software providers (since for all they know the software they are using will subvert their interests in arbitrary ways, whether or not there is a human involved in the process). In the most extreme case an AI provider could effectively steal all of their users’ wealth. They would presumably then face legal consequences, which are not qualitatively changed by the development of AI if the AI control problem is solved. If anything we expect the legal system and government to better serve human interests.
We could talk about monitoring/enforcement/etc., but again I don’t see these issues as interestingly different from the current set of issues, or as interestingly dependent on the nature of our AI control techniques. The most interesting change is probably the irrelevance of human labor, which I think is a very interesting issue economically/politically/legally/etc.
I agree with the general point that as technology improves a singleton becomes more likely. I’m agnostic on whether the control mechanisms I describe would be used by a singleton or by a bunch of actors, and as far as I can tell the character of the control problem is essentially the same in either case.
I do think that a singleton is likely eventually. From the perspective of human observers, a singleton will probably be established relatively shortly after wages fall below subsistence (at the latest). This prediction is mostly based on my expectation that political change will accelerate alongside technological change.
I agree with the general point that as technology improves a singleton becomes more likely. I’m agnostic on whether the control mechanisms I describe would be used by a singleton or by a bunch of actors, and as far as I can tell the character of the control problem is essentially the same in either case.
I wonder—are you also relatively indifferent between a hard and slow takeoff, given sufficient time before the takeoff to develop AI control theory?
(One of the reasons a hard takeoff seems scarier to me is that it is more likely to lead to a singleton, with a higher probability of locking in bad values.)
As far as I can tell, Paul’s current proposal might still suffer from blackmail, like his earlier proposal which I commented on. I vaguely remember discussing the problem with you as well.
One big lesson for me is that AI research seems to be more incremental and predictable than we thought, and garage FOOM probably isn’t the main danger. It might be helpful to study the strengths and weaknesses of modern neural networks and get a feel for their generalization performance. Then we could try to predict which areas will see big gains from neural networks in the next few years, and which parts of Friendliness become easy or hard as a result. Is anyone at MIRI working on that?
Then we could try to predict which areas will see big gains from neural networks in the next few years, and which parts of Friendliness become easy or hard as a result. Is anyone at MIRI working on that?
If they did that, then what? Try to convince NN researchers to attack the parts of Friendliness that look hard? That seems difficult for MIRI to do given where they’ve invested in building their reputation (i.e., among decision theorists and mathematicians instead of in the ML community). (It would really depend on people trusting their experience and judgment since it’s hard to see how much one could offer in the form of either mathematical proof or clearly relevant empirical evidence.) You’d have a better chance if the work was carried out by some other organization. But even if that organization got NN researchers to take its results seriously, what incentives do they have to attack parts of Friendliness that seem especially hard, instead of doing what they’ve been doing, i.e., racing as fast as they can for the next milestone in capability?
Or is the idea to bet on the off chance that building an FAI with NN turns out to be easy enough that MIRI and like-minded researchers can solve the associated Friendliness problems themselves and then hand the solutions to whoever ends up leading the AGI race, and they can just plug the solutions in at little cost to their winning the race?
Or you’re suggesting aiming/hoping for some feasible combination of both, I guess. It seems pretty similar to what Paul Christiano is doing, except he has “generic AI technology” in place of “NN” above. To me, the chance of success of this approach seems low enough that it’s not obviously superior to what MIRI is doing (namely, in my view, betting on the off chance that the contrarian AI approach they’re taking ends up being much easier/better than the mainstream approach, which is looking increasingly unlikely but still not impossible).
One big lesson for me is that AI research seems to be more incremental and predictable than we thought, and garage FOOM probably isn’t the main danger.
That may be true, but that is hindsight bias. MIRI’s (or EY’s, for that matter) approach to hedge against that being true was nonetheless a very reasonable approach (and, given the knowledge at the time, maybe the only reasonable one).
As far as I can tell, Paul’s current proposal might still suffer from blackmail, like his earlier proposal which I commented on
What problem do you have in mind here?
I thought that the previous problem was mostly psychological, i.e. that if humans were rational agents then this AI would be roughly as vulnerable to blackmail as its designers. So I thought the issue was the psychological strangeness (and great length) of the weird hypothetical.
Here we have no such hypothetical, and the system’s behavior only depends on the predicted behavior of humans in the real world. That seems to address the narrow version of your concern.
I can see two analogous problems:
1. The learning system may believe itself to be in a simulation, as discussed here. But that seems to cash out to a straightforward technical problem: we want to ensure that as long as there is some important scenario where the AI behaves strangely, we provide feedback on one such scenario. And this technical problem both (1) looks like it should be soluble in principle, (2) looks pretty tractable right now, and (3) is needed to resolve a whole host of other problems (a bunch of exotic failures, but also a bunch of more prosaic difficulties).
2. The behavior of the system depends on what a human would do across many cases. We can sample from these cases at random, but maybe a small fraction of them are sufficiently surprising that they would lead a human to believe they are in a simulation. We would like to sample these situations with disproportionate probability as well so that the human won’t panic when they find themselves in one, but I haven’t talked in any detail about how to do that and it’s not obvious whether it is possible. (I do think it’s possible.)
I mostly had in mind 2. Not sure how predicting humans is different from putting humans in hypotheticals. It seems like the same problems could happen.
I agree that the same problem appears for ALBA. I was originally working with proposals where the improbability of the human’s situation was bounded, but the recursive structure can lead to arbitrarily large improbability. I hadn’t thought about this explicitly.
Predicting humans is different from putting humans in hypotheticals, in the sense that in principle you can actually sample from the situations that cause humans to think they are in a simulation or whatever.
For example, suppose the human had access to a button that said “This is weird, I’m probably in a simulation,” and suppose that we expected the human to press it in any case where they would start behaving weirdly. Then we could potentially sample from the subset of situations where the human presses the button. And if we manage to do that, then the human isn’t right to suspect they are in a simulation (any more than they already should believe they are in a simulation, prior to even building the AI).
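The conditional-sampling idea above can be sketched as simple rejection sampling (a toy illustration only; `sample_situation` and `human_presses_weird_button` are hypothetical stand-ins for the environment distribution and the human’s flag, not part of any proposed system):

```python
import random

def sample_situation():
    # Stand-in: draw a situation from the environment distribution.
    # Here a situation is just a number in [0, 1).
    return random.random()

def human_presses_weird_button(situation):
    # Stand-in predicate: would the human flag this situation as
    # "this is weird, I'm probably in a simulation"? Modeled as a
    # rare tail event.
    return situation > 0.99

def sample_weird_situation(max_tries=100_000):
    """Sample from the conditional distribution of flagged situations
    by rejection sampling: keep drawing until the flag fires."""
    for _ in range(max_tries):
        s = sample_situation()
        if human_presses_weird_button(s):
            return s
    return None  # give up if flagged situations are too rare
```

The point of the sketch is just that if you can sample from the flagged subset at training time, then encountering a flagged situation carries no extra evidence of being in a simulation.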
I agree that you shouldn’t expect the current version of the scheme to cope with this problem. And it’s certainly not obvious that the approach sketched above can work, there are many important questions. But I do think that there is a clear path to fixing this kind of problem.
(Interestingly, this is exactly analogous to what the active learning scheme has to do, in order to get the prediction algorithms to predict well.)
ETA: although I hadn’t thought about this explicitly in the context of ALBA, I have expected to need some way to overweight “weird” situations in order to stop them from being problematic, ever since here.
A different analogy would be that MIRI is looking at the map and the compass to figure out what’s the right way to go, while others are just running in any random direction.
Not quite. The others are not running around in random directions, they are all running in a particular direction and MIRI is saying “Hold on, guys, there may be bears and tigers and pits of hell at your destination”. Which is all fine, but it still is not running.
Sure. I wasn’t objecting to the “MIRI isn’t competing in the AGI race” point, but to the negative connotations that one might read into your original analogy.
They’re working on figuring out what we want the AGI to do
Aka friendliness research. But why does that matter? If the machine has no real effectors and lots of human oversight, then why should there even be concern over friendliness? It wouldn’t matter in that context. Tell a machine to do something, and it finds an evil-stupid way of doing it, and human intervention prevents any harm.
Why is it a going concern at all whether we can assure ahead of time that the actions recommended by a machine are human-friendly unless the machine is enabled to independently take those actions without human intervention? Just don’t do that and it stops being a concern.
Humanity is having trouble coordinating and enforcing even global restrictions on greenhouse gases. Try ensuring that nobody does anything risky or short-sighted with a technology that has no clear-cut threshold between a “safe” and “dangerous” level of capability, and which can be beneficial for performing in pretty much any competitive and financially lucrative domain.
Restricting the AI’s capabilities may work for a short while, assuming that only a small group of pioneers manages to develop the initial AIs and they’re responsible with their use of the technology—but as Bruce Schneier says, today’s top-secret programs become tomorrow’s PhD theses and the next day’s common applications. If we want to survive in the long term, we need to figure out how to make the free-acting AIs safe, too—otherwise it’s just a ticking time bomb before the first guys accidentally or intentionally release theirs.
Humanity has done more than zero and less than optimally about things like climate change. Importantly, the situation is below the imminent existential threat level.
If you are going to complain that alternative proposals face coordination problems, you need to show that yours don’t, or you are committing the fallacy of the dangling comparison. If people aren’t going to refrain from building dangerously powerful superintelligences, assuming that is possible, why would they have the sense to fit MIRI’s safety features, assuming they are possible? If the law can make people fit safety features, why can’t it prevent them from building dangerous AIs in the first place?
no clear-cut threshold between a “safe” and “dangerous” level of capability
I would suggest a combination of generality and agency.
And what problem domain requires both?
If you allow for autonomously acting AIs, then you could have Friendly autonomous AIs tracking down and stopping Unfriendly / unauthorized AIs.
This of course depends on people developing the Friendly AIs first, but ideally it’d be enough for only the first people to get the design right, rather than depending on everyone being responsible.
Importantly, the situation is below the imminent existential threat level.
It’s unclear whether AI risk will become obviously imminent, either. Goertzel & Pitt 2012 argue in section 3 of their paper that this is unlikely.
I would suggest a combination of generality and agency. And what problem domain requires both?
Business (which by nature covers just about every domain in which you can make a profit, which is to say just about every domain relevant for human lives), warfare, military intelligence, governance… (see also my response to Mark)
If you allow for autonomously acting AIs, then you could have Friendly autonomous AIs tracking down and stopping Unfriendly / unauthorized AIs.
You could, but if you don’t have autonomously acting agents, you don’t need Gort AIs.
Building an agentive superintelligence that is powerful enough to take down any other, as MIRI conceives it, is a very risky proposition, since you need to get the value system exactly right. So it’s better not to be in a place where you have to do that.
This of course depends on people developing the Friendly AIs first, but ideally it’d be enough for only the first people to get the design right, rather than depending on everyone being responsible.
The first people have to be able, as well as willing, to get everything right. Safety through restraint is easier and more reliable—you can omit a feature more reliably than you can add one.
Business (which by nature covers just about every domain in which you can make a profit, which is to say just about every domain relevant for human lives), warfare, military intelligence, governance…
These organizations have a need for widespread intelligence gathering, and for agentive AI, but that doesn’t mean they need both in the same package. The military don’t need their entire intelligence database in every drone, and don’t want drones that change their mind about who the bad guys are in mid-flight. Businesses don’t want HFT applications that decide capitalism is a bad thing.
We want agents to act on our behalf, which means we want agents that are predictable and controllable to the required extent. Early HFT had problems which led to the addition of limits and controls. Control and predictability are close to safety. There is no drive to power that is also a drive away from safety, because uncontrolled power is of no use.
Based on the behaviour of organisations, there seems to be a natural division between high-level, unpredictable decision-making systems and lower-level, faster-acting agentive systems. In other words, they voluntarily do some of what would be required for an incremental safety programme.
I agree that it would be better not to have autonomously acting AIs, but not having any autonomously acting AIs would require a way to prevent anyone deploying them, and so far I haven’t seen a proposal for that that’d seem even remotely feasible.
And if we can’t stop them from being deployed, then deploying Friendly AIs first looks like the scenario that’s more likely to work—which still isn’t to say very likely, but at least it seems to have a chance of working even in principle. I don’t see even an in-principle way for “just don’t deploy autonomous AIs” to work.
When you say autonomous AIs, do you mean AIs that are autonomous and superintelligent?
AIs that are initially autonomous and non-superintelligent, then gradually develop towards superintelligence. (With the important caveat that it’s unclear whether an AI needed to be generally superintelligent in order to pose a major risk for society. It’s conceivable that superintelligence in some more narrow domain, like cybersecurity, would be enough—particularly in a sufficiently networked society.)
Do you think they could be deployed by basement hackers, or only by large organisations?
Hard to say. The way AI has developed so far, it looks like the capability might be restricted to large organizations with lots of hardware resources at first, but time will likely drive down the hardware requirements.
Do you think an organisation like the military or business has a motivation to deploy them?
Yes.
Do you agree that there are dangers to an FAI project that goes wrong?
Yes.
Do you have a plan B to cope with a FAI that goes rogue?
Such a plan would seem to require lots of additional information about both the specifics of the FAI plan, and also the state of the world at that time, so not really.
Do you think that having an AI potentially running the world is an attractive idea to a lot of people?
Depends on how we’re defining “lots”, but I think that the notion of a benevolent dictator has often been popular in many circles, who’ve also acknowledged its largest problems to be that 1) power tends to corrupt 2) even if you got a benevolent dictator, you also needed a way to ensure that all of their successors were benevolent. Both problems could be overcome with an AI, so on that basis at least I would expect lots of people to find it attractive. I’d also expect it to be considered more attractive in e.g. China, where people seem to be more skeptical towards democracy than they are in the West.
Additionally, if the AI wouldn’t be the equivalent of a benevolent dictator, but rather had a more hands-off role that kept humans in power and only acted to e.g. prevent disease, violent crime, and accidents, then that could be attractive to a lot of people who preferred democracy.
When you say autonomous AIs, do you mean AIs that are autonomous and superintelligent?
AIs that are initially autonomous and non-superintelligent, then gradually develop towards superintelligence
If you believe in the conjunction of claims that people are motivated to create autonomous, not just agentive, AIs, and that pretty well any AI can evolve into dangerous superintelligence, then the situation is dire, because you cannot guarantee to get in first with an AI policeman as a solution to AI threat.
The situation is better, but only slightly better with legal restraint as a solution to AI threat, because you can lower the probability of disaster by banning autonomous AI...but you can only lower it, not eliminate it, because no ban is 100% effective.
And how serious are you about the threat level? Compare with microbiological research. It could be the case that someone will accidentally create an organism that spells doom for the human race; it cannot be ruled out, but no one is panicking now because there is no specific reason to rule it in, no specific pathway to it. It is a remote possibility, not a serious one.
Someone who sincerely believed that rapid self-improvement towards autonomous AI could happen at any time, because there are no specific preconditions or precursors for it, is someone who effectively believes it could happen now. But someone who genuinely believes an AI apocalypse could happen now is someone who would be revealing their belief in their behaviour by heading for the hills, or smashing every computer they see.
(With the important caveat that it’s unclear whether an AI needed to be generally superintelligent in order to pose a major risk for society.
Narrow superintelligences may well be less dangerous than general superintelligences, and if you are able to restrict the generality of an AI, that could be a path to incremental safety.
But if the path to some kind of spontaneous superintelligence in an autonomous AI is also a path to spontaneous generality, that is hopeless. -- if the one can happen for no particular reason, so can the other. But is the situation really bad, or are these scenarios remote possibilities, like genetically engineered super plagues?
Do you think they could be deployed by basement hackers, or only by large organisations?
Hard to say. The way AI has developed so far, it looks like the capability might be restricted to large organizations with lots of hardware resources at first, but time will likely drive down the hardware requirements.
But by the time the hardware requirements have been driven down for entry-level AI, the large organizations will already have more powerful systems, and they will dominate for better or worse. If benevolent, they will suppress dangerous AIs coming out of basements; if dangerous, they will suppress rivals. The only problematic scenario is where the hackers get in first, since they are less likely to partition agency from intelligence, as I have argued a large organisation would.
But the one thing we know for sure about AI is that it is hard. The scenario where a small team hits on the One Weird Trick to achieve ASI is the most worrying, but also the least likely.
Do you think an organisation like the military or business has a motivation to deploy [autonomous AI]?
Yes.
Which would be what?
Do you agree that there are dangers to an FAI project that goes wrong?
Yes.
Do you have a plan B to cope with a FAI that goes rogue?
Such a plan would seem to require lots of additional information about both the specifics of the FAI plan, and also the state of the world at that time, so not really.
But building an FAI capable of policing other AIs is potentially dangerous, since it would need to be both a general intelligence and a superintelligence.
Do you think that having an AI potentially running the world is an attractive idea to a lot of people?
Depends on how we’re defining “lots”,
For the purposes of the current argument, a democratic majority.
but I think that the notion of a benevolent dictator has often been popular in many circles, who’ve also acknowledged its largest problems to be that 1) power tends to corrupt 2) even if you got a benevolent dictator, you also needed a way to ensure that all of their successors were benevolent. Both problems could be overcome with an AI,
There are actually three problems with benevolent dictators. As well as power corrupting, and successorship, there is the problem of ensuring or detecting benevolence in the first place.
You have conceded that Gort AI is potentially dangerous. The danger is that it is fragile in a specific way: a near miss to a benevolent value system is a dangerous one.
so on that basis at least I would expect lots of people to find it attractive. I’d also expect it to be considered more attractive in e.g. China, where people seem to be more skeptical towards democracy than they are in the West.
Additionally, if the AI wouldn’t be the equivalent of a benevolent dictator, but rather had a more hands-off role that kept humans in power and only acted to e.g. prevent disease, violent crime, and accidents, then that could be attractive to a lot of people who preferred democracy.
That also depends on both getting it right, and convincing people you have got it right.
If you believe in the conjunction of claims that people are motivated to create autonomous, not just agentive, AIs, and that pretty well any AI can evolve into dangerous superintelligence, then the situation is dire, because you cannot guarantee to get in first with an AI policeman as a solution to AI threat.
The situation is better, but only slightly better with legal restraint as a solution to AI threat,
Indeed.
And how serious are you about the threat level? Compare with microbiological research. It could be the case that someone will accidentally create an organism that spells doom for the human race; it cannot be ruled out, but no one is panicking now because there is no specific reason to rule it in, no specific pathway to it. It is a remote possibility, not a serious one.
Someone who sincerely believed that rapid self-improvement towards autonomous AI could happen at any time, because there are no specific preconditions or precursors for it, is someone who effectively believes it could happen now. But someone who genuinely believes an AI apocalypse could happen now is someone who would be revealing their belief in their behaviour by heading for the hills, or smashing every computer they see.
I don’t think that rapid self-improvement towards a powerful AI could happen at any time. It’ll require AGI, and we’re still a long way from that.
Narrow superintelligences may well be less dangerous than general superintelligences, and if you are able to restrict the generality of an AI, that could be a path to incremental safety.
It could, yes.
But by the time the hardware requirements have been driven down for entry level AI, the large organizations will already have more powerful systems, and they will dominate for better or worse.
Assuming they can keep their AGI systems in control.
Do you think an organisation like the military or business has a motivation to deploy [autonomous AI]?
Yes.
Which would be what?
See my response here and also section 2 in this post.
But building an FAI capable of policing other AIs is potentially dangerous, since it would need to be both a general intelligence and a superintelligence. [...] You have conceded that Gort AI is potentially dangerous. The danger is that it is fragile in a specific way: a near miss to a benevolent value system is a dangerous one.
I think you very much misunderstand my suggestion. I’m saying that there is no reason to presume AI will be given the keys to the kingdom from day one, not advocating for some sort of regulatory regime.
So what do you see as the mechanism that will prevent anyone from handing the AI those keys, given the tremendous economic pressure towards doing exactly that?
As with a boxed AGI, there are many factors that would tempt the owners of an Oracle AI to transform it to an autonomously acting agent. Such an AGI would be far more effective in furthering its goals, but also far more dangerous.
Current narrow-AI technology includes HFT algorithms, which make trading decisions within fractions of a second, far too fast to keep humans in the loop. HFT seeks to make a very short-term profit, but even traders looking for a longer-term investment benefit from being faster than their competitors. Market prices are also very effective at incorporating various sources of knowledge [135]. As a consequence, a trading algorithm’s performance might be improved both by making it faster and by making it more capable of integrating various sources of knowledge. Most advances toward general AGI will likely be quickly taken advantage of in the financial markets, with little opportunity for a human to vet all the decisions. Oracle AIs are unlikely to remain as pure oracles for long.
Similarly, Wallach [283] discusses the topic of autonomous robotic weaponry and notes that the US military is seeking to eventually transition to a state where the human operators of robot weapons are ‘on the loop’ rather than ‘in the loop’. In other words, whereas a human was previously required to explicitly give the order before a robot was allowed to initiate possibly lethal activity, in the future humans are meant to merely supervise the robot’s actions and interfere if something goes wrong.
Human Rights Watch [90] reports on a number of military systems which are becoming increasingly autonomous, with the human oversight for automatic weapons defense systems—designed to detect and shoot down incoming missiles and rockets—already being limited to accepting or overriding the computer’s plan of action in a matter of seconds. Although these systems are better described as automatic, carrying out pre-programmed sequences of actions in a structured environment, than autonomous, they are a good demonstration of a situation where rapid decisions are needed and the extent of human oversight is limited. A number of militaries are considering the future use of more autonomous weapons.
In general, any broad domain involving high stakes, adversarial decision making and a need to act rapidly is likely to become increasingly dominated by autonomous systems. The extent to which the systems will need general intelligence will depend on the domain, but domains such as corporate management, fraud detection and warfare could plausibly make use of all the intelligence they can get. If one’s opponents in the domain are also using increasingly autonomous AI/AGI, there will be an arms race where one might have little choice but to give increasing amounts of control to AI/AGI systems.
Miller [189] also points out that if a person was close to death, due to natural causes, being on the losing side of a war, or any other reason, they might turn even a potentially dangerous AGI system free. This would be a rational course of action as long as they primarily valued their own survival and thought that even a small chance of the AGI saving their life was better than a near-certain death.
Some AGI designers might also choose to create less constrained and more free-acting AGIs for aesthetic or moral reasons, preferring advanced minds to have more freedom.
I thought my excerpt answered that, but maybe that was illusion of transparency speaking. In particular, this paragraph:
In general, any broad domain involving high stakes, adversarial decision making and a need to act rapidly is likely to become increasingly dominated by autonomous systems. The extent to which the systems will need general intelligence will depend on the domain, but domains such as corporate management, fraud detection and warfare could plausibly make use of all the intelligence they can get. If oneʼs opponents in the domain are also using increasingly autonomous AI/AGI, there will be an arms race where one might have little choice but to give increasing amounts of control to AI/AGI systems.
To rephrase: the main trend in history has been to automate everything that can be automated, both to reduce costs and because machines can do things better than humans do. This isn’t going to stop: I’ve already seen articles calling for both company middle managers and government bureaucrats to be replaced with AIs. If you have any kind of a business, you could potentially make it run better by putting a sufficiently sophisticated AI in charge—because it can think faster and smarter, deal with more information at once, and not have the issue of self-interest leading to office politics leading to many employees acting suboptimally from the company’s point of view, as you’d get if you had a thousand human employees rather than a single AI.
This trend has been going on throughout history, doesn’t show any signs of stopping, and inherently involves giving the AI systems whatever agency they need in order to run the company better.
And if your competitors are having AIs run their company and you don’t, you’re likely to be outcompeted, so you’ll want to make sure your AIs are smarter and more capable of acting autonomously than the competitors. These pressures aren’t just going to vanish at the point when AIs start approaching human capability.
The same considerations also apply to other domains than business—like governance—but the business and military domains are the most likely to have intense arms race dynamics going on.
Yes, illusion of transparency at work here. That paragraph has always been so clearly wrong to me that I wrote it off as the usual academic prose fluff, and didn’t realize it was in fact the argument being made. Here is the issue I take with that:
You can find instances where industry is clamoring to use AI to reduce costs / improve productivity. For example, Uber and self-driving cars. However in these cases there are a combination of two factors at work: (1) the examples are necessarily specialized narrow AI, not general decision making; and/or (2) the costs of poor decision making are externalized. Let’s look at these points in more detail:
Anytime a human is being used as a meat robot, e.g. an Uber driver, a machine can do the job better and more efficiently with quantifiable tradeoffs due to the machine’s own quirks. However one must not forget that this is the case because the context has already been specialized! One can replace a minimum wage burger flipper with a machine because the job is part of a three-ring binder enterprise that has already been exhaustively thought out to such a degree that every component task can be taught to a minimum wage, distracted teenage worker. If the mechanical burger flipper fails, you go back to paying a $10/hr meat robot to do the trick. But what happens when the corporate strategy robot fails and the new product is a flop? You lose hundreds of millions of invested dollars. And worse, you don’t know until it is all over and played out. Not comparable at all.
Uber might want a fleet of self-driving cars. But that’s because the costs of being wrong are externalized. Get in an accident? It’s your driver’s problem, not Uber’s. Self-driving car gets in an accident? It’s the car owner’s problem, which, surprise, is not Uber. The applications of AGI have risks that are not so easily externalized, however.
I can see how one might think that unchecked AGI would improve the efficiency of corporate management, fraud detection, and warfare. However that’s confirmation bias. I assure you that the corporate strategists, fraud specialists, and generals get paid the big bucks to think about risk and the ways in which things can go wrong. I can give examples of what could go wrong when an alien AGI psychology tries to interact with irrational humans, but it’s much simpler to remember that even presumably superhuman AGIs have error rates, and these error rates will be higher than humans for a good duration of time while the technology is still developing. And what happens when an AGI makes a mistake?
A corporate strategist AGI makes a mistake, and the directors of the corporation who have a fiduciary responsibility to shareholders are held personally accountable. Indemnity insurance refuses to pay out as upper management purposefully took themselves out of the loop, an action that is considered irresponsible in hindsight.
A fraud specialist AGI makes a mistake, and its company turns a blind eye to hundreds of millions of dollars of fraud that a human would have seen. Business goes belly-up.
A war-making AGI makes a mistake, and you are now dead.
I hope that you’ll forgive me, but I must call on anecdotal evidence here. I am the co-founder of a startup that has raised >$75MM. I understand very well how investors, upper management, and corporate strategists manage risk. I also have observed how extremely terrified of additional risk they are. The supposition that they would be willing to put a high-risk proto-AGI in the driver’s seat is naïve to say the least. These are the people that are held accountable and suffer the largest losses when things go wrong, and they are terrified of that outcome.
What is likely to happen, on the other hand, is a hybridization of machine and human. AGI cognitive assistance will permeate these industries, but their job is to give recommendations, not steer things directly. And it’s not at all so clear to me that this approach, “Oracle AI” as it is called on LW, is so dangerous.
Thank you for the patient explanation! This is an interesting argument that I’ll have to think about some more, but I’ve already adjusted my view of how I expect things to go based on it.
Two questions:
First, isn’t algorithmic trading a counterexample to your argument? It’s true that it’s a narrow domain, but it’s also one where AI systems are trusted with enormous sums of money, and have the potential to make enormous losses. E.g. one company apparently lost $440 million in less than an hour due to a glitch in their software. Wikipedia on the consequences:
Knight Capital took a pre-tax loss of $440 million. This caused Knight Capital’s stock price to collapse, sending shares lower by over 70% from before the announcement. The nature of the Knight Capital’s unusual trading activity was described as a “technology breakdown”.[14][15]
On Sunday, August 5 the company managed to raise around $400 million from half a dozen investors led by Jefferies in an attempt to stay in business after the trading error. Jefferies’ CEO, Richard Handler and Executive Committee Chair Brian Friedman structured and led the rescue and Jefferies purchased $125 million of the $400 million investment and became Knight’s largest shareholder. [2]. The financing would be in the form of convertible securities, bonds that turn into equity in the company at a fixed price in the future.[16]
The incident was embarrassing for Knight CEO Thomas Joyce, who was an outspoken critic of Nasdaq’s handling of Facebook’s IPO.[17] On the same day the company’s stock plunged 33 percent, to $3.39; by the next day 75 percent of Knight’s equity value had been erased.[18]
Also, you give several examples of AGIs potentially making large mistakes with large consequences, but couldn’t e.g. a human strategist make a similarly big mistake as well?
You suggest that the corporate leadership could be held more responsible for a mistake by an AGI than if a human employer made the mistake, and I agree that this is definitely plausible. But I’m not sure whether it’s inevitable. If the AGI was initially treated the way a junior human employee would, i.e. initially kept subject to more supervision and given more limited responsibilities, and then had its responsibilities scaled up as people came to trust it more and it learned from its mistakes, would that necessarily be considered irresponsible by the shareholders and insurers? (There’s also the issue of privately held companies with no need to keep external shareholders satisfied.)
one where AI systems are trusted with enormous sums of money
Kinda. They are carefully watched and have separate risk management systems which impose constraints and limits on what they can do.
E.g. one company apparently lost $440 million in less than an hour due to a glitch in their software.
Yes, but that has nothing to do with AI: “To err is human, but to really screw up you need a computer”. Besides, there are equivalent human errors (fat fingers, add a few zeros to a trade inadvertently) with equivalent magnitude of losses.
have separate risk management systems which impose constraints and limits on what they can do.
If those risk management systems are themselves software, that doesn’t really change the overall picture.
Yes, but that has nothing to do with AI:
If we’re talking about “would companies place AI systems in a role where those systems could cost the company lots of money if they malfunctioned”, then examples of AI systems having been placed in roles where they cost the company a lot of money have everything to do with the discussion.
In the usual way. Contemporary trading systems are not black boxes full of elven magic. They are models, that is, a bunch of code and some data. If the model doesn’t do what you want it to do, you stick your hands in there and twiddle the doohickeys until it stops outputting twaddle.
Besides, in most trading systems the sophisticated part (“AI”) is an oracle. Typically it outputs predictions (e.g. of prices of financial assets) and its utility function is some loss function on the difference between the prediction and the actual. It has no concept of trades, or dollars, or position limits.
Translating these predictions into trades is usually quite straightforward.
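For concreteness, here is a minimal sketch of that split, under the assumption described above: the oracle only outputs a predicted return, and a separate execution layer translates it into an order subject to a hard position limit. All names, thresholds, and limits here are illustrative inventions, not drawn from any real trading system.

```python
# Hypothetical sketch of the "oracle + simple execution rule" split.
# The predictor knows nothing about trades, dollars, or position limits;
# the execution/risk layer enforces them.

def translate_prediction_to_order(predicted_return, current_position,
                                  max_position=100, trade_size=10,
                                  threshold=0.001):
    """Turn an oracle's predicted return into an order quantity
    (positive = buy, negative = sell, 0 = no trade), clipped so that
    the resulting position never exceeds the risk limit."""
    if predicted_return > threshold:
        desired = trade_size          # buy signal
    elif predicted_return < -threshold:
        desired = -trade_size         # sell signal
    else:
        return 0                      # prediction too weak to act on

    # Risk management constraint: never exceed the position limit.
    if abs(current_position + desired) > max_position:
        desired = max(-max_position - current_position,
                      min(desired, max_position - current_position))
    return desired

# Near the limit, the order gets clipped; otherwise it passes through.
print(translate_prediction_to_order(0.005, current_position=95))   # clipped to 5
print(translate_prediction_to_order(-0.002, current_position=0))   # -10
```

The point of the sketch is that the "AI" part (the prediction) and the part with real-world consequences (order placement) are separate components, which is what allows the sophisticated part to remain an oracle.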
I suspect that this dates back to a time when MIRI believed the answer to AI safety was to build an agentive, maximal superintelligence, align its values with ours, and put it in charge of all the other AIs.
The first idea has been effectively shelved, since MIRI had produced about zero lines of code... but the idea that AI safety is value alignment continues with considerable momentum. And value alignment only makes sense if you are building an agentive AI (and have given up on corrigibility).
Briefly skimming Christiano’s post, this is actually one of the few/first proposals from someone MIRI related that actually seems to be on the right track (and similar to my own loose plans). Basically it just boils down to learning human utility functions with layers of meta-learning, with generalized RL and IRL.
Compared to its competition in the AGI race, MIRI was always going to be disadvantaged by both lack of resources and the need to choose an AI design that can predictably be made Friendly as opposed to optimizing mainly for capability. For this reason, I was against MIRI (or rather the Singularity Institute as it was known back then) going into AI research at all, as opposed to pursuing some other way of pushing for a positive Singularity.
In any case, what other approaches to Friendliness would you like MIRI to consider? The only other approach that I’m aware of that’s somewhat developed is Paul Christiano’s current approach (see for example https://medium.com/ai-control/alba-an-explicit-proposal-for-aligned-ai-17a55f60bbcf), which I understand is meant to be largely agnostic about the underlying AI technology. Personally I’m pretty skeptical but then I may be overly skeptical about everything. What are your thoughts? I don’t recall seeing you having commented on them much.
Are you aware of any other ideas that MIRI should be considering?
Do you have a concise explanation of skepticism about the overall approach, e.g. a statement of the difficulty or difficulties you think will be hardest to overcome by this route?
Or is your view more like “most things don’t work, and there isn’t much reason to think this would work”?
In discussion you most often push on the difficulty of doing reflection / philosophy. Would you say this is your main concern?
My take has been that we just need to meet the lower bar of “wants to defer to human views about philosophy, and has a rough understanding of how humans want to reflect and want to manage their uncertainty in the interim.”
Regarding philosophy/metaphilosophy, is it fair to describe your concern as one of:
The approach I am pursuing can’t realistically meet even my lower bar,
Meeting my lower bar won’t suffice for converging to correct philosophical views,
Our lack of philosophical understanding will cause problems soon in subjective time (we seem to have some disagreement here, but I don’t feel like adopting your view would change my outlook substantially), or
AI systems will be much better at helping humans solve technical than philosophical problems, driving a potentially long-lasting (in subjective time) wedge between our technical and philosophical capability, even if ultimately we would end up at the right place?
My hope is that thinking and talking more about bootstrapping procedures would go a long way to resolving the disagreements between us (either leaving you more optimistic or me more pessimistic). I think this is most plausible if #1 is the main disagreement. If our disagreement is somewhere else, it may be worth also spending some time focusing somewhere else. Or it may be necessary to better define my lower bar in order to tell where the disagreement is.
It seems to be a combination of all of these.
Training an AI to defer to one’s eventual philosophical judgments and interim method of managing uncertainty (and not falling prey to marketing worlds and incorrect but persuasive philosophical arguments etc) seems really hard, and made harder by the recursive structure in ALBA and the fact that the first-level AI, which has to handle being bootstrapped and training the next-level AI, is sub-human in capacity. What percent of humans can accomplish this task, do you think? (I’d argue that the answer is likely zero, but certainly very small.) How do the rest use your AI?
Assuming that deferring to humans on philosophy and managing uncertainty is feasible but costly, how many people could resist dropping this feature and the associated cost, in favor of adopting some sort of straightforward utility maximization framework with a fixed utility function that they think captures most or all of their values, if that came as a suggestion from the AI with an apparently persuasive argument? If most people do this and only a few don’t (and those few are also disadvantaged in the competition to capture the cosmic commons due to deciding to carry these costs), that doesn’t seem like much of a win.
This is tied in with 1 and 2, in that correct meta-philosophical understanding is needed to accomplish 1, and unreasonable philosophical certainty would cause people to fail step 2.
Even if the AIs keep deferring to their human users and don’t end up short-circuiting their philosophical judgments, if the AI/human systems become very powerful while still holding incorrect and strongly held philosophical views, that seems likely to cause disaster. We also don’t have much reason to think that if we put people in such positions of power (for example, being able to act as a god in some simulation or domain of their choosing), most will eventually realize their philosophical errors and converge to correct views, or that the power itself wouldn’t further distort their already error-prone reasoning processes.
Re 1:
For a working scheme, I would expect it to be usable by a significant fraction of humans (say, comparable to the fraction that can learn to write a compiler).
That said, I would not expect almost anyone to actually play the role of the overseer, even if a scheme like this one ended up being used widely. An existing analogy would be the human trainers who drive facebook’s M (at least in theory, I don’t know how that actually plays out). The trainers are responsible for getting M to do what the trainers want, and the user trusts the trainers to do what the user wants. From the user’s perspective, this is no different from delegating to the trainers directly, and allowing them to use whatever tools they like.
I don’t yet see why “defer to human judgments and handle uncertainty in a way that they would endorse” requires evaluating complex philosophical arguments or having a correct understanding of metaphilosophy. If the case is unclear, you can punt it to the actual humans.
If I imagine an employee who sucks at philosophy but thinks 100x faster than me, I don’t feel like they are going to fail to understand how to defer to me on philosophical questions. I might run into trouble because now it is comparatively much harder to answer philosophical questions, so to save costs I will often have to do things based on rough guesses about my philosophical views. But the damage from using such guesses depends on the importance of having answers to philosophical questions in the short-term.
It really feels to me like there are two distinct issues:
Philosophical understanding may help us make good decisions in the short term, for example about how to trade off extinction risk vs faster development, or how to prioritize the suffering of non-human animals. So having better philosophical understanding (and machines that can help us build more understanding) is good.
Handing off control of civilization to AI systems might permanently distort society’s values. Understanding how to avoid this problem is good.
These seem like separate issues to me. I am convinced that #2 is very important, since it seems like the largest existential risk by a fair margin and also relatively tractable. I think that #1 does add some value, but am not at all convinced that it is a maximally important problem to work on. As I see it, the value of #1 depends on the importance of the ethical questions we face in the short term (and on how long-lasting are the effects of differential technological progress that accelerates our philosophical ability).
Moreover, it seems like we should evaluate solutions to these two problems separately. You seem to be making an implicit argument that they are linked, such that a solution to #2 should only be considered satisfactory if it also substantially addresses #1. But from my perspective, that seems like a relatively minor consideration when evaluating the goodness of a solution to #2. In my view, solving both problems at once would be at most 2x as good as solving the more important of the two problems. (Neither of them is necessarily a crisp problem rather than an axis along which to measure differential technological development.)
I can see several ways in which #1 and #2 are linked, but none of them seem very compelling to me. Do you have something in particular in mind? Does my position seem somehow more fundamentally mistaken to you?
(This comment was in response to point 1, but it feels like the same underlying disagreement is central to points 2 and 3. Point 4 seems like a different concern, about how the availability of AI would itself change philosophical deliberation. I don’t really see much reason to think that the availability of powerful AI would make the endpoint of deliberation worse rather than better, but probably this is a separate discussion.)
In that case, there would be severe principal-agent problems, given the disparity between power/intelligence of the trainer/AI systems and the users. If I was someone who couldn’t directly control an AI using your scheme, I’d be very concerned about getting uneven trades or having my property expropriated outright by individual AIs or AI conspiracies, or just being ignored and left behind in the race to capture the cosmic commons. I would be really tempted to try another AI design that does purport to have the AI serve my interests directly, even if that scheme is not as “safe”.
If an employee sucks at philosophy, how does he even recognize philosophical problems as problems that he needs to consult you for? Most people have little idea that they should feel confused and uncertain about things like epistemology, decision theory, and ethics. I suppose it might be relatively easy to teach an AI to recognize the specific problems that we currently consider to be philosophical, but what about new problems that we don’t yet recognize as problems today?
Aside from that, a bigger concern for me is that if I was supervising your AI, I would be constantly bombarded with philosophical questions that I’d have to answer under time pressure, and afraid that one wrong move would cause me to lose control, or lock in some wrong idea.
Consider this scenario. Your AI prompts you for guidance because it has received a message from a trading partner with a proposal to merge your AI systems and share resources for greater efficiency and economy of scale. The proposal contains a new AI design and control scheme and arguments that the new design is safer, more efficient, and divides control of the joint AI fairly between the human owners according to your current bargaining power. The message also claims that every second you take to consider the issue has large costs to you because your AI is falling behind the state of the art in both technology and scale, becoming uncompetitive, so your bargaining power for joining the merger is dropping (slowly in the AI’s time-frame, but quickly in yours). Your AI says it can’t find any obvious flaws in the proposal, but it’s not sure that you’d consider the proposal to really be fair under reflective equilibrium or that the new design would preserve your real values in the long run. There are several arguments in the proposal that it doesn’t know how to evaluate, hence the request for guidance. But it also reminds you not to read those arguments directly since they were written by a superintelligent AI and you risk getting mind-hacked if you do.
What do you do? This story ignores the recursive structure in ALBA. I think that would only make the problem even harder, but I could be wrong. If you don’t think it would go like this, let me know how you think this kind of scenario would go.
In terms of your #1, I would divide the decisions requiring philosophical understanding into two main categories. One is decisions involved in designing/improving AI systems, like in the scenario above. The other, which I talked about in an earlier comment, is ethical disasters directly caused by people who are not uncertain, but just wrong. You didn’t reply to that comment, so I’m not sure why you’re unconcerned about this category either.
A general note: I’m not really taking a stand on the importance of a singleton, and I’m open to the possibility that the only way to achieve a good outcome even in the medium-term is to have very good coordination.
A would-be singleton will also need to solve the AI control problem, and I am just as happy to help with that problem as with the version of the AI control problem faced by a whole economy of actors each using their own AI systems.
The main way in which this affects my work is that I don’t want to count on the formation of a singleton to solve the control problem itself.
You could try to work on AI in a way that helps facilitate the formation of a singleton. I don’t think that is really helpful, but moreover it again seems like a separate problem from AI control. (I also don’t think that e.g. MIRI is doing this with their current research, although they are open to solving AI control in a way that only works if there is a singleton.)
In general I think that counterfactual oversight has problems in really low-latency environments. I think the most natural way to avoid them is synthesizing training data in advance. It’s not clear whether that proposal will work.
If your most powerful learners are strong enough to learn good-enough answers to these kinds of philosophical questions, then you only need to provide philosophical input during training and so synthesizing training data can take off time pressure. If your most powerful AI is not able to learn how to answer these philosophical questions, then the time pressure seems harder to avoid. In that case though, it seems quite hard to avoid the time pressure by any mechanism. (Especially if we are better at learning than we would be at hand-coding an algorithm for philosophical deliberation—if we are better at learning and our learner can’t handle philosophy, then we simply aren’t going to be able to build an AI that can handle philosophy.)
I replied to your earlier comment.
My overall feeling is still that these are separate problems. We can evaluate a solution to AI control, and we can evaluate philosophical work that improves our understanding of potentially-relevant issues (or metaphilosophical work to automate philosophy).
I am both less pessimistic about philosophical errors doing damage, and more optimistic about my scheme’s ability to do philosophy, but it’s not clear to me that either of those is the real disagreement (since if I imagine caring a lot about philosophy and thinking this scheme didn’t help automate philosophy, I would still feel like we were facing two distinct problems).
Is this your reaction if you imagine delegating your affairs to an employee today? Are you making some claim about the projected increase in the importance of these philosophical decisions? Or do you think that a brilliant employee’s lack of metaphilosophical understanding would in fact cause great damage right now?
I agree that AI may increase the stakes for philosophical decisions. One of my points is that a natural argument that it might increase the stakes—by forcing us to lock in an answer to philosophical questions—doesn’t seem to go through if you pursue this approach to AI control. There might be other arguments that building AI systems force us to lock in important philosophical views, but I am not familiar with those arguments.
I agree there may be other ways in which AI systems increase the stakes for philosophical decisions.
I like the bargaining example. I hadn’t thought about bargaining as competitive advantage before, and instead had just been thinking about the possible upside (so that the cost of philosophical error was bounded by the damage of using a weaker bargaining scheme). I still don’t feel like this is a big cost, but it’s something I want to think about somewhat more.
If you think there are other examples like this, they might help move my view. On my current model, these are just facts that increase my estimates for the importance of philosophical work; I don’t really see them as relevant to AI control per se. (See the sibling, which is the better place to discuss that.)
I don’t see cases where a philosophical error causes you to lose control, unless you would have some reason to cede control based on philosophical arguments (e.g. in the bargaining case). Failing that, it seems like there is a philosophically simple, apparently adequate notion of “remaining in control” and I would expect to remain in control at least in that sense.
Are these worse than the principal-agent problems that exist in any industrialized society? Most humans lack effective control over many important technologies, both in terms of economic productivity and especially military might. (They can’t understand the design of a car they use, they can’t understand the programs they use, they don’t understand what is actually going on with their investments...) It seems like the situation is quite analogous.
Moreover, even if we could build AI in a different way, it doesn’t seem to do anything to address the problem, since it is equally opaque to an end user who isn’t involved in the AI development process. In any case, they are in some sense at the mercy of the AI developer. I guess this is probably the key point—I don’t understand the qualitative difference between being at the mercy of the software developer on the one hand, and being at the mercy of the software developer + the engineers who help the software run day-to-day on the other. There is a slightly different set of issues for monitoring/law enforcement/compliance/etc., but it doesn’t seem like a huge change.
(Probably the rest of this comment is irrelevant.)
To talk more concretely about mechanisms in a simple example, you might imagine a handful of companies who provide AI software. The people who use this software are essentially at the mercy of the software providers (since for all they know the software they are using will subvert their interests in arbitrary ways, whether or not there is a human involved in the process). In the most extreme case an AI provider could effectively steal all of their users’ wealth. They would presumably then face legal consequences, which are not qualitatively changed by the development of AI if the AI control problem is solved. If anything we expect the legal system and government to better serve human interests.
We could talk about monitoring/enforcement/etc., but again I don’t see these issues as interestingly different from the current set of issues, or as interestingly dependent on the nature of our AI control techniques. The most interesting change is probably the irrelevance of human labor, which I think is a very interesting issue economically/politically/legally/etc.
I agree with the general point that as technology improves a singleton becomes more likely. I’m agnostic on whether the control mechanisms I describe would be used by a singleton or by a bunch of actors, and as far as I can tell the character of the control problem is essentially the same in either case.
I do think that a singleton is likely eventually. From the perspective of human observers, a singleton will probably be established relatively shortly after wages fall below subsistence (at the latest). This prediction is mostly based on my expectation that political change will accelerate alongside technological change.
I wonder—are you also relatively indifferent between a hard and slow takeoff, given sufficient time before the takeoff to develop ai control theory?
(One of the reasons a hard takeoff seems scarier to me is that it is more likely to lead to a singleton, with a higher probability of locking in bad values.)
As far as I can tell, Paul’s current proposal might still suffer from blackmail, like his earlier proposal which I commented on. I vaguely remember discussing the problem with you as well.
One big lesson for me is that AI research seems to be more incremental and predictable than we thought, and garage FOOM probably isn’t the main danger. It might be helpful to study the strengths and weaknesses of modern neural networks and get a feel for their generalization performance. Then we could try to predict which areas will see big gains from neural networks in the next few years, and which parts of Friendliness become easy or hard as a result. Is anyone at MIRI working on that?
If they did that, then what? Try to convince NN researchers to attack the parts of Friendliness that look hard? That seems difficult for MIRI to do given where they’ve invested in building their reputation (i.e., among decision theorists and mathematicians instead of in the ML community). (It would really depend on people trusting their experience and judgment since it’s hard to see how much one could offer in the form of either mathematical proof or clearly relevant empirical evidence.) You’d have a better chance if the work was carried out by some other organization. But even if that organization got NN researchers to take its results seriously, what incentives do they have to attack parts of Friendliness that seem especially hard, instead of doing what they’ve been doing, i.e., racing as fast as they can for the next milestone in capability?
Or is the idea to bet on the off chance that building an FAI with NN turns out to be easy enough that MIRI and like-minded researchers can solve the associated Friendliness problems themselves and then hand the solutions to whoever ends up leading the AGI race, and they can just plug the solutions in at little cost to their winning the race?
Or you’re suggesting aiming/hoping for some feasible combination of both, I guess. It seems pretty similar to what Paul Christiano is doing, except he has “generic AI technology” in place of “NN” above. To me, the chance of success of this approach seems low enough that it’s not obviously superior to what MIRI is doing (namely, in my view, betting on the off chance that the contrarian AI approach they’re taking ends up being much easier/better than the mainstream approach, which is looking increasingly unlikely but still not impossible).
That may be true, but that is hindsight bias. MIRI’s (or EY’s, for that matter) approach to hedge against that being true was nonetheless a very reasonable approach (and maybe, given the knowledge at the time, the only reasonable one).
What problem do you have in mind here?
I thought that the previous problem was mostly psychological, i.e. that if humans were rational agents then this AI would be roughly as vulnerable to blackmail as its designers. So I thought the issue was the psychological strangeness (and great length) of the weird hypothetical.
Here we have no such hypothetical, and the system’s behavior only depends on the predicted behavior of humans in the real world. That seems to address the narrow version of your concern.
I can see two analogous problems:
The learning system may believe itself to be in a simulation, as discussed here. But that seems to cash out to a straightforward technical problem: we want to ensure that as long as there is some important scenario where the AI behaves strangely, we provide feedback on one such scenario. And this technical problem both (1) looks like it should be soluble in principle, (2) looks pretty tractable right now, and (3) is needed to resolve a whole host of other problems (a bunch of exotic failures, but also a bunch of more prosaic difficulties).
The behavior of the system depends on what a human would do across many cases. We can sample from these cases at random, but maybe a small fraction of them are sufficiently surprising that they would lead a human to believe they are in a simulation. We would like to sample these situations with disproportionate probability as well so that the human won’t panic when they find themselves in one, but I haven’t talked in any detail about how to do that and it’s not obvious whether it is possible. (I do think it’s possible.)
Did you have in mind 1, 2, or something else?
I mostly had in mind 2. Not sure how predicting humans is different from putting humans in hypotheticals. It seems like the same problems could happen.
I agree that the same problem appears for ALBA. I was originally working with proposals where the improbability of the human’s situation was bounded, but the recursive structure can lead to arbitrarily large improbability. I hadn’t thought about this explicitly.
Predicting humans is different from putting humans in hypotheticals, in the sense that in principle you can actually sample from the situations that cause humans to think they are in a simulation or whatever.
For example, suppose the human had access to a button that said “This is weird, I’m probably in a simulation,” and suppose that we expected the human to press it in any case where they would start behaving weirdly. Then we could potentially sample from the subset of situations where the human presses the button. And if we manage to do that, then the human isn’t right to suspect they are in a simulation (any more than they already should believe they are in a simulation, prior to even building the AI).
I agree that you shouldn’t expect the current version of the scheme to cope with this problem. And it’s certainly not obvious that the approach sketched above can work, there are many important questions. But I do think that there is a clear path to fixing this kind of problem.
(Interestingly, this is exactly analogous to what the active learning scheme has to do, in order to get the prediction algorithms to predict well.)
ETA: although I hadn’t thought about this explicitly in the context of ALBA, I have expected to need some way to overweight “weird” situations in order to stop them from being problematic, ever since here.
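As a purely illustrative sketch of what overweighting "weird" situations might look like, here is a toy rejection-sampling simulation. Everything in it (the situation model, the "weird" predicate standing in for the predicted button press, the mixing ratio) is invented for illustration, not taken from ALBA or any actual proposal:

```python
import random

def sample_situation():
    # Hypothetical generative model of training situations; "weirdness"
    # is just a number, skewed so most samples are mundane (near 0).
    return random.random() ** 8

def human_flags_weird(situation):
    # Stand-in for predicting that the human would press the
    # "this is weird, I'm probably in a simulation" button.
    return situation > 0.5

def sample_weird(max_tries=100_000):
    # Rejection-sample from the subset of situations the human would
    # flag, so rare "weird" cases can be oversampled during training.
    for _ in range(max_tries):
        s = sample_situation()
        if human_flags_weird(s):
            return s
    raise RuntimeError("no weird situation found")

def build_training_set(n, weird_fraction=0.2):
    # Mix ordinary and flagged situations at a chosen ratio, rather
    # than at their (much smaller) natural base rate.
    data = []
    for _ in range(n):
        if random.random() < weird_fraction:
            data.append((sample_weird(), True))
        else:
            s = sample_situation()
            data.append((s, human_flags_weird(s)))
    return data

batch = build_training_set(1000)
```

The point of the sketch is only the shape of the mechanism: if we can sample conditional on the flag, then encountering a flagged situation carries little evidence of being in a simulation, since such situations appear in training at whatever rate we choose.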
Is MIRI even in the AGI race? It certainly doesn’t look like it.
They’re working on figuring out what we want the AGI to do, not building one. (I believe Nate has stated this in previous LW comments.)
Yes, and the point is that MIRI is pondering the situation at the finish line, but is not running in the race.
A different analogy would be that MIRI is looking at the map and the compass to figure out what’s the right way to go, while others are just running in any random direction.
Not quite. The others are not running around in random directions, they are all running in a particular direction and MIRI is saying “Hold on, guys, there may be bears and tigers and pits of hell at your destination”. Which is all fine, but it still is not running.
Still better than running into all the bears and tigers and getting eaten, particularly if it lets you figure out the correct route eventually.
The question was not what is better, the question was whether MIRI is competing in the AGI race.
Sure. I wasn’t objecting to the “MIRI isn’t competing in the AGI race” point, but to the negative connotations that one might read into your original analogy.
Which unfortunately presumes that an AGI would be tasked with doing something and given free rein to do so, a truly naïve and unlikely outcome.
How does it presume that?
Aka friendliness research. But why does that matter? If the machine has no real effectors and lots of human oversight, then why should there even be concern over friendliness? It wouldn’t matter in that context. Tell a machine to do something, and it finds an evil-stupid way of doing it, and human intervention prevents any harm.
Why is it a going concern at all whether we can assure ahead of time that the actions recommended by a machine are human-friendly unless the machine is enabled to independently take those actions without human intervention? Just don’t do that and it stops being a concern.
Humanity is having trouble coordinating and enforcing even global restrictions on greenhouse gases. Try ensuring that nobody does anything risky or short-sighted with a technology that has no clear-cut threshold between a “safe” and “dangerous” level of capability, and which can be beneficial in pretty much any competitive and financially lucrative domain.
Restricting the AI’s capabilities may work for a short while, assuming that only a small group of pioneers manages to develop the initial AIs and they’re responsible with their use of the technology—but as Bruce Schneier says, today’s top-secret programs become tomorrow’s PhD theses and the next day’s common applications. If we want to survive in the long term, we need to figure out how to make the free-acting AIs safe, too—otherwise it’s just a ticking time bomb before the first guys accidentally or intentionally release theirs.
Humanity has done more than zero, and less than the optimal amount, about things like climate change. Importantly, the situation is below the imminent existential threat level.
If you are going to complain that alternative proposals face coordination problems, you need to show that yours don’t, or you are committing the fallacy of the dangling comparison. If people aren’t going to refrain from building dangerously powerful superintelligences, assuming that is possible, why would they have the sense to fit MIRI’s safety features, assuming those are possible? If the law can make people fit safety features, why can’t it prevent them from building dangerous AIs in the first place?
I would suggest a combination of generality and agency. And what problem domain requires both?
If you allow for autonomously acting AIs, then you could have Friendly autonomous AIs tracking down and stopping Unfriendly / unauthorized AIs.
This of course depends on people developing the Friendly AIs first, but ideally it’d be enough for only the first people to get the design right, rather than depending on everyone being responsible.
It’s unclear whether AI risk will become obviously imminent, either. Goertzel & Pitt 2012 argue in section 3 of their paper that this is unlikely.
Business (which by nature covers just about every domain in which you can make a profit, which is to say just about every domain relevant for human lives), warfare, military intelligence, governance… (see also my response to Mark)
Somehow that reminds me of Sentinels from X-Men: Days of Future Past.
You could, but if you don’t have autonomously acting agents, you don’t need Gort AIs. Building an agentive superintelligence that is powerful enough to take down any other, as MIRI conceives it, is a very risky proposition, since you need to get the value system exactly right. So it’s better not to be in a position where you have to do that.
The first people have to be able as well as willing to get everything right. Safety through restraint is easier and more reliable: you can omit a feature more reliably than you can add one.
These organizations have a need for widespread intelligence gathering, and for agentive AI, but that doesn’t mean they need both in the same package. The military don’t need their entire intelligence database in every drone, and don’t want drones that change their mind about who the bad guys are in mid flight. Businesses don’t want HFT applications that decide capitalism is a bad thing.
We want agents to act on our behalf, which means we want agents that are predictable and controllable to the required extent. Early HFT had problems which led to the addition of limits and controls. Control and predictability are close to safety. There is no drive to power that is also a drive away from safety, because uncontrolled power is of no use.
Based on the behaviour of organisations, there seems to be a natural division between high-level, unpredictable decision information systems and lower-level, faster-acting agentive systems. In other words, they voluntarily do some of what would be required for an incremental safety programme.
I agree that it would be better not to have autonomously acting AIs, but not having any autonomously acting AIs would require a way to prevent anyone deploying them, and so far I haven’t seen a proposal for that that’d seem even remotely feasible.
And if we can’t stop them from being deployed, then deploying Friendly AIs first looks like the scenario that’s more likely to work—which still isn’t to say very likely, but at least it seems to have a chance of working even in principle. I don’t see even an in-principle way for “just don’t deploy autonomous AIs” to work.
When you say autonomous AIs, do you mean AIs that are autonomous and superintelligent?
Do you think they could he deployed by basement hackers, or only by large organisations?
Do you think an organisation like the military or business has a motivation to deploy them?
Do you agree that there are dangers to an FAI project that goes wrong?
Do you have a plan B to cope with an FAI that goes rogue?
Do you think that having a AI potentially running the world is an attractive idea to a lot of people?
AIs that are initially autonomous and non-superintelligent, then gradually develop towards superintelligence. (With the important caveat that it’s unclear whether an AI needed to be generally superintelligent in order to pose a major risk for society. It’s conceivable that superintelligence in some more narrow domain, like cybersecurity, would be enough—particularly in a sufficiently networked society.)
Hard to say. The way AI has developed so far, it looks like the capability might be restricted to large organizations with lots of hardware resources at first, but time will likely drive down the hardware requirements.
Yes.
Yes.
Such a plan would seem to require lots of additional information about both the specifics of the FAI plan, and also the state of the world at that time, so not really.
Depends on how we’re defining “lots”, but I think that the notion of a benevolent dictator has often been popular in many circles, who’ve also acknowledged its largest problems to be that 1) power tends to corrupt 2) even if you got a benevolent dictator, you also needed a way to ensure that all of their successors were benevolent. Both problems could be overcome with an AI, so on that basis at least I would expect lots of people to find it attractive. I’d also expect it to be considered more attractive in e.g. China, where people seem to be more skeptical towards democracy than they are in the West.
Additionally, if the AI wouldn’t be the equivalent of a benevolent dictator, but rather had a more hands-off role that kept humans in power and only acted to e.g. prevent disease, violent crime, and accidents, then that could be attractive to a lot of people who preferred democracy.
If you believe in the conjunction of claims that people are motivated to create autonomous, not just agentive, AIs, and that pretty well any AI can evolve into dangerous superintelligence, then the situation is dire, because you cannot guarantee to get in first with an AI policeman as a solution to AI threat.
The situation is better, but only slightly better with legal restraint as a solution to AI threat, because you can lower the probability of disaster by banning autonomous AI...but you can only lower it, not eliminate it, because no ban is 100% effective.
And how serious are you about the threat level? Compare with microbiological research. It could be the case that someone will accidentally create an organism that spells doom for the human race; it cannot be ruled out, but no one is panicking now because there is no specific reason to rule it in, no specific pathway to it. It is a remote possibility, not a serious one.
Someone who sincerely believed that rapid self improvement towards autonomous AI could happen at any time, because there are no specific preconditions or precursors for it, is someone who effectively believes it could happen now. But someone who genuinely believes an AI apocalypse could happen now would be revealing that belief in their behaviour by heading for the hills, or smashing every computer they see.
Narrow superintelligences may well be less dangerous than general superintelligences, and if you are able to restrict the generality of an AI, that could be a path to incremental safety.
But if the path to some kind of spontaneous superintelligence in an autonomous AI is also a path to spontaneous generality, that is hopeless: if the one can happen for no particular reason, so can the other. But is the situation really bad, or are these scenarios remote possibilities, like genetically engineered super plagues?
But by the time the hardware requirements have been driven down for entry-level AI, the large organizations will already have more powerful systems, and they will dominate for better or worse. If benevolent, they will suppress dangerous AIs coming out of basements; if dangerous, they will suppress rivals. The only problematic scenario is where the hackers get in first, since they are less likely to partition agency from intelligence, as I have argued a large organisation would.
But the one thing we know for sure about AI is that it is hard. The scenario where a small team hits on the One Weird Trick to achieve ASI is the most worrying, but also the least likely.
Which would be what?
But building an FAI capable of policing other AIs is potentially dangerous, since it would need to be both a general intelligence and a superintelligence.
For the purposes of the current argument, a democratic majority.
There are actually three problems with benevolent dictators: as well as power corrupting and successorship, there is the problem of ensuring or detecting benevolence in the first place.
You have conceded that Gort AI is potentially dangerous. The danger is that it is fragile in a specific way: a near miss to a benevolent value system is a dangerous one.
That also depends on both getting it right, and convincing people you have got it right.
Indeed.
I don’t think that rapid self-improvement towards a powerful AI could happen at any time. It’ll require AGI, and we’re still a long way from that.
It could, yes.
Assuming they can keep their AGI systems in control.
See my response here and also section 2 in this post.
Very much so.
I think you very much misunderstand my suggestion. I’m saying that there is no reason to presume AI will be given the keys to the kingdom from day one, not advocating for some sort of regulatory regime.
So what do you see as the mechanism that will prevent anyone from handing the AI those keys, given the tremendous economic pressure towards doing exactly that?
As we discussed in Responses to AGI Risk:
What “tremendous economic pressure”? The argument doesn’t hold weight without that premise being substantiated.
I thought my excerpt answered that, but maybe that was illusion of transparency speaking. In particular, this paragraph:
To rephrase: the main trend in history has been to automate everything that can be automated, both to reduce costs and because machines can do things better than humans can. This isn’t going to stop: I’ve already seen articles calling for both company middle managers and government bureaucrats to be replaced with AIs. If you have any kind of a business, you could potentially make it run better by putting a sufficiently sophisticated AI in charge—because it can think faster and smarter, deal with more information at once, and not have the issue of self-interest leading to office politics leading to many employees acting suboptimally from the company’s point of view, which you’d get if you had a thousand human employees rather than a single AI.
This trend has been going on throughout history, doesn’t show any signs of stopping, and inherently involves giving the AI systems whatever agency they need in order to run the company better.
And if your competitors are having AIs run their company and you don’t, you’re likely to be outcompeted, so you’ll want to make sure your AIs are smarter and more capable of acting autonomously than the competitors. These pressures aren’t just going to vanish at the point when AIs start approaching human capability.
The same considerations also apply to other domains than business—like governance—but the business and military domains are the most likely to have intense arms race dynamics going on.
Yes, illusion of transparency at work here. That paragraph has always been so clearly wrong to me that I wrote it off as the usual academic prose fluff, and didn’t realize it was in fact the argument being made. Here is the issue I take with that:
You can find instances where industry is clamoring to use AI to reduce costs / improve productivity. For example, Uber and self-driving cars. However in these cases there are a combination of two factors at work: (1) the examples are necessarily specialized narrow AI, not general decision making; and/or (2) the costs of poor decision making are externalized. Let’s look at these points in more detail:
Anytime a human is being used as a meat robot, e.g. an Uber driver, a machine can do the job better and more efficiently with quantifiable tradeoffs due to the machine’s own quirks. However one must not forget that this is the case because the context has already been specialized! One can replace a minimum wage burger flipper with a machine because the job is part of a three-ring binder enterprise that has already been exhaustively thought out to such a degree that every component task can be taught to a minimum wage, distracted teenage worker. If the mechanical burger flipper fails, you go back to paying a $10/hr meat robot to do the trick. But what happens when the corporate strategy robot fails and the new product is a flop? You lose hundreds of millions of invested dollars. And worse, you don’t know until it is all over and played out. Not comparable at all.
Uber might want a fleet of self-driving cars. But that’s because the costs of being wrong are externalized. Get in an accident? It’s your driver’s problem, not Uber’s. Self-driving car gets in an accident? It’s the car owner’s problem, which, surprise, is still not Uber’s. The applications of AGI have risks that are not so easily externalized, however.
I can see how one might think that unchecked AGI would improve the efficiency of corporate management, fraud detection, and warfare. However that’s confirmation bias. I assure you that the corporate strategists, fraud specialists, and generals get paid the big bucks to think about risk and the ways in which things can go wrong. I can give examples of what could go wrong when an alien AGI psychology tries to interact with irrational humans, but it’s much simpler to remember that even presumably superhuman AGIs have error rates, and these error rates will be higher than humans for a good duration of time while the technology is still developing. And what happens when an AGI makes a mistake?
A corporate strategist AGI makes a mistake, and the directors of the corporation who have a fiduciary responsibility to shareholders are held personally accountable. Indemnity insurance refuses to pay out as upper management purposefully took themselves out of the loop, an action that is considered irresponsible in hindsight.
A fraud specialist AGI makes a mistake, and its company turns a blind eye to hundreds of millions of dollars of fraud that a human would have seen. Business goes belly-up.
A war-making AGI makes a mistake, and you are now dead.
I hope that you’ll forgive me, but I must call on anecdotal evidence here. I am the co-founder of a startup that has raised >$75MM. I understand very well how investors, upper management, and corporate strategists manage risk. I also have observed how extremely terrified of additional risk they are. The supposition that they would be willing to put a high-risk proto-AGI in the driver’s seat is naïve to say the least. These are the people that are held accountable and suffer the largest losses when things go wrong, and they are terrified of that outcome.
What is likely to happen, on the other hand, is a hybridization of machine and human. AGI cognitive assistance will permeate these industries, but their job is to give recommendations, not steer things directly. And it’s not at all so clear to me that this approach, “Oracle AI” as it is called on LW, is so dangerous.
Thank you for the patient explanation! This is an interesting argument that I’ll have to think about some more, but I’ve already adjusted my view of how I expect things to go based on it.
Two questions:
First, isn’t algorithmic trading a counterexample to your argument? It’s true that it’s a narrow domain, but it’s also one where AI systems are trusted with enormous sums of money, and have the potential to make enormous losses. E.g. one company apparently lost $440 million in less than an hour due to a glitch in their software. Wikipedia on the consequences:
Also, you give several examples of AGIs potentially making large mistakes with large consequences, but couldn’t e.g. a human strategist make a similarly big mistake as well?
You suggest that the corporate leadership could be held more responsible for a mistake by an AGI than if a human employee made the mistake, and I agree that this is definitely plausible. But I’m not sure whether it’s inevitable. If the AGI was initially treated the way a junior human employee would be, i.e. initially kept subject to more supervision and given more limited responsibilities, and then had its responsibilities scaled up as people came to trust it more and it learned from its mistakes, would that necessarily be considered irresponsible by the shareholders and insurers? (There’s also the issue of privately held companies with no need to keep external shareholders satisfied.)
Kinda. They are carefully watched and have separate risk management systems which impose constraints and limits on what they can do.
Yes, but that has nothing to do with AI: “To err is human, but to really screw up you need a computer”. Besides, there are equivalent human errors (fat fingers, add a few zeros to a trade inadvertently) with equivalent magnitude of losses.
If those risk management systems are themselves software, that doesn’t really change the overall picture.
If we’re talking about “would companies place AI systems in a role where those systems could cost the company lots of money if they malfunctioned”, then examples of AI systems having been placed in roles where they cost the company a lot of money have everything to do with the discussion.
It does because the issue is complexity and opaqueness. A simple gatekeeper filter along the lines of “reject any single trade above a fixed size limit” is not an “AI system”.
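The kind of gatekeeper filter being described could be sketched as a one-line hard-coded check (a hypothetical reconstruction; the limit value and function name are illustrative, not from the original):

```python
# Hypothetical gatekeeper filter: a hard-coded size limit, not an "AI system".
GAZILLION = 1_000_000_000  # illustrative per-trade limit

def gatekeeper(trade_size: float) -> bool:
    """Allow a trade only if it does not exceed the hard limit."""
    return trade_size <= GAZILLION

print(gatekeeper(500_000_000))    # within limit -> True
print(gatekeeper(2_000_000_000))  # exceeds limit -> False
```

Note that a check this simple is exactly what the next reply attacks: it constrains single trades, not the aggregate behaviour of whatever sits behind it.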
In which case the AI splits the transaction into 2 transactions, each just below a gazillion.
I’m talking about contemporary-level-of-technology trading systems, not about future malicious AIs.
So? An opaque neural net would quickly learn how to get around trade size restrictions if given the proper motivations.
At which point the humans running this NN will notice that it likes to go around risk control measures and will… persuade it that it’s a bad idea.
It’s not like no one is looking at the trades it’s doing.
How? By instituting more complex control measures? Then you’re back to the problem Kaj mentioned above.
In the usual way. Contemporary trading systems are not black boxes full of elven magic. They are models, that is, a bunch of code and some data. If the model doesn’t do what you want it to do, you stick your hands in there and twiddle the doohickeys until it stops outputting twaddle.
Besides, in most trading systems the sophisticated part (“AI”) is an oracle. Typically it outputs predictions (e.g. of prices of financial assets) and its utility function is some loss function on the difference between the prediction and the actual. It has no concept of trades, or dollars, or position limits.
Translating these predictions into trades is usually quite straightforward.
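The "straightforward translation" being described could be sketched roughly as follows (a minimal illustration with hypothetical names and thresholds, not any real trading system's logic):

```python
# Sketch: turning an oracle's price prediction into a trade signal.
# The oracle only predicts prices; trades, dollars, and limits live out here.
def prediction_to_order(current_price: float, predicted_price: float,
                        threshold: float = 0.01) -> str:
    """Buy if the predicted return exceeds the threshold, sell if it falls
    below the negative threshold, otherwise hold."""
    expected_return = (predicted_price - current_price) / current_price
    if expected_return > threshold:
        return "BUY"
    if expected_return < -threshold:
        return "SELL"
    return "HOLD"

print(prediction_to_order(100.0, 103.0))   # predicted +3% -> BUY
print(prediction_to_order(100.0, 95.0))    # predicted -5% -> SELL
print(prediction_to_order(100.0, 100.5))   # within threshold -> HOLD
```

This illustrates the point above: the sophisticated model is confined to the prediction, while the trade-generation layer is simple, inspectable code with no concept of getting around its own limits.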
I suspect that this dates back to a time when MIRI believed the answer to AI safety was to build an agentive, maximal superintelligence, align its values with ours, and put it in charge of all the other AIs.
The first idea has been effectively shelved, since MIRI has produced about zero lines of code, but the idea that AI safety is value alignment continues with considerable momentum. And value alignment only makes sense if you are building an agentive AI (and have given up on corrigibility).
Briefly skimming Christiano’s post, this is actually one of the few/first proposals from someone MIRI related that actually seems to be on the right track (and similar to my own loose plans). Basically it just boils down to learning human utility functions with layers of meta-learning, with generalized RL and IRL.