My problems with Formal Friendly Artificial Intelligence work
I’m writing this to get information about the lesswrong community and whether it is worth engaging. I’m a bit out of the loop in terms of what the LW community is like and whether it can maintain multiple viewpoints (and how well known these criticisms are).
The TL;DR is that I have problems with treating computation in an overly formal fashion. The more pragmatic philosophy suggested here implies (but doesn’t prove) that AI will not be as powerful as expected, because the physicality of computation matters and instantiating computation physically is expensive.
I think all the things I will talk about are interesting, but I don’t see how they are sufficient when considering AI running in the real world on real computers.
1. Source code based decision theory
I don’t understand why:
other agents trust that your source code is what you say it is
other agents trust that your implementation of your interpreter matches their understanding of the interpreter. I don’t see how they get around trusting-trust issues (inserting code/behaviour via a malicious compiler/interpreter) when they don’t have the ability to do diverse double-compilation. (A small sketch of the verification gap is below.)
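To make the gap concrete, here’s a minimal sketch (the function and hashing scheme are just illustrative): checking a hash of the source another agent claims to run tells you nothing about what its compiler or interpreter actually executes.

```python
import hashlib

def verify_claimed_source(claimed_source: str, published_hash: str) -> bool:
    # Confirms only that the *claimed* source matches a published hash.
    return hashlib.sha256(claimed_source.encode()).hexdigest() == published_hash

# Nothing above checks that the code actually running on the other machine was
# built from that source, or that the compiler/interpreter that built it was
# honest (Thompson's "trusting trust" attack). Closing that gap needs something
# like reproducible builds or diverse double-compilation, which the other agent
# may not be in a position to do.
```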
2. General Functionalism
The idea that it doesn’t matter how you compute something, just whether the inputs and outputs are the same.
The battery life of my phone says that the way a computation is done is very important: is it done in the cloud, meaning I have to power up my antenna to transmit and get the result?
Timing attacks say that the speed of a computation is important, and that faster is not always better (see the sketch after this list).
Rowhammer says that how you lay out your memory is important. Can I flip a bit of your utility calculation?
Memory usage, overheating, van Eck phreaking, etc., etc.
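To illustrate the timing-attack point, a minimal sketch (not anyone’s real authentication code): both functions compute the same input-output mapping, but only the second is safe to expose to someone who can measure response times.

```python
import hmac

def naive_equal(secret: bytes, guess: bytes) -> bool:
    # Returns as soon as a byte differs, so the running time leaks how long
    # the matching prefix is -- the basis of a classic timing attack.
    if len(secret) != len(guess):
        return False
    for x, y in zip(secret, guess):
        if x != y:
            return False
    return True

def constant_time_equal(secret: bytes, guess: bytes) -> bool:
    # hmac.compare_digest examines every byte regardless of mismatches,
    # so timing reveals essentially nothing about where they differ.
    return hmac.compare_digest(secret, guess)
```

Functionally identical, yet the difference between them only exists at the level of implementation detail.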
It is the “learn to walk before you start learning to run and jump” approach.
Imagine that you are the first human who ever tried to invent math. You have invented the “2 + 2 = 4” theorem, and it seems to work! Two stones together with another two stones always make four stones. And two bananas together with another two bananas always make four bananas. It seems like you have found a powerful abstraction behind stones and bananas—who knows what else might obey the same rules. Excited, you run towards your tribe members to share your discovery...
...but they dismiss it as worthless. “No two stones are the same. A sharp stone is a more useful tool than a blunt stone, and your theory ignores that. A banana can be eaten. Actually, bananas are only useful for eating, and you completely ignore this essential part. And have you ever tried to ‘add’ two and two raindrops? Obviously, this ‘mathematics’ of yours has no future, and is just a waste of time,” they say.
Well, the answer is that you need to apply “2 + 2 = 4” only in the situations where it is appropriate, and not where it isn’t. Telling the difference may be tricky, and it may be easy to underestimate some problems. But still, the equation is useful in some contexts, which makes it useful to know.
Does refusing to know “2 + 2 = 4” because of these practical concerns make you stronger or weaker?
It’s an assumption to simplify the problem. First we learn to solve the “simple” problem, then we move to the more complex ones. If we can’t solve the simpler problem yet, would jumping to the more complex one make us progress faster?
Would a recommendation to skip solving the simplified problems, because they don’t include all the messy details, more likely lead to people solving the hard problems faster, or giving up?
How specifically are you planning to use this insight in constructing a Friendly superintelligent AI?
2+2=4 pays rent.
Ugh can use it right away for counting days. Source code based decision theory, not so much. There aren’t societies based on agents that can read each other’s source code, so I can’t try to predict them with source code based decision theories. It does seem like a mathematically interesting thing, though. I just don’t want it to be a core part of our sole pathway for trying to solve the AI problem.
Perhaps it would lead people to avoid trying to find optimal decision theories and accept the answer that the best decision theory depends on the circumstances. Then we can figure out what our circumstances are and find good decision theories for those, and create designs that can do likewise.
Just as the best search algorithm is context dependent: even algorithms in a worse complexity class can be better in practice due to memory locality and small input sizes.
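As a toy illustration of that kind of context dependence (the cutoff value is arbitrary): real standard-library sorts, such as CPython’s Timsort, switch to an asymptotically “worse” algorithm below a size cutoff, because its constant factors and memory locality win on short inputs.

```python
def insertion_sort(xs):
    # O(n^2) in the worst case, but tiny constant factors and good locality
    # make it very fast on short lists.
    for i in range(1, len(xs)):
        key = xs[i]
        j = i - 1
        while j >= 0 and xs[j] > key:
            xs[j + 1] = xs[j]
            j -= 1
        xs[j + 1] = key
    return xs

def hybrid_merge_sort(xs, cutoff=16):
    # Below the cutoff, hand the work to the asymptotically "worse" algorithm.
    if len(xs) <= cutoff:
        return insertion_sort(list(xs))
    mid = len(xs) // 2
    left = hybrid_merge_sort(xs[:mid], cutoff)
    right = hybrid_merge_sort(xs[mid:], cutoff)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(hybrid_merge_sort([5, 2, 9, 1, 7, 3, 8, 4]))  # [1, 2, 3, 4, 5, 7, 8, 9]
```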
Supposition: if answers to how decisions should be made (and a whole host of other problems) are contextual and complex, then it is worth agents trading information about what answers they have found within their own contexts.
Figure out the control flows between parts of the human brain that mean they manage to roughly align themselves into a coherent entity (looking at dopamine etc.).
Apply the same informational constraints to computers so that we can link a computer up to a human (it doesn’t need to be a physical link, as long as the control flows work). The computer should be aligned with the user as much as one part of the brain is aligned with another part (which is hopefully sufficient).
While this is in an embryonic stage, get as many people from all around the world to use it.
Hard take-off (I think this is less likely due to the context sensitivity I have described, but still possible): it is worthwhile for agents to trade, so groups of human/computer pairs that co-operate and trade information advance quicker than lone rogue agents. If we have done the alignment work right, then it is likely the weight of them will be able to squash any human-inimical systems that appear.
This pathway makes no sense to people who expect there to be winner-take-all optimal AI designs, since any one embryonic system might find the keys to the future and take it over. But if that is not the way the world works....
There are mathematical concepts that didn’t pay rent immediately: imaginary numbers, quaternions, non-Euclidean geometry, Turing machines...
But thanks for the specific plan, it sounds like it could work.
If you have any interest in my work on a hypothesis for point 1, let me know.
Human beings can probabilistically read each other’s source code. That’s why we use primitive versions of noncausal decision theory like getting angry, wanting to take revenge, etc.
This seems like a weird way of saying that humans can make/refine hypotheses about other agents. What does talking about source code give you?
Tit for tat (which seems like revenge) works in normal game theory for the IPD (of infinite length), which is closer to what we experience in everyday life. I thought non-causal decision theories are needed for winning one-shots?
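A minimal sketch of that (the payoffs and continuation probability are just illustrative): tit for tat retaliates and forgives, and it does well simply because the game probably continues; no acausal machinery needed.

```python
import random

def tit_for_tat(my_history, their_history):
    # Cooperate first, then copy the opponent's previous move.
    return "C" if not their_history else their_history[-1]

def always_defect(my_history, their_history):
    return "D"

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def play(a, b, continue_prob=0.95):
    # After each round the game continues with some probability, so neither
    # player knows which round is the last -- retaliation stays worthwhile
    # for perfectly ordinary causal reasons.
    score_a = score_b = 0
    hist_a, hist_b = [], []
    while True:
        move_a, move_b = a(hist_a, hist_b), b(hist_b, hist_a)
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a += pa
        score_b += pb
        hist_a.append(move_a)
        hist_b.append(move_b)
        if random.random() > continue_prob:
            return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # mutual cooperation throughout
print(play(tit_for_tat, always_defect))  # loses round one, then retaliates
```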
In the case of humans, “talking about source code” is perhaps not that useful, though we do have source code; it’s written in quaternary and has a rather complex probabilistic compiler. And that source code was optimized by a purely causal process, demonstrating that causal decision theory agents self-modify into acausal decision theory agents in many circumstances.
Revenge and anger work for one-shot problems. For example, if some stranger comes and sexually assaults your wife, they cannot escape your wrath by saying “oh, it’s only one shot, I’m a stranger in a huge city, you’ll never see me again, so there’s no point taking revenge”. You want to punch them in the face as an end in itself now; this is a simple way in which our brains are a bit acausal, decision-theory-wise.
I thought anger and revenge (used in one-shot situations) might be generalisations from what to do in the iterated version, which is what we faced for most of our evolutionary history.
I kinda like acausal decision theory for choosing to vote at all. I will choose to vote so that other people like me choose to vote.
Doing theoretical research that ignores practicalities sometimes turns out to be valuable in practice. It can open a door to something you assumed to be impossible, or save a lot of wasted effort on a plan that turns out to have an impossible sub-problem.
A concrete example of the first category might be something like quantum error correcting codes. Prior to that theoretical work, a lot of people thought that quantum computers were not worth pursuing because noise and decoherence would be an insurmountable problem. Quantum fault tolerance theorems did nothing to help solve the very tough practical problems of building a quantum computer, but they did show people that it might be worth pursuing—and here we are 20 years later closing in on practical quantum computers.
I think source code based decision theory might have something of this flavour. It doesn’t address all those practical issues such as how one machine comes to trust that another machine’s source code is what it says. That might indeed scupper the whole thing. But it does clarify where the theoretical boundaries of the problem are.
You might have thought “well, two machines could co-operate if they had identical source code, but that’s too restrictive to be practical”. But it turns out that you don’t need identical source code if you have the source code and can prove things about it. Then you might have thought “ok, but those proofs will never work because of non-termination and self-reference” … and it turns out that that is wrong too.
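A toy sketch of the two stages of that argument (the agent names are illustrative, not anyone’s actual constructions):

```python
def clique_bot(my_source: str, opponent_source: str) -> str:
    # Stage one: cooperate only with exact copies of myself.
    # Provably safe, but far too restrictive to be practical.
    return "C" if opponent_source == my_source else "D"

def naive_mirror_bot(my_source: str, opponent_source: str) -> str:
    # Stage two, attempted naively: "cooperate iff the opponent would cooperate
    # with me." Simulating an opponent who is simulating me simulating them...
    # never terminates when two such agents meet. The proof-based results are
    # about getting this behaviour by proving facts about the opponent's source
    # rather than running it, and (surprisingly) that can be made to work.
    raise NotImplementedError("naive mutual simulation does not terminate")
```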
Theoretical work like this could inform you about what you could hope to achieve if you could solve the practical issues; and conversely what problems are going to come up that you are absolutely going to have to solve.
But it might not be a useful line of thinking? Is it possible for the lesswrong/AI risk community to talk about the two universes at the same time? To be concrete, I mean the universe where very formal models are useful for AI development, and the other where the only practical solutions are ad hoc and contextual.
Just as AIXI seems to have been discarded for not being naturalistic enough (hoorah), you might want to entertain the idea that software based decision theories aren’t naturalistic enough.
This feels like a weird objection to me and I’m having some difficulty expressing why. Yes, it’s better to use lines of thinking that are more likely to produce results, but it feels like a fully general counterargument because all lines of thinking might not produce results. If we want to move past the results of the past, we have to actually explore, and that necessarily involves some dead ends. And Kyre’s argument is why I would say that research direction isn’t an obvious dead end.
MIRI currently has two research agendas, which seem to me like trying to do both at the same time. (I don’t think it lines up quite with the division you’re discussing; for example, my suspicion is that improved naturalism mostly comes out of the agent foundations agenda.)
I don’t think I want to make it fully general. I want to contrast between two schools of thinking, both of which are hard to get out of with only the tools of that school.
At what point do you put down the formal proof tools? You might prove that a statement is unprovable, but you may not manage even that; it may just hang around like P vs NP forever. Or you might prove a statement is unprovable within a system and then just switch to a different system (adding oracles and whatnot). What is your stopping condition?
Same with empiricism. You will never show, using just experiments, that empiricism isn’t a useful method of gaining knowledge for a specific goal, only that a single line of experimentation is unfruitful.
You can prove that certain lines of empiricism are likely to be unfruitful (things like trying to create infinite free energy) based on other axioms that hold up empirically. Similarly if an empirical approach is being productive you can use that to constrain the formal proofs you attempt.
This is something I would like to have said a long time ago. I’m not sure how relevant it is to current MIRI research. How much of it is actually writing programs?
The criticisms of “general functionalism” in the post seem to me to be aimed at a different sort of functionalism from the sort widely espoused around here.
The LW community is (I think) mostly functionalist in the sense of believing e.g. that if you’re conscious then something that does the same as you do is also conscious. They’d say that implementation details don’t matter for answering various philosophical questions. Is this thing me? Is it a person? Do I need to care about its interests? Is it intelligent? Etc. But that’s a long way from saying that implementation details don’t matter at all and, e.g., I think it’s “LW orthodoxy” that they do; that, e.g., something that thinks just like me but 1000x faster would be hugely more capable than me in all sorts of important ways.
(The advantages of the humans over the aliens in Eliezer’s “That Alien Message” have a lot to do with speed, though that wasn’t quite Eliezer’s point and he makes the humans smarter in other ways and more numerous too.)
If formal AI-safety work neglects speed, power consumption, side-channel attacks, etc., I think it’s only for the sake of beginning with simpler more tractable versions of the problems you care about, not because anyone seriously believes that those things are unimportant.
(And, just to be explicit, I believe those things are important, and I think it’s unlikely that any approach to AI safety that ignores them can rightly be said to deliver safety. But an approach that begins by ignoring them might be reasonable.)
I think decision theory is a big functionalist part of LW thinking about AI, or at least it used to be.
We don’t have scenarios where utility depends upon the amount of time taken to compute results. E.g. Impatient Jim only cooperates if you cooperate within 5 ms, which precludes vast searches through proof space about Jim’s source code.
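A minimal sketch of what Impatient Jim forces on you (the names and the 5 ms budget are just illustrative): the decision procedure has to budget wall-clock time and fall back to a default, rather than searching proof space until it finishes.

```python
import time

def decide_with_deadline(candidate_moves, evaluate, deadline_ms=5.0):
    # Evaluate moves until the deadline, then return the best found so far;
    # "D" is the safe default if we run out of time before evaluating anything.
    start = time.monotonic()
    best_move, best_value = "D", float("-inf")
    for move in candidate_moves:
        if (time.monotonic() - start) * 1000 > deadline_ms:
            break  # Jim has stopped waiting
        value = evaluate(move)
        if value > best_value:
            best_move, best_value = move, value
    return best_move

# e.g. decide_with_deadline(["C", "D"], lambda m: 3 if m == "C" else 1)
```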
I find Impatient Jim closer to the problems we face in the real world than Omniscient Omegas. YMMV.
The orthodoxy is not consistently applied :)
What?
Sorry, that is after I checked out of keeping up with LW. Have any formal problems like the smoking lesion or Sleeping Beauty been created from the insight that speed matters?
Should they be? It looks like people here would be receptive if you have an idea for a problem that doesn’t just tell us what we already know. But it also looks to me like the winners of the tournament both approximated in a practical way the search through many proofs approach (LW writeup and discussion here.)
Actually, strictly speaking, that is game theory, not decision theory. Probably worth pointing out; I forgot the distinction for a while myself.
I don’t understand UDT, but TDT can look at the evidence and decide what the other AI actually does. It can even have a probability distribution over possible source codes and use that to estimate expected value. This gives the other AI strong incentive to look for ways to prove its honesty.
I worry that the last paragraph of this post is too optimistic. If “formal proof is insufficient”, that might mean that proceeding formally can’t produce superintelligent AI, in which case indeed we don’t need to worry so much about AI risks—but it might instead mean that proceeding formally produces flaky superintelligent AI. That is, AI that’s just about as smart as we’d have hoped or feared, except that it’s extra-vulnerable to malicious hackers, or it has weird cognitive biases a bit like ours but orders of magnitude more subtle, or it basically operates like a “normal” superintelligence except that every now and then a cosmic ray flips a bit and it decides to destroy the sun.
That would not be reassuring.
Yeah.
I think the orthodox MIRI position is not that logical proofs are necessary, or even the most efficient way, to make a super-intelligence. It’s that humans need formal proofs to be sure that the AI will be well-behaved. A random kludgy program might be much smarter than your carefully proven one, but that’s cold comfort if it then proceeds to kill you.
Computing power that was expensive a decade ago is a lot less expensive today.
In The Age of Em, Hanson lays out how even brain emulations that aren’t smarter than the smartest humans could out-compete humanity very quickly.
If formal proof doesn’t work, that only indicates that everything is more dangerous, because you can’t use it to create safety.
I didn’t mean to imply that advanced computing (whether it is AI, IA or Ems) was not potentially dangerous. See my reply to Viliam for my current strategy.
Nothing is more dangerous; it is as dangerous as it was :) It is more dangerous if formal proof doesn’t work and everyone who cares about AI safety is working on formal proofs.
You can’t prove everything you want to know about physics with formal proofs. That doesn’t mean it isn’t valuable that physicists prove theorems about abstract physical laws.
It’s also not the case that everyone working on FAI tries the same approach.
I think the physicist and the AI researcher are in different positions.
One can chop a small bit off the world and focus on that, at a certain scale or under certain conditions.
The other has to create something that can navigate the entire world and potentially get to know it in better or very different ways than we do. It is unbounded in what it can know and how it might be best shaped.
It is this asymmetry that I think makes their jobs very different.
Thanks. I had almost written off LW, and by extension MIRI, as completely wedded to proof-based AI research. I’m a bit out of the loop; have you got any recommendations of people working along the lines I am thinking?
If you look at the CFAR theory of action of first focusing on getting reasoning right to think more clearly about AI risk, that’s not a strategy based on mathematical proofs.