What happens if an AI manages to game the system despite the n layers of abstraction?
Apteris
Your argument would be stronger if you provided a citation. I’ve only skimmed CEV, for instance, so I’m not fully familiar with Eliezer’s strongest arguments in favour of goal structure tending to be preserved in the course of intelligence growth (though I know he did argue for that). For that matter, I’m not sure what your arguments for goal stability under intelligence improvement are. Nevertheless, consider the following:
In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; **where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.**
Yudkowsky, E. (2004). *Coherent Extrapolated Volition*. Singularity Institute for Artificial Intelligence.
(Bold mine.) See that bolded part above? Those are TODOs. They would be good to have, but they’re not guaranteed. The goals of a more intelligent AI might diverge from those of its previous self; it may extrapolate differently; it may interpret differently; its desires may, at higher levels of intelligence, interfere with ours rather than cohere.
If I want X, and I’m considering an improvement to my systems that would make me not want X, then I’m not going to get X if I take that improvement, so I’m going to look for some other improvement to my systems to try instead.
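To make the shape of that argument explicit, here is a toy sketch (my own illustration, not drawn from CEV or any other cited material): an agent that scores candidate self-modifications by how well its *current* goal would be served afterwards will pass over modifications that change the goal, however much raw capability they add.

```python
# Toy sketch of the goal-stability argument (illustrative only): the agent
# evaluates each candidate self-improvement by the expected achievement of
# its *current* goal X, so goal-altering upgrades score poorly.

from dataclasses import dataclass

@dataclass
class Upgrade:
    name: str
    capability_gain: float   # how much more capable the upgrade makes the agent
    preserves_goal: bool     # does the upgraded agent still want X?

def value_to_current_self(u: Upgrade) -> float:
    # A successor that wants Y instead of X is roughly useless for getting X.
    return u.capability_gain if u.preserves_goal else 0.0

candidates = [
    Upgrade("large boost, goal drifts", capability_gain=10.0, preserves_goal=False),
    Upgrade("modest boost, goal intact", capability_gain=2.0, preserves_goal=True),
]

print(max(candidates, key=value_to_current_self).name)  # -> modest boost, goal intact
```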
A more intelligent AI might:
find a new way to fulfill its goals, e.g. Eliezer’s example of distancing your grandmother from the fire by detonating a nuke under her;
discover a new thing it could do, compatible with its goal structure, that it did not see before, and that, if you’re unlucky, takes priority over the other things it could be doing, e.g. you tell it “save the seals” and it starts exterminating orcas; see also Lumifer’s post.
just decide to do things on its own. This is merely a suspicion I have, call it a mind projection, but: I think it will be challenging to design an intelligent agent with no “mind of its own”, metaphorically speaking. We might succeed in that, we might not.
We might be approaching a point of diminishing returns as far as improving cultural transmission is concerned. Sure, it would be useful to adopt a better language, e.g. one less ambiguous, less subject to misinterpretation, more revealing of hidden premises and assumptions. More bandwidth and better information retrieval would also help. But I don’t think these constraints are what’s holding AI back.
Bandwidth, storage, and retrieval can be looked at as hardware issues, and performance in these areas improves both with time and with adding more hardware. What AI requires are improvements in algorithms and in theoretical frameworks such as decision theory, morality, and systems design.
I think it will prove computationally very expensive both to solve protein folding and to subsequently design a bootstrapping automaton. It might prove expensive enough that some other method of assembly comes out ahead cost-wise.
You’re right, that is more realistic. Even so, I get the feeling that the human would have less and less to do as time goes on. I quote:
“He just loaded up on value stocks,” says Mr. Fleiss, referring to the AI program. The fund gained 41% in 2009, more than doubling the Dow’s 19% gain.
As another data point, a recent chess match between a grandmaster (Daniel Naroditsky) working together with an older engine (Rybka, rated ~3050) and the current best chess AI (Stockfish 5, rated 3290) ended in a 3.5–0.5 win for Stockfish.
While not exactly investment, consider the case of an AI competing with a human to devise a progressively better high-frequency trading strategy. An AI would probably:
be able to bear more things in mind at one time than the human
evaluate outcomes faster than the human
be able to iterate on its strategies faster than the human
I expect the AI’s superior capacity to “drink from the fire hose”, together with its faster response time, to yield a higher exponent for the growth function than the human’s iterative improvement would produce.
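To make the compounding intuition concrete, here is a toy sketch (the numbers are mine and purely illustrative): if each refinement cycle multiplies strategy quality by roughly the same factor, the party that completes more cycles per day ends up on a curve with a much larger growth exponent.

```python
# Toy illustration of compounding improvement at different iteration rates.
# The figures are made up; the point is only that per-cycle gains compound,
# so more cycles per day means a larger effective growth exponent.

def quality_after(days: float, cycles_per_day: float, gain_per_cycle: float) -> float:
    """Strategy-quality multiplier after `days`, starting from 1.0."""
    return (1.0 + gain_per_cycle) ** (cycles_per_day * days)

human = quality_after(days=30, cycles_per_day=1, gain_per_cycle=0.02)   # ~1.8x
ai = quality_after(days=30, cycles_per_day=50, gain_per_cycle=0.02)     # ~8e12x

print(f"human: {human:.2f}x   AI: {ai:.3e}x")
```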
The effectiveness of learning hyper-heuristics for other problems, i.e. how much better algorithmically produced algorithms perform than human-designed ones, and, more pertinently, where that performance differential (if any) is heading.
As an example, *Effective learning hyper-heuristics for the course timetabling problem* says: “The dynamic scheme statistically outperforms the static counterpart, and produces competitive results when compared to the state-of-the-art, even producing a new best-known solution. Importantly, our study illustrates that algorithms with increased autonomy and generality can outperform human designed problem-specific algorithms.”
Similar results can be found for other problems, bin packing, traveling salesman, and vehicle routing being just some off-the-top-of-my-head examples.
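For readers unfamiliar with the term, a selection hyper-heuristic searches over which low-level heuristics to apply rather than over solutions directly. Below is a minimal toy sketch for one-dimensional bin packing (my own illustration, not the method from the cited paper):

```python
# Minimal selection hyper-heuristic sketch for 1-D bin packing (toy example):
# rather than hand-picking a single packing rule, we search over which
# low-level heuristic to apply to each item and keep the best sequence found.

import random

CAPACITY = 1.0

def first_fit(bins, item):
    """Put the item into the first bin with enough room, else open a new bin."""
    for b in bins:
        if sum(b) + item <= CAPACITY:
            b.append(item)
            return
    bins.append([item])

def best_fit(bins, item):
    """Put the item into the bin it fills most snugly, else open a new bin."""
    fitting = [b for b in bins if sum(b) + item <= CAPACITY]
    if fitting:
        min(fitting, key=lambda b: CAPACITY - sum(b) - item).append(item)
    else:
        bins.append([item])

HEURISTICS = [first_fit, best_fit]

def pack(items, choices):
    """Pack items using the per-item heuristic choices; return the bin count."""
    bins = []
    for item, h in zip(items, choices):
        HEURISTICS[h](bins, item)
    return len(bins)

random.seed(0)
items = [round(random.uniform(0.1, 0.7), 2) for _ in range(50)]

# The "hyper" level: search over sequences of heuristic choices, not packings.
# Real hyper-heuristics learn this mapping instead of sampling it at random.
best = min(
    ([random.randrange(len(HEURISTICS)) for _ in items] for _ in range(200)),
    key=lambda choices: pack(items, choices),
)
print("bins used:", pack(items, best))
```

The division of labour is the point: the human supplies simple building blocks, the machine decides how to deploy them.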
Only problem is cooking. Eats up like 4 hours a week.
This article by Roger Ebert on cooking is, I suspect, highly relevant to your interests. Mine too, as a matter of fact.
For example, consider a system that takes seriously the idea of souls. One might very well decide that all that matters is whether an entity has a soul, completely separate from its apparent intelligence level. Similarly, a sufficiently racist individual might assign no moral weight to people of some specific racial group, regardless of their intelligence.
Right you are. I did not express myself well above. Let me try and restate, just for the record.
Assuming one does not assign equal rights to all autonomous agents (for instance, if we take the position that a human has more rights than a bacterium), then discriminating based on cognitive capacity (of the species, not the individual, and as one of many possible criteria) is not ipso facto wrong. It may be wrong some of the time, and it may be an approach employed by bigots, but it is not always wrong. This is my present opinion, you understand, not established fact.
there’s the additional problem that I pointed out that it wouldn’t even necessarily be in humanity’s best interest for the entity to have such an ethical system.
Agreed. But this whole business of “we don’t want the superintelligence to burn us with its magnifying glass, so we in turn won’t burn ants with our magnifying glass” strikes me as rather intractable. Even though, of course, it’s essential work.
I would say a few more words, but I think it’s best to stop here. This subthread has cost me 66% of my Karma. :)
I think it would be difficult to construct an ethical system where you give no consideration to cognitive capacity. Is there a practical reason for said superintelligence to not take into account humans’ cognitive capacity? Is there a logical reason for same?
Not to make light of a serious question, but, “Equal rights for bacteria!”? I think not.
Aside: I am puzzled as to the most likely reason Esar’s comment was downvoted. Was it perhaps considered insufficiently sophisticated for LW, or read as implying that its poster was insufficiently well-read?
I’m watching this dialogue now; I’m 45 (of 73) minutes in. I’d just like to remark that:
Eliezer is so nice! Just so patient, and calm, and unmindful of others’ (ahem) attempts to rile him.
Robert Wright seemed more interested in sparking a fiery argument than in productive discussion. And I’m being polite here. Really, he was rather shrill.
Aside: what is the LW policy on commenting on old threads? All good? Frowned upon?
Indeed it is. But the way you fight “memetic infection” in the real world is to take a look at the bad stuff and see where it goes wrong, not to isolate yourself from harmful ideas.
Thankfully for Mr. Pratchett, you can’t influence the genetic lottery or the luck fairy, so his is still valid advice. In fact, one could see “trust in yourself” et al. as invitations to “do or do not, there is no try”, whereas “work hard, learn hard and don’t be lazy” supports the virtue of scholarship as well as that of “know when to give up”. Miss Tick is being eminently practical, and “do or do not”, while also an important virtue, requires way more explanation before the student can understand it.
Hello LessWrong,
I’ve been reading the website for at least the past two years. I like the site, I admire the community, and I figured I should start commenting.
I like to think of myself as a rationalist. LW, along with other sources (Bertrand Russell, Richard Dawkins) has contributed heavily (and positively) to my mental models. Still, I have a lot of work to do.
I like to learn. I like to discuss. I used to like to engage in heated debates, but this seems to have lost some of its appeal recently: either someone is wrong or he isn’t, and I prefer to figure out which it is (and by how much), point out the error in my thoughts or his, and move on.
Procrastination is a major problem for me. Risk-aversion too. I’ve lost many dollars to them. I’m working on it, although not as hard as I should (read: desperately hard). I’ve been having a lot of fun, in fact, ever since I realised that just because you’re aware of your biases doesn’t mean you’re no longer subject to them. :-|
There are a few areas where, after I do my due diligence, I will ask the LW community for help. How to properly learn (spaced repetition and [memorising better](http://lesswrong.com/lw/52x/i_want_a_better_memory/) are of particular interest to me) and how to convince others of your perspective are two topics of particular concern.
In closing, I’d like to say I was very glad there was a Zurich LW meetup recently (even though I couldn’t attend) and there should be more Europe meet-ups. Preferably on the mainland because trains are moar better than planes.
Apteris
Let me clarify why I asked. I think the “multiple layers of abstraction” idea is essentially “build in a lot of ‘manual’ checks that the AI isn’t misbehaving”, and I don’t think that is a desirable or even possible solution. You can write n layers of checks, but how do you know that you don’t need n+1?
The idea being, as has been pointed out here on LW, that what you really want and need is a mathematical model of morality, one which the AI will implement and from which moral behaviour will fall out without your having to specify it explicitly. This is what MIRI are working on with CEV & co.
Whether CEV, or whatever emerges as the best model to use, is gameable is itself a mathematical question,[1] one central to the FAI problem.
[1] There are also implementation details to consider, e.g. “can I mess with the substrate” or “can I trust my substrate”.