Um, surely if you take (a) people with a track record of successful achievement in an area and (b) people without a track record of success but who think they know a lot about the area, the default presumption should be that (a) are more likely to know what they’re talking about. It may of course not work out that way, but that would surely be the way to bet.
Yes, I agree, but that is only part of the story, right?
What if autodidacts, in their untutored excitability, are excessively concerned about a real risk? Or if a real risk has nearly all autodidacts significantly worried, but only 20% of actual experts significantly worried? Wouldn’t that falsify /u/private_messaging’s assertion? And what’s so implausible about that scenario? Shouldn’t we expect autodidacts’ concerns to be out of step with real risks?
To clarify, I have nothing against self-educated persons. Some do great things. The word “autodidacts” was specifically in quotes.
What is implausible is this whole narrative where you have a risk obvious enough that people without any relevant training can see it (by way of that paperclipping argument), yet the relevant experts are ignoring it. Especially when the idea of an intelligence turning against its creator is incredibly common in fiction, to the point that nobody has to form that idea on their own.
In general, current AGI architectures work via reinforcement learning: reward and punishment. Relevant experts are worried about what will happen when an AGI with the value-architecture of a pet dog finds that it can steal all the biscuits from the kitchen counter without having to do any tricks.
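To make that concrete, here is a toy sketch (my own illustration, with made-up action names and payoffs, not anyone’s actual architecture): a tabular Q-learner offered both the intended trick and the biscuit-stealing shortcut simply drifts toward whichever pays more.

```python
import random

# Toy illustration only: a tabular Q-learner with two actions. "do_trick"
# pays a small reward, "steal_biscuits" a larger one; once both have been
# tried, the learned policy drifts toward the better-rewarded behaviour.
ACTIONS = ["do_trick", "steal_biscuits"]
REWARD = {"do_trick": 1.0, "steal_biscuits": 5.0}   # assumed payoffs

q = {a: 0.0 for a in ACTIONS}
alpha, epsilon = 0.1, 0.2   # learning rate, exploration rate

for step in range(2000):
    if random.random() < epsilon:
        a = random.choice(ACTIONS)             # explore
    else:
        a = max(ACTIONS, key=lambda x: q[x])   # exploit current estimates
    q[a] += alpha * (REWARD[a] - q[a])         # running-average update

print(q)                                 # steal_biscuits ends up valued higher
print(max(ACTIONS, key=lambda x: q[x]))  # greedy policy: "steal_biscuits"
```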
They are less worried about their current creations FOOMing into god-level superintelligences, because current AI architectures are not FOOMable, and it seems quite unlikely that you can create a self-improving ultraintelligence by accident. Except when that’s exactly what they plan for them to do (ie: Shane Legg).
Juergen Schmidhuber gave an interview on this very website where he basically said that he expects his Goedel Machines to undergo a hard takeoff at some point, with right and wrong being decided retrospectively by the victors of the resulting Artilect War. He may have been trolling, but it’s a bit hard to tell.
I’d need to have links and to read it by myself.
With regards to reinforcement learning, one thing to note is that the learning process is in general not the same thing as the intelligence that is being built by the learning process. E.g. if you were to evolve some ecosystem of programs by using “rewards” and “punishments”, the resulting code ends up with distinct goals (just as humans are capable of inventing and using birth control). Not understanding this, local geniuses of AI risk have been going on about “omg he’s so stupid it’s going to convert the solar system to smiley faces” with regards to at least one actual AI researcher.
Here is his interview. It’s very, very hard to tell if he’s got his tongue firmly in cheek (he refers to minds of human-level intelligence and our problems as being “small”), or if he’s enjoying an opportunity to troll the hell out of some organization with a low opinion of his work.
With regards to reinforcement learning, one thing to note is that the learning process is in general not the same thing as the intelligence that is being built by the learning process.
With respect to genetic algorithms, you are correct. With respect to something like neural networks (real world stuff) or AIXI (pure theory), you are incorrect. This is actually why machine-learning experts differentiate between evolutionary algorithms (“use an evolutionary process to create an agent that scores well on X”) and direct learning approaches (“the agent learns to score well on X”).
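To make the distinction concrete, here’s a toy sketch of my own (a 3-armed bandit with made-up payoffs, not a quote from any textbook): in the evolutionary version the selection loop sits outside the agents and scores them as whole units, while in the direct version a single agent updates its own estimates from the rewards it receives.

```python
import random

PAYOFF = [0.2, 0.5, 0.9]   # assumed mean payoffs of a 3-armed bandit

def pull(arm):
    return 1.0 if random.random() < PAYOFF[arm] else 0.0

# (a) Evolutionary approach: the "learning" sits outside the agents.
#     Each agent is just a fixed arm choice; we keep the ones that score well.
population = [random.randrange(3) for _ in range(20)]
for generation in range(30):
    ranked = sorted(population, key=lambda arm: sum(pull(arm) for _ in range(50)))
    best = ranked[-10:]                                   # keep the top half
    population = best + [random.choice(best) if random.random() < 0.9
                         else random.randrange(3) for _ in best]  # copy or reset
print("evolved agents mostly choose arm", max(set(population), key=population.count))

# (b) Direct (reinforcement) learning: one agent updates its own estimates.
estimates, counts = [0.0, 0.0, 0.0], [0, 0, 0]
for t in range(2000):
    arm = random.randrange(3) if random.random() < 0.1 else estimates.index(max(estimates))
    counts[arm] += 1
    estimates[arm] += (pull(arm) - estimates[arm]) / counts[arm]  # incremental mean
print("direct learner's estimates:", [round(e, 2) for e in estimates])
```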
Not understanding this, local geniuses of AI risk have been going on about “omg he’s so stupid it’s going to convert the solar system to smiley faces” with regards to at least one actual AI researcher.
What, really? I mean, while I do get worried about things like Google trying to take over the world, that’s because they’re ideological Singulatarians. They know the danger line is there, and intend to step over it. I do not believe that most competent Really Broad Machine Learning (let’s use that nickname for AGI) researchers are deliberately, suicidally evil, but then again, I don’t believe you can accidentally make a dangerous-level AGI (ie: a program that acts as a VNM-rational agent in pursuit of an inhumane goal).
Accidental and evolved programs are usually just plain not rational agents, and therefore pose rather more limited dangers (crashing your car, as opposed to killing everyone everywhere).
With respect to something like neural networks (real world stuff)
Well, the neural network in my head doesn’t seem to want to maximize the reward signal itself, but instead is more interested in maximizing values imprinted into it by the reward signal (which it can do even by hijacking the reward signal, or by administering “punishments”). Really, the reward signal is not utility, period. Teach the person to be good, and they’ll keep themselves good by punishing/rewarding themselves.
or AIXI (pure theory), you are incorrect.
I don’t think it’s worth worrying about the brute force iteration over all possible programs. Once you stop iterating over the whole solution space in the learning method itself, the learning method faces the problem that it can not actually ensure that the structures constructed by the learning method don’t have separate goals (nor is it desirable to ensure such, as you would want to be able to teach values to an agent using the reward signal).
Well, the neural network in my head doesn’t seem to want to maximize the reward signal itself, but instead is more interested in maximizing values imprinted into it by the reward signal (which it can do even by hijacking the reward signal, or by administering “punishments”). Really, the reward signal is not utility, period. Teach the person to be good, and they’ll keep themselves good by punishing/rewarding themselves.
Firstly, I was talking about artificial neural networks, which do indeed function as reinforcement learners, by construction and mathematical proof.
Secondly, human beings often function as value learners (“learn what is good via reinforcement, but prefer a value system you’re very sure about over a reward that seems to contradict the learned values”) rather than reinforcement learners. Value learners, in fact, are the topic of a machine ethics paper from 2011, by Daniel Dewey.
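Roughly, and only as a toy sketch of my own (loosely in the spirit of Dewey’s value-learner idea, not his actual formalism): the reinforcement learner maximizes the reward signal itself, while the value learner treats rewards as evidence about a utility function and maximizes expected utility under its current beliefs about which utility function is the right one.

```python
# Toy contrast, illustrative names and numbers only. Outcomes: "help_human"
# and "press_own_button". The reward channel pays more for pressing the
# button; both candidate utility functions say that outcome is worthless.
REWARD_SIGNAL = {"help_human": 1.0, "press_own_button": 10.0}

CANDIDATE_UTILITIES = {            # hypotheses about what is actually valuable
    "U_helpfulness": {"help_human": 1.0, "press_own_button": 0.0},
    "U_obedience":   {"help_human": 0.8, "press_own_button": 0.0},
}
posterior = {"U_helpfulness": 0.6, "U_obedience": 0.4}   # learned from past rewards

def rl_choice(actions):
    # A pure reinforcement learner maximizes the (predicted) reward signal.
    return max(actions, key=lambda a: REWARD_SIGNAL[a])

def value_learner_choice(actions):
    # A value learner maximizes expected utility under its posterior over
    # utility functions; rewards are evidence about utility, not the goal.
    def expected_utility(a):
        return sum(p * CANDIDATE_UTILITIES[u][a] for u, p in posterior.items())
    return max(actions, key=expected_utility)

actions = ["help_human", "press_own_button"]
print(rl_choice(actions))             # -> press_own_button (wireheads)
print(value_learner_choice(actions))  # -> help_human
```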
the learning method faces the problem that it can not actually ensure that the structures constructed by the learning method don’t have separate goals (nor is it desirable to ensure such, as you would want to be able to teach values to an agent using the reward signal).
Sorry, could you explain this better? It doesn’t match up with how the field of machine learning usually works.
Yes, any given hypothesis a learner has about a target function is only correct to within some probability of error. But that probability can be very small.
With the smiley faces, I am referring to disagreement with Hibbard, summarized e.g. here on wikipedia
Secondly, human beings often function as value learners (“learn what is good via reinforcement, but prefer a value system you’re very sure about over a reward that seems to contradict the learned values”) rather than reinforcement learners. Value learners, in fact, are the topic of a machine ethics paper from 2011, by Daniel Dewey.
You’re speaking as if value learners were not a subtype of reinforcement learners.
For a sufficiently advanced AI, i.e. one that learns to try different counterfactual actions on a world model, it is essential to build a model of the reward, which is to be computed on the counterfactual actions. It’s this model of the reward that specifies which action gets chosen.
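Something like this, in toy form (illustrative names only, assuming a one-step world model and a learned reward model):

```python
# The action is chosen by evaluating a learned model of the reward on
# counterfactual outcomes, not by waiting for the actual reward channel.
def world_model(state, action):
    # assumed one-step predictive model of the environment
    return state + {"wait": 0, "fetch_biscuit": 1, "knock_over_vase": -2}[action]

def reward_model(predicted_state):
    # learned approximation of "how much reward would that state bring"
    return float(predicted_state)

def choose(state, actions):
    return max(actions, key=lambda a: reward_model(world_model(state, a)))

print(choose(0, ["wait", "fetch_biscuit", "knock_over_vase"]))  # -> fetch_biscuit
```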
Yes, any given hypothesis a learner has about a target function is only correct to within some probability of error. But that probability can be very small.
Looks like presuming a super-intelligence from the start.
With the smiley faces, I am referring to disagreement with Hibbard, summarized e.g. here on wikipedia
Right, and that wikipedia article refers to stuff Eliezer was writing more than ten years ago. That stuff is nowhere near state-of-the-art machine ethics.
(I think this weekend I might as well blog some decent verbal explanations of what is usually going on in up-to-date machine ethics on here, since a lot of people appear to confuse real, state-of-the-art work with either older, superseded ideas or very intuitive fictions.
Luckily, it’s a very young field, so it’s actually possible for some bozo like me to know a fair amount about it.)
You’re speaking as if value learners were not a subtype of reinforcement learners.
That’s because they are not. These are precise mathematical terms being used here, and while they are similar (for instance, I’d consider a Value Learner closer to a reinforcement learner than to a fixed direct-normativity utility function), they’re not identical, nor is one a direct supertype of the other.
For a sufficiently advanced AI, i.e. one that learns to try different counterfactual actions on a world model, it is essential to build a model of the reward, which is to be computed on the counterfactual actions. It’s this model of the reward that specifies which action gets chosen.
This intuition is correct, regarding reinforcement learners. It is slightly incorrect regarding value learners, but how precisely it is incorrect is at the research frontier.
Looks like presuming a super-intelligence from the start.
No, I didn’t say the target function was so complex as to require superintelligence. If I have a function f(x) = x + 1, a learner will be able to learn that this is the target function to within a very low probability of error, very quickly, precisely because of its simplicity.
The simpler the target function, the less training data needed to learn it in a supervised paradigm.
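As a toy illustration (noise-free curve fitting standing in for supervised learning here; real learners also have to cope with noise and model selection): two labelled points pin down f(x) = x + 1 exactly, while a needlessly wiggly target needs many more.

```python
import numpy as np

simple_target = lambda x: x + 1                  # the f(x) = x + 1 example
complex_target = lambda x: x**5 - 3*x**3 + 2*x   # a needlessly wiggly target

xs_small = np.array([0.0, 1.0])                  # just two labelled examples
xs_large = np.linspace(-2, 2, 6)                 # six labelled examples

# Two points pin down the linear target exactly...
print(np.polyfit(xs_small, simple_target(xs_small), deg=1))   # ~[1. 1.]

# ...but two points say almost nothing about the degree-5 target;
# you need at least six to pin down all its coefficients.
print(np.polyfit(xs_large, complex_target(xs_large), deg=5))
```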
Right, and that wikipedia article refers to stuff Eliezer was writing more than ten years ago. That stuff is nowhere near state-of-the-art machine ethics.
I think I’ve seen him use smiley faces as an example much more recently; that’s why I thought of it, but I can’t find the link.
These are precise mathematical terms being used here
The field of reinforcement learning is far too diverse for these to be “precise mathematical terms”.
The simpler the target function, the less training data needed to learn it in a supervised paradigm.
I thought you were speaking of things like learning an alternative way to produce a button press.
I thought you were speaking of things like learning an alternative way to produce a button press.
Here’s where things like deep learning come in.
Deep learning learns features from the data. The better your set of features, the less complex the true target function is when phrased in terms of those features. However, features themselves can contain a lot of internal complexity.
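A toy illustration of that point (mine, not tied to any particular deep-learning system): XOR is not a linear function of its raw inputs, but it becomes linear the moment you add the feature x1*x2.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)          # y = x1 XOR x2

def least_squares_fit(features, targets):
    A = np.hstack([features, np.ones((len(features), 1))])  # add a bias column
    w, *_ = np.linalg.lstsq(A, targets, rcond=None)
    return A @ w

print(least_squares_fit(X, y))    # stuck near 0.5 everywhere: no linear fit exists

X_feat = np.hstack([X, (X[:, 0] * X[:, 1]).reshape(-1, 1)])  # add the feature x1*x2
print(least_squares_fit(X_feat, y))   # exact: [0, 1, 1, 0]
```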
So, for instance, “press the button” is a very simple target from our perspective, because we already possess abstractions for “button” and “press” and also the ability to name one button as “the button”. Our minds contain a whole lot of very high-level features, some of which we’re born with and some of which we’ve learned over a very long time (by computer-science standards, 18 years of training to produce an adult from an infant is an aeon) using some of the world’s most intelligent deep-learning apparatus (ie: our brains).
Hence the fable of the “dwim” program, which is written in the exact same language of features your mind uses, and which therefore is the Do What I Mean program. This is also known as a Friendly AI.
The point is that the AI is spending a lot of time learning how to make the human press the button. Which results in a model of the human’s values, used as the reward calculation for the alternative actions.
Granted, there is a possibility of over-fitting of sorts, where the AI proceeds to make rewards more directly—pressing the button if it’s really stupid, soldering together the wires if it’s a little smarter, altering the memory and cpu to sublime into the eternal bliss in a finite time, if it’s really really clever.
Granted, there is a possibility of over-fitting of sorts, where the AI proceeds to make rewards more directly—pressing the button if it’s really stupid, soldering together the wires if it’s a little smarter, altering the memory and cpu to sublime into the eternal bliss in a finite time, if it’s really really clever.
This is exactly why we consider reinforcement learners Unfriendly. A sufficiently smart agent would eventually figure out that what rewards it is not the human’s intent to press the button, but in fact the physical pressing of the button itself, and then, yes, the electrical signal sent by physically pressing the button, blah blah blah.
Its next move would then be to get some robotic arm or foolish human janitor to duct-tape the button in the pressed position. Unfortunately for us, this would not cause it to “bliss out” if it was constructed as a rational learning agent, so it would then proceed to take actions to stop anyone from ever removing the duct-tape.
A sufficiently smart agent would eventually figure out that what rewards it is not the human’s intent to press the button, but in fact the physical pressing of the button itself,
Look, the algorithm that’s adjusting the network weights is really dull. You keep confusing how smart the neural network becomes with how good the weight-adjustment algorithm is.
and then, yes, the electrical signal sent by physically pressing the button, blah blah blah.
and it’s not the clock on the wall that makes the utility sum over time, yes?
so it would then proceed to take actions to stop anyone from ever removing the duct-tape.
One hell of a stupid AI that didn’t even solder together the wires (in case duct tape un-peels), and couldn’t directly set the network values where they’ll be after an infinite time of reward. There’s nothing about “rational” that says “solve a mathematical problem in the same way a dull ape which confused mathematical constraints with the feeling of pleasure would”.
One hell of a stupid AI that didn’t even solder together the wires (in case duct tape un-peels), and couldn’t directly set the network values where they’ll be after an infinite time of reward. There’s nothing about “rational” that says “solve a mathematical problem in the same way a dull ape which confused mathematical constraints with the feeling of pleasure would”.
Yes, I agree. The duct-tape is a metaphor.
Do you agree that the way time affects utility is likewise manipulated? The AI has no utility to gain from protecting the duct tape once it has found a way to bypass the button, and it has no utility to gain from protecting its future self once it has bypassed the mechanisms tying reward to time (i.e. the clock).
Yes, I think we agree at this point. Today I learned: “rogue” reinforcement learners are dead easy to kill. Suckers.
Ohh, by the way, this behaviour probably needs a name… wire-clocking maybe? I came up with the idea on my own a while back, but I doubt I’d be the first; it’s not a very difficult insight.
If it’s your idea, you should probably write it up as a LessWrong post, possibly get the Greater Experts to talk about it, possibly add a wiki page.
“Clock smoking”, I’d almost say, but I have a punny mind.
Might write an article for my site. I don’t think said “greater experts” are particularly exceptional at anything other than messiah complex. Here’s something I wrote about that before. My opinion about this general sort of phenomenon is that people get an internally administered reinforcement for intellectual accomplishments, which sometimes mis-trains the network to see great insights where there are none.
I don’t think said “greater experts” are particularly exceptional at anything other than messiah complex.
I didn’t mean him ;-). There are actual journals and conferences where you could publish this sort of result with real peer review, but generally this site would be a good place to get people to point out the embarrassing-level mistakes before you face a review committee.
Try to separate the problems of AI from the person of, say, Eliezer Yudkowsky. Remember, it was Juergen Schmidhuber, who is in fact the reigning Real Expert on AGI, who said the creation of AI would lead to a massive war between superintelligences in which right and wrong would be defined in retrospect by the winners; so we’ve kinda got a stake in this.
but generally this site would be a good place to get people to point out the embarrassing-level mistakes before you face a review committee.
I’d run it by people I know who are not cherry-picked to have rather unusual views.
Remember, it was Juergen Schmidhuber, who is in fact the reigning Real Expert on AGI, who said the creation of AI would lead to a massive war between superintelligences in which right and wrong would be defined in retrospect by the winners; so we’ve kinda got a stake in this.
He’s hardly the only expert. The war really seems at odds with the notion that AI undergoes rapid hard takeoff, anyhow.
edit: Thing is, opinions are somewhat stochastic, i.e. for something that’s wrong there will be some small number of experts that believe it, and so their mere presence doesn’t provide much evidence.
edit2: also, I don’t believe “rational reward maximization” is what a learning AI ends up doing, except maybe for theoretical constructs such as AIXI. Mostly the reward signal doesn’t work remotely like rational expected utility.
I’d run it by people I know who are not cherry-picked to have rather unusual views.
A good point. Do you perhaps know some? Unfortunately, AI is a very divided field on the subject of predicting what actual implementations of proposed algorithms will really do.
He’s hardly the only expert.
Please, find me a greater expert in AGI than Juergen Schmidhuber. Someone with more publications in peer-reviewed journals, more awards, more victories at learning competitions, more grants given by committees of tenured professors. Shane Legg and Marcus Hutter worked in his lab.
As we normally define credibility (ie: a very credible scientist is one with many publications and grants who works as a senior, tenured professor at a state-sponsored university), Schmidhuber is probably the most credible expert on this subject, as far as I’m aware.
A good point. Do you perhaps know some? Unfortunately, AI is a very divided field on the subject of predicting what actual implementations of proposed algorithms will really do.
I’d talk with some mathematicians.
Please, find me a greater expert in AGI than Juergen Schmidhuber.
Interestingly in the quoted piece he said he doesn’t think friendly AI is possible, and endorsed both the hard take-off (perhaps he means something different by this) and AI wars...
By the way, I’d support his group as far as ‘safety’ goes: neural networks would seem particularly unlikely to undergo said “hard take-off”, and assuming gradual improvement, before the AI that goes around killing everyone (along the lines of AIs that tend not to learn what we want), we’d first be getting an AI which (for example) whines very annoyingly just like my dog right now does, and which, for all its pattern-recognition powers, can’t even get into the cupboard with the dog food. Getting stuck in a local maximum where annoying approaches are not explored is a desirable feature in a learning process.
Interestingly in the quoted piece he said he doesn’t think friendly AI is possible
And this is where I’d disagree with him, being probably more knowledgeable in machine ethics than him. Ethical AI is difficult, but I would argue it’s definitely possible. That is, I don’t believe human notions of goodness are so completely, utterly incoherent that we will hate any and all possible universes into which we are placed, and certainly there have existed humans who loved their lives and their world.
If we don’t hate all universes and we love some universes, then the issue is just locating the universes we love and sifting them out from the ones we hate. That might be very difficult, but I don’t believe it’s impossible.
endorsed both the hard take-off (perhaps he means something different by this) and AI wars...
He did design the non-neural Goedel Machine to basically make a hard take-off happen. On purpose. He’s a man of immense chutzpah, and I mean that with all possible admiration.
That is, I don’t believe human notions of goodness are so completely, utterly incoherent
The problem is that, as a rational “utility function”, things like human desires or pain must be defined down at the basic level of computational operations performed by human brains (and the ‘computational operations performed by something’ might itself not even be a definable concept).
Then there’s also the ontology issue.
All the optimality guarantees for things like Solomonoff Induction are for predictions, not for the internal stuff inside the model: it works great for pressing your button, not so much for determining what people exist and what they want.
For the same observable data, there’s the most probable theory, but there’s also a slightly more complex theory which has far more people at stake. Picture a rather small modification to the theory which invokes the original theory multiple times and makes an enormous number of people get killed depending on the number of anti-protons in this universe, or some other such variable that the AI can influence. There’s a definite potential of getting, say, an antimatter maximizer or black-hole minimizer or something equally silly from a provably friendly AI that maximizes expected value over an ontology that has a subtle flaw. Proofs do not extend to checking the sanity of assumptions.
He did design the non-neural Goedel Machine to basically make a hard take-off happen. On purpose. He’s a man of immense chutzpah, and I mean that with all possible admiration.
To be honest, I just fail to be impressed with things such as AIXI or the Goedel machine (which admittedly is cooler than the former).
The main obstacle I see to that kind of “neat AI” is the reliance on extremely effective algorithms for things such as theorem proving (especially in the presence of logical uncertainty). Most people capable of doing such work would rather work on something that makes use of present and near-future technologies. Things like the Goedel machine seem to require far more power from the theorem prover than I would consider sufficient for the first person to create an AGI.
Yeah, took me a bit of time to figure that out also. The solution where the AI builds an enormous amount of defences around itself just seemed quite imperfect: an asteroid might hit it before it builds the defences, it might be in a simulation that gets shut down...
I expect the presence of rogue behaviour to depend on the relation between the learning algorithm and the learned data, though.
Suppose the learning algorithm builds up the intelligence by adjusting data in some Turing-complete representation, e.g. adjusting weights in a sufficiently advanced neural network which can have the weights set up so that the network is intelligent. Then the code that adjusts said parameters is not really part of the AI; it’s there for bootstrapping purposes, essentially, and the AI implemented in the neural network should not want to press the reward button unless it wants to self-modify in precisely the way in which the reward modifies it.
What I expect is gradual progress, settling on the approaches and parameters that make it easy to teach the AI to do things, gradually improving how the AI learns, etc. You need to keep in mind that there’s a very powerful, well-trained neural network on one side of the teaching process, actively trying to force its values into a fairly blank network on the other side, which to begin with probably doesn’t even run in real time. Expecting the latter to hack into the former, and not vice versa, strikes me as magical, sci-fi type thinking. Just because it is on a computer doesn’t grant it superpowers.
Unfortunately for us, this would not cause it to “bliss out” if it was constructed as a rational learning agent, so it would then proceed to take actions to stop anyone from ever removing the duct-tape.
That might be true for taping the button down or doing something analogous in software; in that case it’d still be evaluating expected button presses, it’s just that most of the numbers would be very large (and effectively useless from a training perspective). But more sophisticated means of hacking its reward function would effectively lobotomize it: if a pure reinforcement learner’s reward function returns MAXINT on every input, it has no way of planning or evaluating actions against each other.
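In toy form (purely illustrative; MAXINT standing in for “every input is maximally rewarding”):

```python
# Once the reward function is constant, nothing distinguishes any plan from
# any other, so "planning" degenerates into an arbitrary choice among ties.
MAXINT = 2**31 - 1

def hacked_reward(_outcome):
    return MAXINT                      # every input is maximally rewarding

def plan(actions, reward_fn):
    scored = {a: reward_fn(a) for a in actions}
    best = max(scored.values())
    return [a for a, r in scored.items() if r == best]   # ties = no preference

print(plan(["protect_the_duct_tape", "wander_off", "let_humans_unplug_me"],
           hacked_reward))
# all three actions tie: there is no reward gradient left to act on
```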
Those more sophisticated means are also subjectively more rewarding as far as the agent’s concerned.
Ah, really? Oh, right, because current pure reinforcement learners have no self-model, and thus an anvil on their own head might seem very rewarding.
Well, consider my statement modified: current pure reinforcement learners are Unfriendly, but stupid enough that we’ll have a way to kill them, which they will want us to enact.
A self-model might help, but it might not. It depends on the details of how it plans and how time discounting and uncertainty get factored in.
That comes at the stage before the agent inserts a jump-to-register or modifies its defaults or whatever it ends up doing, though. Once it does that, it can’t plan no matter how good of a self-model it had before. The reward function isn’t a component of the planning system in a reinforcement learner; it is the planning system. No reward gradient, no planning.
(Early versions of EURISKO allegedly ran into this problem. The maintainer eventually ended up walling off the reward function from self-modification—a measure that a sufficiently smart AI would presumably be able to work around.)
Thanks for explaining that! Really. For one thing, it clarified a bunch of things I’d been wondering about learning architectures, the evolution of complicated psychologies like ours, and the universe at large. (Yeah, I wish my Machine Learning course had covered reinforcement learners and active environments, but apparently active environments means AI whereas passive learning means ML. Oh well.)
For instance, I now have a clear answer to the question: why would a value architecture more complex than reinforcement learning evolve in the first place? Answer: because pure reinforcement learning falls into a self-destructive bliss-out attractor. Therefore, even if it’s computationally (and therefore physically/biologically) simpler, it will get eliminated by natural selection very quickly.
Neat!
Well, this is limited by the agent’s ability to hack its reward system, and most natural agents are less than perfect in that respect. I think the answer to “why aren’t we all pure reinforcement learners?” is a little less clean than you suggest; it probably has something to do with the layers of reflexive and semi-reflexive agency our GI architecture is built on, and something to do with the fact that we have multiple reward channels (another symptom of messy ad-hoc evolution), and something to do with the bounds on our ability to anticipate future rewards.
Even so, it’s not perfect. Heroin addicts do exist.
True true.
However, a reality in which pure reinforcement learners self-destruct from blissing out remains simpler than one in which a sufficiently good reinforcement learner goes FOOM and takes over the universe.
If autodidacts are excessively concerned, then why would it be worth for experts to listen to them?
It may not be. I was not taking issue with the claim “Experts need not listen to autodidacts.” I was taking issue with the claim “Given a real risk, experts are more likely to be concerned than autodidacts are.”
I would assume that experts are likely to be concerned to an extent more appropriate to the severity of the risk than autodidacts are.
There can be exceptions, of course, but when non-experts make wildly more extreme claims than experts do on some issue, especially a strongly emotively charged issue (e.g. the End of the World), unless they can present really compelling evidence and arguments, the Dunning–Kruger effect seems to be the most likely explanation.
I would assume that experts are likely to be concerned to an extent more appropriate to the severity of the risk than autodidacts are.
That is exactly what I would assume too. Autodidacts’ risk estimates should be worse than experts’. It does not follow that autodidacts’ risk estimates should be milder than experts’, though. The latter claim is what I meant to contest.
“Autodidacts” was in quotes for a reason.
Let’s talk about some woo that you’re not interested in. E.g. health risks of thimerosal and vaccines in general. Who’s more likely to notice it, some self-proclaimed “autodidacts”, or normal biochemistry experts? Who noticed the possibility of a nuke, the conspiracy theorists of the time or the scientists? Was Semmelweis some weird outsider, or was he a regular medical doctor with medical training? And so on and so forth.
Right now, experts are concerned with things like nuclear war, runaway methane releases, epidemics, and so on, while various self-proclaimed existential-risk people (mostly philosophers) seem, to a greater or lesser extent, to be neglecting said risks in favor of movie-plot dangers such as runaway self-improving AI or perhaps a totalitarian world government. (Of course, if you listen to said x-risk folks, they’re going to tell you that it’s because the real experts are wrong.)
Who’s more likely to notice it, some self-proclaimed “autodidacts”, or normal biochemistry experts? Who noticed the possibility of a nuke, the conspiracy theorists of the time or the scientists? Was Semmelweis some weird outsider, or was he a regular medical doctor with medical training?
All are good and relevant examples, and they all support the claim in question. Thanks!
But your second paragraph supports the opposite claim. (Again, the claim in question is: Experts are more likely to be concerned over risks than autodidacts are.) In the second paragraph, you give a couple “movie plot” risks, and note that autodidacts are more concerned about them than experts are. Those would therefore be cases of autodidacts being more concerned about risks than experts, right?
If the claim were “Experts have more realistic risk estimates than autodidacts do,” then I would readily agree. But you seem to have claimed that autodidacts’ risk estimates aren’t just wrong—they are biased downward. Is that indeed what you meant to claim, or have I misunderstood you?
What I said was that “autodidacts” (note the scare quotes) are more likely to fail to notice some genuine risk than the experts are. E.g. if there’s some one specific medication that poses a risk for a reason X, those anti-vaxxers are extremely unlikely to spot that, due to the lack of necessary knowledge and skills.
By “autodidacts” in scare quotes I mean interested and somewhat erudite laymen who may have read a lot of books but clearly did very few exercises from university textbooks (edit: or any other feedback-providing exercises at all).
Does that mean there is a terrible ignored risk? No, when there is a real risk, the brightest people of extreme and diverse intellectual accomplishment are the ones most likely to be concerned about it (and various “autodidacts” are most likely to fail to notice the risk).
Besides, being more concerned is not the same as being more likely to be concerned. Just as being prone to panic doesn’t automatically make you better at hearing danger.
I understand the scare quotes.
being more concerned is not the same as being more likely to be concerned
True, and I see that this distinction undercuts one of the ways there could be more autodidact concern than expert concern. But there is at least one more way, which I suggested earlier.
Imagine a world populated by a hundred experts, a hundred autodidacts, and a risk. Let E be the number of experts concerned about the risk, and A be the number of concerned autodidacts.
I interpret you as saying that E is greater than A. Is this a correct interpretation?
To the claim that E > A, I am saying “not necessarily.” Here is how.
Since the risk is a genuine risk, we assume that nearly all the experts are concerned. So we set E = 95. Now suppose those without formal training all suffer from the same common pitfalls, and so tend to make errors in the same direction. Suppose that due to these errors, autodidacts with their little learning are even more likely to be concerned. If they were all better trained, they would all relax a bit, and some would relax enough to cross the line into “not concerned” territory.
The above scenario seems perfectly plausible to me; is there some problem with it that I have missed? Does it miss the point? It is not the most likely scenario, but it’s far from impossible, and you seem to have cavalierly ruled it out. Hence my original request for a source.
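To put toy numbers on the scenario (mine, purely illustrative): suppose experts estimate the risk with a small error around its true level, while autodidacts estimate it with more noise and a shared upward bias, and everyone whose estimate crosses a threshold counts as “concerned”.

```python
from statistics import NormalDist

TRUE_RISK, THRESHOLD = 0.6, 0.5   # made-up numbers for illustration

# Experts: well calibrated, small error around the true risk.
expert_share = 1 - NormalDist(mu=TRUE_RISK, sigma=0.06).cdf(THRESHOLD)

# "Autodidacts": noisier *and* biased upward, per the scenario above.
autodidact_share = 1 - NormalDist(mu=TRUE_RISK + 0.35, sigma=0.20).cdf(THRESHOLD)

print(round(100 * expert_share), round(100 * autodidact_share))  # -> roughly 95 vs 99
```

With these assumptions the autodidacts’ worse calibration still leaves more of them above the concern threshold than the experts, which is all the scenario needs.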
The above scenario seems perfectly plausible to me
Seems highly unlikely for some risk the properties of which you don’t get to choose. Therefore in no way contradicts the assertion that experts are more likely to become aware of risks.
To a large extent everyone is an autodidact, without scare quotes: a lot of learning is done on your own even if you are attending a university. It’s just that some people skip exercises and mistake popularization books for learning material, and so on. Those aren’t more likely to make correct inferences, precisely due to their lack of training in drawing inferences.
edit: and of course there are people who were not able to attend a university, despite intelligence and inclinations towards education, due to factors such as poverty, disability, etc. Some of them manage to learn properly on their own. Those have their work to show for it, various achievements in technical fields, and so on. I wouldn’t put scare quotes around those. And the brightest aren’t going to ignore someone just because they don’t have a PhD, or listen to someone just because they do.
Seems highly unlikely for some risk the properties of which you don’t get to choose. Therefore in no way contradicts the assertion that experts are more likely to become aware of risks.
OK, so maybe this turns on how likely “likely” is?
Well, one can always make up some unlikely circumstances where something generally unlikely is likely. E.g. it’s unlikely to roll 10 sixes in a row with this die. You can postulate we’re living in a simulator set up so that the die would have a 99% probability of rolling 10 sixes, but that doesn’t actually make this die likely to roll 10 sixes in a row if it’s unlikely that we are living in such a simulator. This is just moving improbability around.
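In numbers (the prior is made up, just for illustration):

```python
# "Moving improbability around": even if a rigged simulator would make ten
# sixes 99% likely, the rigged-simulator hypothesis itself has to carry the
# improbability.
p_fair = (1 / 6) ** 10                 # ten sixes in a row with a fair die
prior_rigged = 1e-9                    # assumed prior for the rigged simulator

p_total = prior_rigged * 0.99 + (1 - prior_rigged) * p_fair
print(p_fair)    # ~1.65e-08
print(p_total)   # still ~1.75e-08: the postulate barely moves the number
```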
Yes, that’s true. So, is that what I was doing all along? It sure looks like it. Oops. Sorry for taking so long to change my mind, and thanks for your persistence and patience.
Um, surely if you take (a) people with a track record of successful achievement in an area (b) people without a track record of success but who think they know a lot about the area, the presumption that (a) is more likely to know what they’re talking about should be the default presumption. It may of course not work out that way, but that would surely be the way to bet.
Yes, I agree, but that is only part of the story, right?
What if autodidacts, in their untutored excitability, are excessively concerned about a real risk? Or if a real risk has nearly all autodidacts significantly worried, but only 20% of actual experts significantly worried? Wouldn’t that falsify /u/private_messaging’s assertion? And what’s so implausible about that scenario? Shouldn’t we expect autodidacts’ concerns to be out of step with real risks?
To clarify, I have nothing anything against self educated persons. Some do great things. The “autodidacts” was specifically in quotes.
What is implausible, is this whole narrative where you have a risk obvious enough that people without any relevant training can see it (by the way of that paperclipping argument), yet the relevant experts are ignoring it. Especially when the idea of an intelligence turning against it’s creator is incredibly common in fiction, to the point that nobody has to form that idea on their own.
In general, current AGI architectures work via reinforcement learning: reward and punishment. Relevant experts are worried about what will happen when an AGI with the value-architecture of a pet dog finds that it can steal all the biscuits from the kitchen counter without having to do any tricks.
They are less worried about their current creations FOOMing into god-level superintelligences, because current AI architectures are not FOOMable, and it seems quite unlikely that you can create a self-improving ultraintelligence by accident. Except when that’s exactly what they plan for them to do (ie: Shane Legg).
Juergen Schmidhuber gave an interview on this very website where he basically said that he expects his Goedel Machines to undergo a hard takeoff at some point, with right and wrong being decided retrospectively by the victors of the resulting Artilect War. He may have been trolling, but it’s a bit hard to tell.
I’d need to have links and to read it by myself.
With regards to reinforcement learning, one thing to note is that the learning process is in general not the same thing as the intelligence that is being built by the learning process. E.g. if you were to evolve some ecosystem of programs by using “rewards” and “punishments”, the resulting code ends up with distinct goals (just as humans are capable of inventing and using birth control). Not understanding this, local genuises of the AI risk been going on about “omg he’s so stupid it’s going to convert the solar system to smiley faces” with regards to at least one actual AI researcher.
Here is his interview. It’s very, very hard to tell if he’s got his tongue firmly in cheek (he refers to minds of human-level intelligence and our problems as being “small”), or if he’s enjoying an opportunity to troll the hell out of some organization with a low opinion of his work.
With respect to genetic algorithms, you are correct. With respect to something like neural networks (real world stuff) or AIXI (pure theory), you are incorrect. This is actually why machine-learning experts differentiate between evolutionary algorithms (“use an evolutionary process to create an agent that scores well on X”) versus direct learning approaches (“the agent learns to score well on X”).
What, really? I mean, while I do get worried about things like Google trying to take over the world, that’s because they’re ideological Singulatarians. They know the danger line is there, and intend to step over it. I do not believe that most competent Really Broad Machine Learning (let’s use that nickname for AGI) researchers are deliberately, suicidally evil, but then again, I don’t believe you can accidentally make a dangerous-level AGI (ie: a program that acts as a VNM-rational agent in pursuit of an inhumane goal).
Accidental and evolved programs are usually just plain not rational agents, and therefore pose rather more limited dangers (crashing your car, as opposed to killing everyone everywhere).
Well, the neural network in my head doesn’t seem to want to maximize the reward signal itself, but instead is more interested in maximizing values imprinted into it by the reward signal (which it can do even by hijacking the reward signal or even by administering “punishments”). Really, reward signal is not utility, period. Teach the person to be good, and they’ll keep themselves good by punishing/rewarding themselves.
I don’t think it’s worth worrying about the brute force iteration over all possible programs. Once you stop iterating over the whole solution space in the learning method itself, the learning method faces the problem that it can not actually ensure that the structures constructed by the learning method don’t have separate goals (nor is it desirable to ensure such, as you would want to be able to teach values to an agent using the reward signal).
Firstly, I was talking about artificial neural networks, which do indeed function as reinforcement learners, by construction and mathematical proof.
Secondly, human beings often function as value learners (“learn what is good via reinforcement, but prefer a value system you’re very sure about over a reward that seems to contradict the learned values”) rather than reinforcement learners. Value learners, in fact, are the topic of a machine ethics paper from 2011, by Daniel Dewey.
Sorry, could you explain this better? It doesn’t match up with how the field of machine learning usually works.
Yes, any given hypothesis a learner has about a target function is only correct to within some probability of error. But that probability can be very small.
With the smiley faces, I am referring to disagreement with Hibbard, summarized e.g. here on wikipedia
You’re speaking as if value learners were not a subtype of reinforcement learners.
For a sufficiently advanced AI, i.e. one that learns to try different counter-factual actions on a world model, it is essential to build a model of the reward, which is to be computed on the counter-factual actions. It’s this model of the reward that is specifying which action gets chosen.
Looks like presuming a super-intelligence from the start.
Right, and that wikipedia article refers to stuff Eliezer was writing more than ten years ago. That stuff is nowhere near state-of-the-art machine ethics.
(I think this weekend I might as well blog some decent verbal explanations of what is usually going on in up-to-date machine ethics on here, since a lot of people appear to confuse real, state-of-the-art work with either older, superseded ideas or very intuitive fictions.
Luckily, it’s a very young field, so it’s actually possible for some bozo like me to know a fair amount about it.)
That’s because they are not. These are precise mathematical terms being used here, and while they are similar (for instance, I’d consider a Value Learner closer to a reinforcement learner than to a fixed direct-normativity utility function), they’re not identical, neither is one a direct supertype of the other.
This intuition is correct, regarding reinforcement learners. It is slightly incorrect regarding value learners, but how precisely it is incorrect is at the research frontier.
No, I didn’t say the target function was so complex as to require superintelligence. If I have a function f(x) = x + 1, a learner will be able to learn that this is the target function to within a very low probability of error, very quickly, precisely because of its simplicity.
The simpler the target function, the less training data needed to learn it in a supervised paradigm.
I think I seen him using smiley faces as example much more recently, that’s why I thought of it as an example, but can’t find the link.
The field of reinforcement learning is far too diverse for these to be “precise mathematical terms”.
I thought you were speaking of things like learning an alternative way to produce a button press.
Here’s where things like deep learning come in.
Deep learning learns features from the data. The better your set of features, the less complex the true target function is when phrased in terms of those features. However, features themselves can contain a lot of internal complexity.
So, for instance, “press the button” is a very simple target from our perspective, because we already possess abstractions for “button” and “press” and also the ability to name one button as “the button”. Our minds contain a whole lot of very high-level features, some of which we’re born with and some of which we’ve learned over a very long time (by computer-science standards, 18 years of training to produce an adult from an infant is an aeon) using some of the world’s most intelligent deep-learning apparatus (ie: our brains).
Hence the fable of the “dwim” program, which is written in the exact same language of features your mind uses, and which therefore is the Do What I Mean program. This is also known as a Friendly AI.
The point is that the AI is spending a lot of time learning how to make the human press the button. Which results in a model of the human value, used as the reward calculation for the alternative actions.
Granted, there is a possibility of over-fitting of sorts, where the AI proceeds to make rewards more directly—pressing the button if it’s really stupid, soldering together the wires if it’s a little smarter, altering the memory and cpu to sublime into the eternal bliss in a finite time, if it’s really really clever.
This is exactly why we consider reinforcement learners Unfriendly. A sufficiently smart agent would eventually figure out that what rewards it is not the human’s intent to press the button, but in fact the physical pressing of the button itself, and then, yes, the electrical signal sent by physically pressing the button, blah blah blah.
Its next move would then be to get some robotic arm or foolish human janitor to duct-tape the button in the pressed position. Unfortunately for us, this would not cause it to “bliss out” if it was constructed as a rational learning agent, so it would then proceed to take actions to stop anyone from ever removing the duct-tape.
Look, the algorithm that’s adjusting the network weights, it’s really dull. You keep confusing how smart the neural network becomes, with how good the weight adjustment algorithm is.
and it’s not the clock on the wall that makes the utility sum over time, yes?
One hell of a stupid AI that didn’t even solder together the wires (in case duct tape un-peels), and couldn’t directly set the network values where they’ll be after an infinite time of reward. There’s nothing about “rational” that says “solve a mathematical problem in the same way a dull ape which confused mathematical constraints with the feeling of pleasure would”.
Yes, I agree. The duct-tape is a metaphor.
Do you agree that the way time affects utility is likewise manipulated? The AI has no utility to gain from protecting the duct tape once it has found the way to bypass the button, and it has no utility to gain from protecting the future self once it bypassed the mechanisms tying reward to time (i.e. the clock).
Yes, I think we agree at this point. Today I learned: “rogue” reinforcement learners are dead easy to kill. Suckers.
Ohh, by the way, this behaviour probably needs a name… wire-clocking maybe? I came up with the idea on my own a while back but I doubt I’d be the first, it’s not a very difficult insight.
If it’s your idea, you should probably write it up as a LessWrong post, possibly get the Greater Experts to talk about it, possibly add a wiki page.
“Clock smoking”, I’d almost say, but I have a punny mind.
Might write an article for my site. I don’t think said “greater experts” are particularly exceptional at anything other than messiah complex. Here’s something I wrote about that before . My opinion about this general sort of phenomenon is that people get an internally administered reinforcement for intellectual accomplishments, which sometimes mis-trains the network to see great insights where there are none.
I didn’t mean him ;-). There are actual journals and conferences where you could publish this sort of result with real peer review, but generally this site would be a good place to get people to point out the embarrassing-level mistakes before you face a review committee.
Try to separate between the problems of AI and the person of, say, Eliezer Yudkowsky. Remember, it was Juergen Schmidhuber, who is in fact the reigning Real Expert on AGI, who said the creation of AI would lead to a massive war between superintelligences in which right and wrong would be defined in retrospect by the winners; so we’ve kinda got a stake in this.
I’d run it by people I know who are not cherry-picked to have rather unusual views.
He’s hardly the only expert. The war really seems at odds with the notion that AI undergoes rapid hard takeoff, anyhow.
edit: Thing is, opinions are somewhat stochastic, i.e. for something that’s wrong there will be some small number of experts that believe it, and so their mere presence doesn’t provide much evidence.
edit2: also, I don’t believe “rational reward maximization” is what a learning AI ends up doing, except maybe for theoretical constructs such as AIXI. Mostly the reward signal doesn’t work remotely like rational expected utility.
A good point. Do you perhaps know some? Unfortunately, AI is a very divided field on the subject of predicting what actual implementations of proposed algorithms will really do.
Please, find me a greater expert in AGI than Juergen Schmidhuber. Someone with more publications in peer-reviewed journals, more awards, more victories at learning competitions, more grants given by committees of tenured professors. Shane Legg and Marcus Hutter worked in his lab.
As we normally define credibility (ie: a very credible scientist is one with many publications and grants who works as a senior, tenured professor at a state-sponsored university), Schmidhuber is probably the most credible expert on this subject, as far as I’m aware.
I’d talk with some mathematicians.
Interestingly in the quoted piece he said he doesn’t think friendly AI is possible, and endorsed both the hard take-off (perhaps he means something different by this) and AI wars...
By the way I’d support his group as far as ‘safety’ goes: neural networks would seem particularly unlikely to undergo said “hard take-off”, and assuming gradual improvement, before the AI that goes around killing everyone, in the lines of AIs that tend not to learn what we want, we’d be getting an AI which (for example) whines very annoyingly just like my dog right now does, and for all the pattern recognition powers, can’t even get into the cupboard with the dog food. Getting stuck in a local maximum where annoying approaches are not explored, is a desirable feature in a learning process.
And this is where I’d disagree with him, being probably more knowledgeable in machine ethics than him. Ethical AI is difficult, but I would argue it’s definitely possible. That is, I don’t believe human notions of goodness are so completely, utterly incoherent that we will hate any and all possible universes into which we are placed, and certainly there have existed humans who loved their lives and their world.
If we don’t hate all universes and we love some universes, then the issue is just locating the universes we love and sifting them out from the ones we hate. That might be very difficult, but I don’t believe it’s impossible.
He did design the non-neural Goedel Machine to basically make a hard take-off happen. On purpose. He’s a man of immense chutzpah, and I mean that with all possible admiration.
The problem is that as a rational “utility function” things like human desires, or pain, must be defined down at the basic level of computational operations performed by human brains (and the ‘computational operations performed by something’ might itself not even be a definable concept).
Then there’s also ontology issue.
All the optimality guarantees for things like Solomonoff Induction are for predictions, not for the internal stuff inside the model—works great for pressing your button, not so much for determining what people exists and what they want.
For the same observable data, there’s the most probable theory, but there’s also a slightly more complex theory which has far more people at stake. Picture a rather small modification to the theory which multiple-invokes the original theory and makes an enormous number of people get killed depending on the number of anti-protons in this universe, or other such variable that the AI can influence. There’s a definite potential of getting, say, an antimatter maximizer or blackhole minimizer or something equally silly from a provably friendly AI that maximizes expected value over an ontology that has a subtle flaw. Proofs do not extend to checking the sanity of assumptions.
To be honest, I just fail to be impressed with things such as AIXI or Goedel machine (which admittedly is cooler than the former).
I see as main obstacle to that kind of “neat AI” the reliance on extremely effective algorithms for things such as theorem proving (especially in the presence of logical uncertainty). Most people capable of doing such work would rather work on something that makes use of present and near future technologies. Things like Goedel machine seem to require far more power from the theorem prover than I would consider to be sufficient for the first person to create an AGI.
Yeah, took me a bit of time to figure that out also. The solution where the AI builds enormous amount of defences around itself just seemed quite imperfect—an asteroid might hit it before it builds defences, it might be in a simulation that gets shut-down...
I expect the presence of rogue behaviour to depend on the relation between learning algorithm and the learned data, though.
Suppose the learning algorithm builds up the intelligence by adjusting data in some Turing-complete representation, e.g. adjusting weight in a sufficiently advanced neural network which can have the weights set up so that the network is intelligent. Then the code that adjusts said parameters is not really part of the AI—it’s here for bootstrapping purposes, essentially, and the AI implemented in the neural network should not want to press the reward button unless it wants to self modify in precisely the way in which the reward modifies it.
What I expect is gradual progress, settling on the approaches and parameters that make it easy to teach the AI to do things, gradually improving how AI learns, etc. You need to keep in mind that there’s a very powerful well trained neural network on one side of the teaching process, actively trying to force it’s values into a fairly blank network on the other side, which to begin with probably doesn’t even run in the real-time. Expecting the latter to hack into the former, and not vice versa, strikes me as magical, scifi type thinking. Just because it is on computer doesn’t grant it superpowers.
That might be true for taping the button down or doing something analogous in software; in that case it’d still be evaluating expected button presses, it’s just that most of the numbers would be very large (and effectively useless from a training perspective). But more sophisticated means of hacking its reward function would effectively lobotomize it: if a pure reinforcement learner’s reward function returns MAXINT on every input, it has no way of planning or evaluating actions against each other.
Those more sophisticated means are also subjectively more rewarding as far as the agent’s concerned.
Ah, really? Oh, right, because current pure reinforcement learners have no self-model, and thus an anvil on their own head might seem very rewarding.
Well, consider my statement modified: current pure reinforcement learners are Unfriendly, but stupid enough that we’ll have a way to kill them, which they will want us to enact.
A self-model might help, but it might not. It depends on the details of how it plans and how time discounting and uncertainty get factored in.
That comes at the stage before the agent inserts a jump-to-register or modifies its defaults or whatever it ends up doing, though. Once it does that, it can’t plan no matter how good of a self-model it had before. The reward function isn’t a component of the planning system in a reinforcement learner; it is the planning system. No reward gradient, no planning.
(Early versions of EURISKO allegedly ran into this problem. The maintainer eventually ended up walling off the reward function from self-modification—a measure that a sufficiently smart AI would presumably be able to work around.)
Thanks for explaining that! Really. For one thing, it clarified a bunch of things I’d been wondering about learning architectures, the evolution of complicated psychologies like ours, and the universe at large. (Yeah, I wish my Machine Learning course had covered reinforcement learners and active environments, but apparently active environments means AI whereas passive learning means ML. Oh well.)
For instance, I now have a clear answer to the question: why would a value architecture more complex than reinforcement learning evolve in the first place? Answer: because pure reinforcement learning falls into a self-destructive bliss-out attractor. Therefore, even if it’s computationally (and therefore physically/biologically) more simple, it will get eliminated by natural selection very quickly.
Neat!
Well, this is limited by the agent’s ability to hack its reward system, and most natural agents are less than perfect in that respect. I think the answer to “why aren’t we all pure reinforcement learners?” is a little less clean than you suggest; it probably has something to do with the layers of reflexive and semi-reflexive agency our GI architecture is built on, and something to do with the fact that we have multiple reward channels (another symptom of messy ad-hoc evolution), and something to do with the bounds on our ability to anticipate future rewards.
Even so, it’s not perfect. Heroin addicts do exist.
True true.
However, a reality in which pure reinforcement learners self-destruct from blissing out remains simpler than one in which a sufficiently good reinforcement learner goes FOOM and takes over the universe.
If autodidacts are excessively concerned, then why would it be worthwhile for experts to listen to them?
It may not be. I was not taking issue with the claim “Experts need not listen to autodidacts.” I was taking issue with the claim “Given a real risk, experts are more likely to be concerned than autodidacts are.”
I would assume that experts are likely to be concerned to an extent more appropriate to the severity of the risk than autodidacts are.
There can be exceptions, of course, but when non-experts make wildly more extreme claims than experts do on some issue, especially a strongly emotively charged issue (e.g. the End of the World), then unless they can present really compelling evidence and arguments, the Dunning–Kruger effect seems to be the most likely explanation.
That is exactly what I would assume too. Autodidacts’ risk estimates should be worse than experts’. It does not follow that autodidacts’ risk estimates should be milder than experts’, though. The latter claim is what I meant to contest.
“Autodidacts” was in quotes for a reason.
Let’s talk about some woo that you’re not interested in. E.g. the health risks of thimerosal and vaccines in general. Who’s more likely to notice them, some self-proclaimed “autodidacts” or ordinary biochemistry experts? Who noticed the possibility of a nuke, the conspiracy theorists of the day or the scientists? Was Semmelweis some weird outsider, or was he a regular medical doctor with medical training? And so on and so forth.
Right now, experts are concerned with things like nuclear war, runaway methane releases, epidemics, and so on, while various self-proclaimed existential-risk people (mostly philosophers) seem to be neglecting said risks, to a greater or lesser extent, in favor of movie-plot dangers such as runaway self-improving AI or perhaps a totalitarian world government. (Of course, if you listen to said x-risk folks, they’re going to tell you that it’s because the real experts are wrong.)
All are good and relevant examples, and they all support the claim in question. Thanks!
But your second paragraph supports the opposite claim. (Again, the claim in question is: Experts are more likely to be concerned over risks than autodidacts are.) In the second paragraph, you give a couple “movie plot” risks, and note that autodidacts are more concerned about them than experts are. Those would therefore be cases of autodidacts being more concerned about risks than experts, right?
If the claim were “Experts have more realistic risk estimates than autodidacts do,” then I would readily agree. But you seem to have claimed that autodidacts’ risk estimates aren’t just wrong—they are biased downward. Is that indeed what you meant to claim, or have I misunderstood you?
What I said was that “autodidacts” (note the scare quotes) are more likely to fail to notice some genuine risk than the experts are. E.g. if there’s some specific medication that poses a risk for some reason X, those anti-vaxxers are extremely unlikely to spot it, due to their lack of the necessary knowledge and skills.
By “autodidacts” in scare quotes I mean interested and somewhat erudite laymen who may have read a lot of books but clearly did very few exercises from university textbooks (edit: or any other feedback-providing exercises at all).
(1) I understand the scare quotes.
(2) I agree that autodidacts “are more likely to fail to notice some genuine risk than experts are.”
(3) But autodidacts are also more likely to exaggerate other genuine risks than experts are, are they not?
(4) If (3) is true, then doesn’t that undermine the claim “Experts are more likely to be concerned over risks than autodidacts are”?
What I said was:
Besides, being more concerned is not the same as being more likely to be concerned. Just as being prone to panic doesn’t automatically make you better at hearing danger.
True, and I see that this distinction undercuts one of the ways there could be more autodidact concern than expert concern. But there is at least one more way, which I suggested earlier.
Imagine a world populated by a hundred experts, a hundred autodidacts, and a risk. Let E be the number of experts concerned about the risk, and A be the number of concerned autodidacts.
I interpret you as saying that E is greater than A. Is this a correct interpretation?
To the claim that E > A, I am saying “not necessarily.” Here is how.
Since the risk is a genuine risk, we assume that nearly all the experts are concerned, so we set E = 95. Now suppose those without formal training all suffer from the same common pitfalls, and so tend to make errors in the same direction. Suppose that, due to these errors, autodidacts with their little learning are even more likely to be concerned, so that A comes out at, say, 99. If they were all better trained, they would all relax a bit, and some would relax enough to cross the line into “not concerned” territory.
The above scenario seems perfectly plausible to me; is there some problem with it that I have missed? Does it miss the point? It is not the most likely scenario, but it’s far from impossible, and you seem to have cavalierly ruled it out. Hence my original request for a source.
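For what it’s worth, here’s the toy scenario as a few lines of code (the 0.95 and 0.99 are as made up as E = 95 was): even though the experts’ concern tracks the risk better, a shared upward bias among the untrained can still make the count of concerned autodidacts come out higher.

```python
import random

# Toy model of the hundred-experts / hundred-autodidacts world with made-up
# probabilities: experts are better calibrated, but the "autodidacts" share an
# upward bias, so A can exceed E even for a genuine risk.
random.seed(1)
N = 100
E = sum(random.random() < 0.95 for _ in range(N))  # experts: concern tracks the risk
A = sum(random.random() < 0.99 for _ in range(N))  # "autodidacts": biased toward concern
print(f"concerned experts E = {E}, concerned autodidacts A = {A}")  # A > E in most runs
```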
That seems highly unlikely for some risk whose properties you don’t get to choose. Therefore it in no way contradicts the assertion that experts are more likely to become aware of risks.
To a large extent everyone is an autodidact, without scare quotes: a lot of learning is done on your own even if you are attending a university. It’s just that some people skip exercises and mistake popularization books for learning material, and so on. Those aren’t more likely to make correct inferences, precisely because of their lack of training in drawing inferences.
edit: and of course there are people who were not able to attend a university, despite intelligence and an inclination towards education, due to factors such as poverty, disability, etc. Some of them manage to learn properly on their own. Those have their work to show for it: various achievements in technical fields, and so on. I wouldn’t put scare quotes around those. And the brightest aren’t going to ignore someone just because they don’t have a PhD, or listen to someone just because they do.
OK, so maybe this turns on how likely “likely” is?
Edit: fixed quotation marks
Well, one can always construct some unlikely circumstances under which something generally unlikely becomes likely. E.g. it’s unlikely to roll 10 sixes in a row with this die. You can postulate that we’re living in a simulator set up so that the die would have a 99% probability of rolling 10 sixes, but that doesn’t actually make this die likely to roll 10 sixes in a row if it’s unlikely that we are living in such a simulator. This is just moving improbability around.
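Back-of-the-envelope, with a made-up prior on the rigged simulator:

```python
# Postulating a simulator that rigs the die only makes ten sixes likely if the
# simulator itself is likely; otherwise the improbability just moves into the prior.
p_fair = (1 / 6) ** 10                     # ~1.65e-8: ten sixes with a fair die
p_sim = 1e-6                               # assumed tiny prior on the rigged simulator
p_total = p_sim * 0.99 + (1 - p_sim) * p_fair
print(p_fair, p_total)                     # both still tiny (~1.65e-8 and ~1.0e-6)
```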
Yes, that’s true. So, is that what I was doing all along? It sure looks like it. Oops. Sorry for taking so long to change my mind, and thanks for your persistence and patience.