Douglas Reay graduated from Cambridge University (Trinity College) in 1994. He has since worked in the computing and educational sectors.
The ability to edit this particular post appears to be broken at the moment (bug submitted).
In the meantime, here’s a link to the next part:
https://www.lesserwrong.com/posts/SypqmtNcndDwAxhxZ/environments-for-killing-ais
Edited to add: It is now working again, so I’ve fixed it.
> Also maybe this is just getting us ready for later content
Yes, that is the intention.
Parts 2 and 3 now added (links in post), so hopefully the link to building aligned AGI is now clearer?
The other articles in the series have been written, but it was suggested that rather than posting a whole series at once, it is kinder to post one part a day, so as not to flood the frontpage.
So, unless I hear otherwise, my intention is to do that and edit the links at the top of the article to point to each part as it gets posted.
Companies writing programs to model and display large 3D environments in real time face a similar problem, in that they only have limited resources. One workaround they commonly use is “imposters”: flat pre-rendered images that stand in for distant objects, swapped out for full geometry only when the viewer gets close enough to tell the difference.
A solar-system-sized simulation of a civilisation that has not made observable changes to anything outside our own solar system could take a lot of shortcuts when generating the photons that arrive from outside. In particular, until a telescope or camera of a particular resolution has been invented, would the simulators need to bother generating thousands of years of such photons in more detail than could be captured by the devices present so far?
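Purely as an illustration of that imposter idea applied to incoming light (the instrument names and resolution figures below are invented for this example, not taken from anywhere), here is a minimal Python sketch: the simulator only needs to render the sky down to the resolution of the best instrument its inhabitants have built so far.

```python
# Toy sketch: how finely would a simulator need to generate light arriving
# from outside the simulated volume? Only as finely as the best instrument
# currently present can resolve. All names and numbers are illustrative.

from dataclasses import dataclass


@dataclass
class Instrument:
    name: str
    angular_resolution_arcsec: float  # smallest detail it can resolve


def required_sky_detail(instruments: list[Instrument]) -> float:
    """Finest angular resolution (arcsec) the simulator must bother to match."""
    if not instruments:
        return float("inf")  # nobody is looking closely; almost any blur will do
    return min(i.angular_resolution_arcsec for i in instruments)


if __name__ == "__main__":
    history = [
        [Instrument("naked eye", 60.0)],
        [Instrument("naked eye", 60.0), Instrument("early telescope", 5.0)],
        [Instrument("naked eye", 60.0), Instrument("space telescope", 0.05)],
    ]
    for epoch, instruments in enumerate(history):
        print(f"epoch {epoch}: simulate incoming light down to "
              f"{required_sky_detail(instruments)} arcsec")
```

The same logic applies in time as well as resolution: detail only has to be filled in once some device capable of recording it exists.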
Look for people who can state your own position as well as (or better than) you can, and yet still disagree with your conclusion. They may be aware of additional information that you are not yet aware of.
In addition, if someone who knows more than you about the subject on which you disagree also holds views on several other areas that you do know a lot about, and their arguments in those areas are generally constructive and well balanced, pay close attention to them.
Another approach might be to go meta. Assume that there are many theoretically possible dire threats which, if true, would justify the one person in a position to stop them in doing so at almost any cost (from paying a penny or five pounds, all the way up to cutting their own throat, or pressing a nuke-launching button that would wipe out the human species). Indeed, once the size of the action requested in response to the threat is maxed out (it is the biggest response the individual is capable of making), all such claims are functionally identical: the magnitude of the threat beyond that needed to max out the response is irrelevant. In this context, there is no difference between 3↑↑↑3 and 3↑↑↑↑3.
But what policy for responding to claims of such threats should a species adopt, in order to maximise expected utility?
The moral hazard from encouraging such claims to be made falsely needs to be taken into account.
It is that moral hazard which has to be balanced against a pool of money that, species-wide, should be risked on covering such bets. Think of it this way: suppose I, Pascal’s Policeman, were to make the claim “On behalf of the time police, in order to deter confidence tricksters, I hereby guarantee that additional utility will be added to the multiverse, equal in magnitude to the sum of all offers made by Pascal’s Muggers who happen to be telling the truth (if any), in exchange for your not responding positively to their threats or offers.”
It then becomes a matter of weighing the evidence presented by different muggers and policemen.
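To make the policy question concrete, here is a toy expected-utility comparison in Python. Every number in it is an invented illustration (the fraction of honest claims, the cost of paying, how strongly a known pay-up policy encourages further false claims), chosen only to show how the moral-hazard term enters, not to estimate anything real.

```python
# Toy comparison of two species-wide policies towards mugger-style threats.
# All parameter values are illustrative assumptions, not estimates.

def policy_utility(policy_pays: bool,
                   honest_claims: float,
                   false_claims: float,
                   harm_if_honest_refused: float,
                   cost_per_payment: float,
                   false_claim_multiplier: float) -> float:
    """Expected utility of a publicly known policy of paying (or refusing) muggers."""
    if policy_pays:
        # Paying averts the honest harms, but every claimant gets paid, and a
        # known pay-up policy is assumed to multiply the number of false claims.
        false_claims *= false_claim_multiplier
        return -(honest_claims + false_claims) * cost_per_payment
    # Refusing pays nobody, but any genuinely honest threats get carried out.
    return -honest_claims * harm_if_honest_refused


if __name__ == "__main__":
    for pays in (True, False):
        u = policy_utility(policy_pays=pays,
                           honest_claims=0.001,       # almost no muggers are honest
                           false_claims=1_000.0,
                           harm_if_honest_refused=1e6,
                           cost_per_payment=5.0,      # the proverbial five pounds
                           false_claim_multiplier=10.0)
        print(f"policy pays muggers: {pays!s:<5}  expected utility: {u:,.1f}")
```

The difficulty, of course, is that a mugger can always claim a harm large enough to swamp any particular set of numbers; the point of choosing at the policy level (and of the Pascal’s Policeman counter-claim) is that what gets weighed is the rule, moral hazard included, rather than each claim in isolation.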
Are programmers more likely to pay attention to detail in the middle of a functioning simulation run (rather than waiting until the end before looking at the results), or are they more likely to investigate the causes of unexpected stuttering and resource usage? Could a pattern of enforced ‘rewind events’ be used to communicate?
Should such an experiment be carried out, or is persuading an Architect to terminate the simulation you are in, by frustrating her aim of keeping you guessing, not a good idea?
> Assuming that Arthur is knowledgeable enough to understand all the technical arguments—otherwise they’re just impressive noises—it seems that Arthur should view David as having a great advantage in plausibility over Ernie, while Barry has at best a minor advantage over Charles.
This is the slippery bit.
People are often fairly bad at deciding whether or not their knowledge is sufficient to completely understand arguments in a technical subject that they are not a professional in. You frequently see this with some opponents of evolution or anthropogenic global climate change, who think they understand slogans such as “water is the biggest greenhouse gas” or “mutation never creates information”, and decide to discount the credentials of the scientists who have studied the subjects for years.
I’ve always thought of that question as being more about the nature of identity itself.
If you lost your memories, would you still be the same being? If you compare a brain at two different points in time, is their ‘identity’ a continuum, or is it the type of quantity where there is a single agreed definition of “same” versus “not the same”?
See:
157. [Similarity Clusters](http://lesswrong.com/lw/nj/similarity_clusters)
158. [Typicality and Asymmetrical Similarity](http://lesswrong.com/lw/nk/typicality_and_asymmetrical_similarity)
159. [The Cluster Structure of Thingspace](http://lesswrong.com/lw/nl/the_cluster_structure_of_thingspace)
Though I agree that the answer to a question that’s most fundamentally true (or of most interest to a philosopher) isn’t necessarily going to be the answer that is most helpful in all circumstances.
It is plausible that the AI thinks that the extrapolated volition of his programmers, the choice they’d make in retrospect if they were wiser and braver, might be to be deceived in this particular instance, for their own good.
Perhaps that is true for a young AI. But what about later on, when the AI is much much wiser than any human?
What protocol should be used for the AI to decide when the time has come for the commitment to not manipulate to end? Should there be an explicit ‘coming of age’ ceremony, with handing over of silver engraved cryptographic keys?
Stanley Coren put some numbers on the effect of sleep deprivation upon IQ test scores.
There’s a more detailed meta-analysis of multiple studies, splitting it by types of mental attribute, here:
Assume we’re talking about the Coherent Extrapolated Volition self-modifying general AI version of “friendly”.
The situation is intended as a tool, to help think about the issues involved in it being the ‘friendly’ move to deceive the programmers.
The situation isn’t fully defined, and no doubt one can think of other options. But I’d suggest you then re-define the situation to bring it back to the core decision: by, for instance, deciding that the same oversight committee have given Albert a read-only connection to the external net, which Albert doesn’t think he will be able to overcome unaided in time to stop Bertram.
Or, to put it another way: “If a situation were such that the only two practical options were (in the AI’s opinion) either to override the programmers’ opinion via manipulation, or to let something terrible happen that is even more against the AI’s supergoal than violating the ‘be transparent’ sub-goal, which should a correctly programmed friendly AI choose?”
Indeed, it is a question with interesting implications for Nick Bostrom’s Simulation Argument.
If we are in a simulation, would it be immoral to try to find out, because that might jinx the purity of the simulation creator’s results, thwarting his intentions?
Would you want your young AI to be aware that it was sending out such text messages?
Imagine the situation was in fact a test: that the information leaked onto the net about Bertram was incomplete (the Japanese company intends to turn Bertram off soon; it is just a trial run), and that it was leaked deliberately in order to panic Albert and see how he would react.
Should Albert take that into account? Or should he have an inbuilt prohibition against putting weight on that possibility when making decisions, in order to let his programmers more easily get true data from him?
Here’s a poll, for those who’d like to express an opinion instead of (or as well as) comment.
[pollid:749]
Thank you for creating an off-topic test reply to reply to.
[pollid:748]
When such a situation arises again, where there’s an investment opportunity which is generally thought to be worthwhile but which has a lower than expected uptake due to ‘trivial inconveniences’, I wonder whether that is in itself an opportunity for a group of rationalists to cooperate by outsourcing as much as possible of the inconvenience to just a few members of the group. Something like:
“Hey, LessWrong. I want to invest $100 in new technology foo, but I’m being put off by the upfront time investment of 5-20 hours. If anyone wants to make the offer of {I’ve investigated foo, I know the technological process needed to turn dollars into foo investments, and here’s a step-by-step guide that I’ve tested and which works; or post me a cheque and an email address, and I’ll set it up for you and send you back the access details}, I’d be interested in being one of those who pays you compensation for providing that service.”
There’s a lot LessWrong (or a similar group) could set up to facilitate such outsourcing, such as letting multiple people register interest in the same potential offer, and providing some filtering or guarantee against someone claiming the offer and then ripping people off.
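For what it’s worth, here is a rough sketch of the kind of ‘register interest in an offer’ bookkeeping such a facility might need. The class names and workflow are hypothetical, invented for this comment; a real version would also need identity checks, payment handling and some dispute-resolution mechanism.

```python
# Hypothetical sketch of an offer board where one member does the legwork
# and others register interest before the work is committed to.

from dataclasses import dataclass, field


@dataclass
class Offer:
    provider: str
    description: str    # e.g. "tested step-by-step guide for investing in foo"
    fee: float          # compensation asked for doing the legwork
    interested: list[str] = field(default_factory=list)


class OfferBoard:
    """Minimal bookkeeping: post offers, let people register interest."""

    def __init__(self, min_subscribers: int = 3):
        # Only activate an offer once enough people want it, so the provider's
        # 5-20 hours of up-front work is spread across several beneficiaries.
        self.min_subscribers = min_subscribers
        self.offers: list[Offer] = []

    def post(self, offer: Offer) -> None:
        self.offers.append(offer)

    def register_interest(self, offer: Offer, user: str) -> bool:
        """Record interest; return True once the offer has enough takers."""
        if user not in offer.interested:
            offer.interested.append(user)
        return len(offer.interested) >= self.min_subscribers


if __name__ == "__main__":
    board = OfferBoard()
    foo = Offer(provider="alice",
                description="guide for turning dollars into foo investments",
                fee=20.0)
    board.post(foo)
    ready = False
    for user in ("bob", "carol", "dave"):
        ready = board.register_interest(foo, user)
    print(f"{len(foo.interested)} people interested; enough to activate: {ready}")
```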