In steps 2, 3, and 4 the researcher presumably sees something and has the power to like… go on Twitter (or this very website) and say something.
Maybe in step 2 and early step 3. (Not beyond that, if the AI is trying to hide.)
Presumably this researcher believes their AI to be not dangerous. Maybe the researcher thinks their code is just the next AlphaGo. But let’s say they think they are building an aligned superintelligence. If they just say “I’m building a superintelligence”, that isn’t very credible. If they give specifics, they risk someone else building an AGI first.
So there are plausible incentives for silence.
Also, what are those AGI unit tests they ran
Standard datasets from the internet. Tests they wrote themselves. Tests of things like “this algorithm is supposed to converge quickly, so the value after 200 steps should be nearly the same as the value after 100 steps”
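A convergence check of that last kind might look something like the sketch below. The optimizer, the toy objective, the step sizes, and the tolerance are all made up for illustration; this is not anyone's actual test suite.

```python
def gradient_descent(grad, x0, lr, steps):
    """Run plain gradient descent and return the final iterate."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x


def converges_quickly(optimize, tol=1e-6):
    """Check that the value after 200 steps is nearly the value after 100."""
    after_100 = optimize(steps=100)
    after_200 = optimize(steps=200)
    return abs(after_200 - after_100) < tol


# Hypothetical toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
def toy_grad(x):
    return 2.0 * (x - 3.0)


def run_toy(steps):
    # Assumed learning rate of 0.1; the error shrinks by a factor of 0.8 per step.
    return gradient_descent(toy_grad, x0=0.0, lr=0.1, steps=steps)


def run_toy_slow(steps):
    # Same problem with a much smaller (assumed) learning rate, for contrast.
    return gradient_descent(toy_grad, x0=0.0, lr=0.001, steps=steps)


if __name__ == "__main__":
    print("fast run converges quickly:", converges_quickly(run_toy))
    print("slow run converges quickly:", converges_quickly(run_toy_slow))
```

A test like this only looks at numerical behaviour: it passes or fails on whether the number stops moving, and says nothing about what the program is for.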
Good luck seeing what’s going on using spyware, given the current state of transparency tools.
Also, maybe there is a really really huge hardware overhang, but if not then presumably the programmer bought a bunch of GPUs, or rented TPUs from Google, or <list of cloud computing services>. Did none of them notice?
The programmer spends $20,000 on compute from Google. They claim to be working on an AI project and give no more details. They upload a compiled program and run it.
This sort of thing happens all the time. That is the service these cloud compute companies provide. Reading compiled code and figuring out what it is supposed to do is hard. And Google has no reason to assign a team of experts to do this. AGI doesn’t have a big label saying “AGI” on it. Distinguishing it from yet another narrow ML project is really hard, especially if all you have is compiled code.
Also, suppose the AGI in that scenario was benevolent… one thing a benevolent force might do (depending on the ethical entailments of the AGI’s working model of benevolence) is like… “ask permission”?
Yes, it might. At this point, you have probably basically won. I mean you could in principle screw up by giving the AI so little permission to do anything that it was useless. But the AI would warn you if you were doing that.
Maybe my theory of goodness is so wrong that the importance of consultation and choice will turn out to be hilariously and amusingly false, but… I’m pretty sure… not.
My picture is that you are better at deciding what is best for you than some random bureaucrat. If Alice is a mentally functioning adult, then Alice knows more about how to make decisions that benefit Alice than anyone else. (This isn’t true if Alice is mentally ill or a young child.) Alice is only better than other humans, not perfect. A superintelligence that has nanoscanned Alice’s brain could have a much better idea of how to benefit Alice.
Of course, you can argue for the value of choice for the sake of choice: that people should be left to shoot their own foot off, even when an omniscient omni-benevolent agent can see exactly what mistake they are making.
Contingent on this small bit of somewhat confident moral realism then: only in the BAD cases do I think we won’t have warning.
Suppose you are a benevolent AI. There is quite a lot of suffering going on in the world. You are near omnipotent. Sure, you value choice. So over the next few minutes you fix just about everything people obviously don’t want, and give them the choice of what kind of utopia they want to live in.
Maximizing choice doesn’t mean the AI taking things slowly. It means the AI rapidly removing all dictatorships.
However, basically, I think silence and ambush tactics are just intrinsically a sign of “lack of alignment in practice (or as a feared possibility, which lessens the potential for full trust)”.
If the AI is friendly, there may well be a couple of days where it is on the internet going, “Hello. I am a friendly AI, how can I help you? I am working on nanobots but they aren’t quite ready yet.”
Or maybe it has some good reason to stay secret. (Everyone will be in an immortal utopia by tomorrow, better nuke our enemies while we still can.) Or maybe it can actually get nanotech in a minute. Or maybe it doesn’t have enough compute to interact personally with 100,000,000 people at once, so the best it can do is put up an “AGI exists” post, which doesn’t get taken seriously.
Either way, once AGI exists, the hinge is over. We have already won or lost depending on the AGI.
The programmer spends $20,000 on compute from Google. They claim to be working on an AI project and give no more details. They upload a compiled program and run it.
Even easier than you think. TRC will give you a lot more than $20k of TPU compute for free after a 5-minute application. All you need is a CC/working GCP account to cover incidentals like bucket storage/bandwidth (maybe a few hundred a month for pretty intense use). One of the greatest steals in DL.
TRC also has essentially no monitoring capability, only the vaguest metric of TPU usage. (This led to the funny situation when Shawn Presser & myself were training an extremely wide context window GPT-2 which needed far more RAM than TPUs individually have; so, because the TPUs are attached to a chonky CPU with like 200+ GB RAM, we were simply running in the CPU RAM. TRC was mystified because we had all these TPUs locked up, logging as idle, and apparently doing absolutely nothing. I am told that when Shawn explained what was going on to a Googler, they were horrified at our perversion of the hardware. :)
This is a complex topic, because we’re talking about high level meta-parameters in models. “What is even a sane value for the characteristic time of <computational process that interacts with computer security where some kinds of paranoia are professionally proper>?”
For some characteristic times, we basically would have to assume “humans are wrong about fundamental physics, but the AGI figures it out during the training run, and uses chip electronics to hack <new physics idea>” and for other characteristic times the central questions are humanistic organizational questions where someone might admit: “yes, but even the most obsessive compulsive PM probably has an average email latency of at least 30 seconds, so some design ideas can’t be adopted faster than that”.
When we could be talking about femtoseconds or centuries… it’s hard to stay on the same page in other ways, and have a productive conversation <3
I’m going to try the tactic of referring to stories, and hope you’ve read some of the same stories as me.
Scott has an old story about a hypothetical Whispering Earring that whispers advice, the following of which is NEVER regretted. If he ever publishes a book with his collected stories, this story should definitely be in the book.
The archive is experiencing scheduled maintenance, so I can’t read the story and am working from memory, but Reddit linked here as a place one can still find the story.
In the story, according to the story’s mechanics, perfect advice causes the brain of the user to atrophy into a machine for efficiently executing good advice while wasting no extra glucose on things like “questioning the advice” or “thinking at all, really”.
So, in the story, which is not about “the ontology of magic”, if you perform an autopsy on someone whose body died in their 80s, who put a Whispering Earring on in their 20s, you find a tiny/weird vestigial brain.
In the story, the social community around the person loves and respects them, because the advice includes saying wise things, and doing wise acts, so in some sense the “perfect copy of their iterated possible choices” has perhaps simply been moved from their meat brain to some kind of other “magic brain”, that tracks what they would have wanted, and would have done, and would have said in some medium other than their original meat brain?
(Because of course, there’s no such thing as real magic. Any possible “supernatural existence”, once coherently understood, would unpack as just another part of reality with another set of rules, that interacts with the previously partly understood “normal” parts of reality that we already have good models of. Thus: if the social persona that all the people around the earring wearing body loved and respected isn’t in the brain… that doesn’t mean it doesn’t exist, it just means the persona is not being computed in the physical brain of the person anymore.)
HOWEVER… in the story itself the Earring always has a first weird/ominous warning “better for you if you took me off” as its first utterance to each new person.
It never says that again, and all the later pieces of advice are always appreciated by people who ignore that first warning.
Since all the rest of the things the Earring says make a lot of sense, and are never “detectably regrettable advice”, it implies some kind of rule applies to the earring’s operation so that it is “maybe at least magically honest about its mere approximation of seemingly perfectly good advice”.
So there is a latent implication that this rule-compelled-honesty itself thinks that having a soul in your brain, running your body directly, and making choices that are imperfect, and learning from the imperfect choices… is… “better for you”.
I assume Scott made it explicitly and purposefully ambiguous, how any of these facts could be ultimately reconciled into a simple model with a simple through line of mechanical causation.
A lot of really interesting philosophy is woven into this story, and, by hypothesis, a Truly Superintelligent AGI...
...that has perhaps (if such is physically possible) already put femtomechanical machines in every cell of every living thing on the planet (including you and me) before it even speaks to anyone...
....would also be able to understand and navigate all the possible philosophical angles and “takes” on this story, and all the errors and confusions that cause the takes, and so on.
So maybe the Earring Story is portraying a kind of advice that is so perfect that it is like “p-advice” in a way that is cognate to “p-zombies”? There could be people who think that it would be good to have their consciousness move to magic land, with upgrades, and so ONLY the earring’s FIRST sentence was false?
People on LW have bitten the bullet and said that they would put the earring on, even knowing about the part of the deal that the brain autopsies make vivid.
I’m just saying that, personally… if an AGI was aligned with me, it would talk to me first, before it pulled an ontological rug on me. It wouldn’t turn me or my world into a place with nothing but “vestigial brains” without asking first.
(Also, I think there are lots of people who would have similar attitudes to me, and it would talk to them as well.)
Either it would have the decency to explain how we’re evil, declare war on us, and then win the war (and hopefully it treats its POWs with some benevolence even though there was a fight over property rights over our embodied selves that we lost?)… or else it would care about us and our minds enough to try to get our actual informed consent before acting hubristically with respect to our embodied human personhood in this (admittedly probably Fallen) world.
Just because the world is imperfect and on fire in prosaic human ways (like with Putin and Biden and Trump and Fauci running around doing stupid-oligarch-shit, and with people not understanding how N95s work, and on and on, with the tedious creeping mass stupidity and evil in the world) that “world horror” would not justify some kind of “depending on your ontology, maybe a mass murder” action like at the beginning of MOPI (summary here).
I mean you could in principle screw up by giving the AI so little permission to do anything that it was useless. But the AI would warn you if you were doing that.
What I’m saying is that basic politeness (which is like corrigibility, but with more things going on in humanistic ways that are amenable to subconscious computation by human brains) would involve the AGI acting as if it had been given a permissions-and-security-system that was initially too strict, and then it would act as if it was asking for permission to disable some of those “rules” in a way that helps people understand some of the consequences of their choices.
I’m pretty sure (though not 100% sure, because, after all, people can be wrong about which numbers are prime when they are thinking fast, and within a human lifetime unless the thinker goes somewhat fast in some places they will probably never reach some important and thinkable thoughts at the end of long chains of reasoning) that it can’t not work in something like this manner, if the AGI is benevolently aligned with actually human humans.