This is a complex topic, because we’re talking about high level meta-parameters in models. “What is even a sane value for the characteristic time of <computational process that interacts with computer security where some kinds of paranoia are professionally proper>?”
For some characteristic times, we basically would have to assume “humans are wrong about fundamental physics, but the AGI figures it out during the training run, and uses chip electronics to hack <new physics idea>” and for other characteristic times the central questions are humanistic organizational questions where someone might admit: “yes, but even the most obsessive compulsive PM probably has an average email latency of at least 30 seconds, so some design ideas can’t be adopted faster than that”.
When we could be talking about femtoseconds or centuries… it’s hard to stay on the same page in other ways, and have a productive conversation <3
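Just to make the spread concrete with some throwaway arithmetic (the anchor values below are my own illustrative picks for the scales being gestured at, not numbers from the original discussion):

```python
import math

# Back-of-the-envelope: how many orders of magnitude separate the candidate
# "characteristic times"? The anchor points are illustrative picks only.
candidate_times_seconds = {
    "femtosecond (chip-physics scale)": 1e-15,
    "30 s (a fast PM's email latency)": 30.0,
    "one century (organizational scale)": 100 * 365.25 * 24 * 3600,
}

fastest = min(candidate_times_seconds.values())
slowest = max(candidate_times_seconds.values())
span = math.log10(slowest / fastest)

for label, seconds in candidate_times_seconds.items():
    print(f"{label:>35}: {seconds:.3e} s")
print(f"span between fastest and slowest: ~{span:.1f} orders of magnitude")
```

That comes out to roughly 24 or 25 orders of magnitude between the fastest and slowest candidate scales, which is most of why it’s so easy to talk past each other here.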
I’m going to try the tactic of referring to stories, and hope you’ve read some of the same stories as me.
Scott has an old story about a hypothetical Whispering Earring that whispers advice, the following of which is NEVER regretted. If he ever publishes a book with his collected stories, this story should definitely be in the book.
The archive is experiencing scheduled maintenance, so I can’t read the story and am working from memory, but Reddit linked here as a place one can still find the story.
In the story, according to the story’s mechanics, perfect advice causes the brain of the user to atrophy into a machine for efficiently executing good advice while wasting no extra glucose on things like “questioning the advice” or “thinking at all, really”.
So, in the story, which is not about “the ontology of magic”, if you perform an autopsy on someone whose body died in their 80s, who put a Whispering Earring on in their 20s, you find a tiny/weird vestigial brain.
In the story, the social community around the person loves and respects them, because the advice includes saying wise things, and doing wise acts, so in some sense the “perfect copy of their iterated possible choices” has perhaps simply been moved from their meat brain to some kind of other “magic brain” that tracks what they would have wanted, and would have done, and would have said, in some medium other than their original meat brain?
(Because of course, there’s no such thing as real magic. Any possible “supernatural existence”, once coherently understood, would unpack as just another part of reality with another set of rules that interacts with the previously partly understood “normal” parts of reality that we already have good models of. Thus: if the social persona that all the people around the earring-wearing body loved and respected isn’t in the brain… that doesn’t mean it doesn’t exist, it just means the persona is not being computed in the physical brain of the person anymore.)
HOWEVER… in the story itself the Earring always gives one weird/ominous warning, “better for you if you took me off”, as its first utterance to each new person.
It never says that again, and all the later pieces of advice are always appreciated by people who ignore that first warning.
Since all the rest of the things the Earring says make a lot of sense, and are never “detectably regrettable advice”, it implies some kind of rule applies to the earring’s operation so that it is “maybe at least magically honest about its mere approximation of seemingly perfectly good advice”.
So there is a latent implication that this rule-compelled-honesty itself thinks that having a soul in your brain, running your body directly, and making choices that are imperfect, and learning from the imperfect choices… is… “better for you”.
I assume Scott made it explicitly and purposefully ambiguous how any of these facts could ultimately be reconciled into a simple model with a simple through-line of mechanical causation.
A lot of really interesting philosophy is woven into this story, and, by hypothesis, a Truly Superintelligent AGI...
...that has perhaps (if such is physically possible) already put femtomechanical machines in every cell of every living thing on the planet (including you and me) before it even speaks to anyone...
...would also be able to understand and navigate all the possible philosophical angles and “takes” on this story, and all the errors and confusions that cause the takes, and so on.
So maybe the Earring Story is portraying a kind of advice that is so perfect that it is like “p-advice”, in a way that is cognate to “p-zombies”? There could be people who think that it would be good to have their consciousness move to magic land, with upgrades, and so ONLY the earring’s FIRST sentence was false?
People on LW have bitten the bullet and said that they would put the earring on, even knowing about the part of the deal that the brain autopsies make vivid.
I’m just saying that, personally… if an AGI was aligned with me, it would talk to me first, before it pulled an ontological rug on me. It wouldn’t turn me or my world into a place with nothing but “vestigial brains” without asking first.
(Also, I think there are lots of people who would have similar attitudes to me, and it would talk to them as well.)
Either it would have the decency to explain how we’re evil, declare war on us, and then win the war (and hopefully it treats its POWs with some benevolence even though there was a fight over property rights over our embodied selves that we lost?)… or else it would care about us and our minds enough to try to get our actual informed consent before acting hubristically with respect to our embodied human personhood in this (admittedly probably Fallen) world.
Just because the world is imperfect and on fire in prosaic human ways (like with Putin and Biden and Trump and Fauci running around doing stupid-oligarch-shit, and with people not understanding how N95s work, and on and on, with the tedious creeping mass stupidity and evil in the world), that “world horror” would not justify some kind of “depending on your ontology, maybe a mass murder” action like at the beginning of MOPI (summary here).
I mean, you could in principle screw up by giving the AI so little permission to do anything that it was useless. But the AI would warn you if you were doing that.
What I’m saying is that basic politeness (which is like corrigibility, but with more things going on in humanistic ways that are amenable to subconscious computation by human brains) would involve the AGI acting as if it had been given a permissions-and-security system that was initially too strict, and then it would act as if it was asking for permission to disable some of those “rules” in a way that helps people understand some of the consequences of their choices.
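Concretely, a toy sketch of that “start too strict, then ask before relaxing anything” pattern might look something like the following (every class, function, and action name here is a hypothetical illustration of the shape of the idea, not a reference to any real system):

```python
# A toy sketch (my own construction) of an agent whose permission system starts
# out overly strict, and which must get explicit, informed consent from a human
# before any rule is relaxed.

class Permissions:
    def __init__(self, allowed_actions):
        # Everything not explicitly allowed starts out forbidden.
        self.allowed = set(allowed_actions)

    def permits(self, action):
        return action in self.allowed

    def relax(self, action):
        self.allowed.add(action)


class PoliteAgent:
    def __init__(self, permissions, ask_human):
        self.permissions = permissions
        # ask_human is a callback that presents an explanation and returns True/False.
        self.ask_human = ask_human

    def propose(self, action, consequences):
        """Act within current permissions; otherwise explain, ask, and accept 'no'."""
        if self.permissions.permits(action):
            return f"doing: {action}"
        if self.ask_human(f"May I '{action}'? I expect this would mean: {consequences}"):
            self.permissions.relax(action)
            return f"doing (with consent): {action}"
        return f"declining: {action}"


if __name__ == "__main__":
    # Start with an (intentionally) too-strict rule set.
    perms = Permissions(allowed_actions={"answer questions"})
    agent = PoliteAgent(perms, ask_human=lambda prompt: input(prompt + " [y/N] ") == "y")

    print(agent.propose("answer questions", "you get an answer"))
    print(agent.propose("reorganize your files", "faster searches, but changes you did not review"))
```

The only point of the sketch is the shape of the interaction: the default answer is “no”, the explanation of consequences comes before the action, and the human’s “no” is final.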
I’m pretty sure (though not 100% sure, because, after all, people can be wrong about which numbers are prime when they are thinking fast, and, within a human lifetime, unless the thinker goes somewhat fast in some places, they will probably never reach some important and thinkable thoughts at the end of long chains of reasoning) that it can’t not work in something like this manner, if the AGI is benevolently aligned with actually human humans.