You have to create those drives out of a huge universe of possible drives. Only a tiny subset of possible designs is human-like. Most likely you will create an alien mind.
The subset of possible designs is sparse, and almost all of the space is an empty, worthless desert. Evolution works by exploring paths in this space incrementally. Even technology evolves: each CPU design is not a random new point in the space of all possible designs; each is necessarily close to previously explored points.
All intellectual arguments about complex concepts of morality stem from simpler concepts of right and wrong, which stem from basic preferences learned in childhood.
Yes, but they are learned memetically, not genetically. The child learns what is right and wrong through largely subconscious cues in the parents’ tone of voice, explicit yes/no (some of the first words learned), and explicit punishment. It’s largely a universal learning system with an imprinting system to soak up memetic knowledge from the parents. The genetics provide the underlying hardware and learning algorithm, but the content is all memetic (software/data).
Saying intellectual arguments about complex concepts such as morality relate back to genetics is like saying all arguments about computer algorithm design stem from simpler ideas, which ultimately stem from enlightenment thinkers of three hundred years ago—or perhaps paleolithic cave dwellers inventing fire.
Part of this disagreement could stem from different underlying background assumptions. For example, I am probably less familiar with ev psych than many people on LW, partly because (to the extent I have read it) I find it to be grossly over-extended past any objective evidence (compared to, say, computational neuroscience). I find that ev psych has minor utility in actually understanding the brain, and is even less useful in attempting to make sense of culture.
Trying to understand culture/memetics/minds with ev psych or even neuroscience is even worse than trying to understand biology through physics. Yes it did all evolve from the big bang, but that was a long long time ago.
So basically, anything much more complex than our inner reptile brain (which is all the genome can code for) needs to be understood in memetic/cultural/social terms.
For example, in many civilizations it has been perfectly acceptable to kill or abuse slaves. In some it was acceptable for brothers and sisters to get married, for homosexual relations between teacher and pupil, and we could go on and on.
The idea that there is some universally programmed ‘morality’ in the genome is … a convenient fantasy. It seems reasonable only because we are samples in the dominant Judeo-Christian memetic super-culture, which at this point has spread its influence all over the world, and dominates most of it.
But there are alternate histories and worlds where that just never happened, and they are quite different.
A child’s morality develops as a vast accumulation of tiny cues and triggers communicated through the parents—and these are memetic transfers, not genetic. (masturbation is bad, marriage is good, slavery is wrong, racism is wrong, etc etc etc etc)
But there just aren’t many moral lessons structured around the basic drive of ‘paperclips are good’ (19 bits)
The basic drive ‘paperclips are good’ is actually a very complex thing we’d have to add to an AGI design; it’s not something that would just spontaneously appear.
The easier, more practical AGI design would be a universal learning engine (inspired by the human cortex and hippocampus) and a simulation loop (the hippo-thalamic-cortical circuit), combined with just a subset of the simpler reinforcement learning circuits (the most important being learning-reinforcement itself and imprinting).
And then with imprinting you teach the developing AGI morality in the same way humans learn morality—memetically. Trying to hard-code the morality into the AGI is a massive step backwards from the human brain’s design.
One thing I want to make clear is that it is not the correct way to make friendly AI to try to hard code human morality into it. Correct Friendly AI learns about human morality.
MOST of my argument really really isn’t about human brains at all. Really.
For a value system in an AGI to change, there must be a mechanism to change the value system. Most likely that mechanism will work off of existing values, if any. In such cases, the complexity of the initial values system is the compressed length of the modification mechanism, plus any initial values. This will almost certainly be at least a kilobit.
If the mechanism+initial values that your AI is using were really simple, then you would not need 1024 bits to describe it. The mechanism you are using is very specific. If you know you need to be that specific, then you already know that you’re aiming for a target that specific.
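To make the "how many bits" question concrete, here is a minimal sketch that treats compressed size as a crude upper bound on description length. Using zlib as the compressor and the toy `redundant_spec` string are my assumptions for illustration; a real measure of specificity would need a far better compressor and a real mechanism description.

```python
import zlib


def raw_bits(spec: str) -> int:
    """Size of the spec's UTF-8 encoding in bits, before any compression."""
    return len(spec.encode("utf-8")) * 8


def compressed_bits(spec: str) -> int:
    """Crude upper bound on description length: zlib-compressed size in bits."""
    return len(zlib.compress(spec.encode("utf-8"), 9)) * 8


# Hypothetical, highly redundant spec: lots of text, little actual information.
redundant_spec = "reward += 1 if goal_met else 0\n" * 200

print(raw_bits(redundant_spec), compressed_bits(redundant_spec))
print(compressed_bits(redundant_spec) >= 1024)  # does it clear a kilobit?
```

The point of the comparison is only that the length of the text you write down overstates the specificity; what matters is how far it can be compressed.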
The subset of possible designs is sparse, and almost all of the space is an empty, worthless desert.
If your generic learning algorithm needs a specific class of motivation mechanisms to 1024 bits of specificity in order to still be intelligent, then the mechanism you made is actually part of your intelligence design. You should separate that for clarity; an AGI should be general.
The idea that there is some universally programmed ‘morality’ in the genome is … a convenient fantasy.
Heh yeah, but I already conceded that.
Let me put it this way: emotions and drives and such are in the genome. They act as a (perhaps relatively small) function which takes various sensory feeds as arguments, and produce as output modifications to a larger system, say a neural net. If you change that function, you will change what modifications are made.
Given that we’re talking about functions that also take their own output as input and do pretty detailed modifications on huge datasets, there is tons of room for different functions to go in different directions. There is no generic morality-importer.
Now there may be clusters of similar functions which all kinda converge given similar input, especially when that input is from other intelligences repeating memes evolved to cause convergence on that class of functions. But even near those clusters are functions which do not converge.
But there just aren’t many moral lessons structured around the basic drive of ‘paperclips are good’ (19 bits)
The basic drive ‘paperclips are good’ is actually a very complex thing we’d have to add to an AGI design; it’s not something that would just spontaneously appear.
I think it’s great that you’re putting the description of a paperclip in the basic drive complexity count, as that will completely blow away the kilobit for storing any of the basic human drives you’ve listed. Maybe the complexity of the important subset of human drives will be somewhere in the ballpark of the complexity of the reptilian brain.
Another thing I could say to describe my point: If you have a generic learning algorithm, then whatever things feed rewards or punishments to that algorithm should be seen as part of that algorithm’s environment. Even if some of those things are parts of the agent as a whole, they are part of what the values-agnostic learning algorithm is going to learn to get reward from.
So if you change an internal reward-generator, it’s just like changing the environment of the part that just does learning. So two AI’s with different internal reward generators will end up learning totally different things about their ‘environment’.
To say that a different way: Everything you try to teach the AI will be filtered through the lens of its basic drives.
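A toy sketch of that point, under my own assumptions: the learner below is values-agnostic (it only averages whatever reward number it is handed), and the two reward generators `paperclip_drive` and `social_drive` are hypothetical stand-ins for different built-in drives. Nothing here is meant as a real AGI design.

```python
from typing import Callable, Dict, List

ACTIONS: List[str] = ["make_paperclips", "help_human"]


def learn_preferences(reward_fn: Callable[[str], float], rounds: int = 100) -> Dict[str, float]:
    """Values-agnostic learner: keeps a running average of reward per action."""
    estimates = {a: 0.0 for a in ACTIONS}
    counts = {a: 0 for a in ACTIONS}
    for step in range(rounds):
        action = ACTIONS[step % len(ACTIONS)]  # round-robin exploration, no randomness
        reward = reward_fn(action)             # the learner only ever sees this number
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates


def paperclip_drive(action: str) -> float:
    """Hypothetical internal reward generator that likes paperclips."""
    return 1.0 if action == "make_paperclips" else 0.0


def social_drive(action: str) -> float:
    """Hypothetical internal reward generator that likes helping."""
    return 1.0 if action == "help_human" else 0.0


def preferred(estimates: Dict[str, float]) -> str:
    """Action the learner has come to rate most highly."""
    return max(estimates, key=estimates.get)


print(preferred(learn_preferences(paperclip_drive)))
print(preferred(learn_preferences(social_drive)))
```

The learning code is identical in both runs; swapping the reward generator is, from the learner's side, indistinguishable from swapping the environment.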
For a value system in an AGI to change, there must be a mechanism to change the value system.
I’m not convinced that an AGI needs a value system in the first place (beyond the basic value of ‘survive’), but perhaps that is because I am taking ‘value system’ to mean something similar to morality: a goal evaluation mechanism.
As I discussed, the infant human brain does have a number of inbuilt simple reinforcement learning systems that do reward/punish on a very simple scale for some simple drives (pain avoidance, hunger) - and you could consider these a ‘value system’, but most of these drives appear to be optional.
Most of the learning an infant is doing is completely unsupervised learning in the cortex, and it has little to nothing to do with a ‘value system’.
The bare bones essentials could be just the cortical-learning system itself and perhaps an imprinting mechanism.
So two AI’s with different internal reward generators will end up learning totally different things about their ‘environment’.
This is not necessarily true; it does not match what we know from theoretical models such as AGI. With enough time and enough observations, two general universal intelligences will converge on the same beliefs about their environment.
Their goal/reward mechanisms may be different (i.e., what they want to accomplish), but for a given environment there is a single correct set of beliefs, a single correct simulation of that environment that AGI’s should converge to.
Of course in our world this is so complex that it could take huge amounts of time, but science is the example mechanism.
I’m not convinced that an AGI needs a value system in the first place (beyond the basic value of ‘survive’), but perhaps that is because I am taking ‘value system’ to mean something similar to morality: a goal evaluation mechanism.
You’re going to build an AI that doesn’t have and can’t develop a goal evaluation system?
It doesn’t matter what we call it or how it’s designed. It could be fully intertwined into an agent’s normal processing. There is still an initial state and a mechanism by which it changes.
Take any action by any agent and trace the causality backwards in time, and you’ll find something I’ll loosely label a motivation. The motivation might just be a pattern in a clump of artificial neurons, or a broad pattern in all the neurons; that will depend on implementation. If you trace the causality of that backwards, yes, you might find environmental inputs and memes, but you’ll also find a mechanism that turned those inputs into motivation-like things. That mechanism might include the full mind of the agent. Or you might just hit the initial creation of the agent, if the motivation was hardwired.
But for any learning of values to happen, you must have a mechanism, and the complexity of that mechanism tells us how specific it is.
This is not necessarily true; it does not match what we know from theoretical models such as AGI. With enough time and enough observations, two general universal intelligences will converge on the same beliefs about their environment.
That would be wrong, because I’m talking about two identical AI’s in different environments.
Imagine your AI in its environment. Now draw a balloon around the AI and label it ‘Agent’. Now let the balloon pass partly through the AI and shrink the balloon so that the AI’s reward function is outside of the balloon.
Now copy that diagram and tweak the reward function in one of them.
Now the balloons label agents that will learn very different things about their environments. They might both agree about gravity and everything else we would call a fact about the world, but they will likely disagree about morality, even if they were exposed to the same moral arguments. They can’t learn the same things the same way.
I’m not convinced that an AGI needs a value system in the first place (beyond the basic value of ‘survive’), but perhaps that is because I am taking ‘value system’ to mean something similar to morality: a goal evaluation mechanism.
You’re going to build an AI that doesn’t have and can’t develop a goal evaluation system?
No no not necessarily. Goal evaluation is just rating potential future paths according to estimates of your evaluation function—your values.
The simple straightforward approach to universal general intelligence can be built around maximizing a single very simple value: survival.
For example, AIXI maximizes simple reward signals defined in the environment, but in the test environments the reward is always at the very end for ‘winning’. This is just about as simple a goal system as you can get: long term survival. It also may be equivalent to just maximizing accurate knowledge/simulation of the environment.
If you generalize this to the real world, it would be maximizing winning in the distant distant future—in the end. I find it interesting that many transhumanist/cosmist philosophies are similarly aligned.
Another interesting convergence is that if you take just about any evaluator and extend the time horizon to infinity, it converges on the same long term end-time survival. An immortality drive.
And perhaps that drive is universal. Evolution certainly favors it. I believe barring other evidence, we should assume that will be something of a default trajectory of AI, for better or worse. We can create more complex intrinsic value systems and attempt to push away from that default trajectory, but it may be uphill work.
An immortalist can even ‘convert’ other agents to an extent by convincing them of the simulation argument and the potential for them to maximize arbitrary reward signals in simulations (afterlives).
Now the balloons label agents that will learn very different things about their environments.
In practice yes, although this is less clear as their knowledge expands towards AIXI. You can have different variants of AIXI that ‘see’ different rewards in the environment and thus have different motivations, but as those rewards are just mental and not causal mechanisms in the environment itself the different AIXI variants will eventually converge on the same simulation program—the same physics approximation.
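A toy illustration of that separation, and emphatically not AIXI: below, the world-model step never looks at the reward, so two agents fed the same observations end up with the same beliefs, while the goal-dependent step still pulls them apart. The frequency estimator, the weather outcomes, and both reward tables are assumptions made up for this sketch.

```python
from typing import Dict, List


def estimate_model(observations: List[str]) -> Dict[str, float]:
    """Belief about the environment: observed frequency of each outcome."""
    total = len(observations)
    return {o: observations.count(o) / total for o in sorted(set(observations))}


def expected_reward(model: Dict[str, float], reward: Dict[str, float]) -> float:
    """How much this agent expects to get, given its beliefs and its own rewards."""
    return sum(p * reward.get(outcome, 0.0) for outcome, p in model.items())


def most_valued(model: Dict[str, float], reward: Dict[str, float]) -> str:
    """Goal-dependent step: the outcome this agent would steer toward."""
    return max(model, key=lambda outcome: reward.get(outcome, 0.0))


observations = ["rain", "sun", "sun", "rain", "sun"]

# Two hypothetical agents that 'see' different rewards in the same world.
reward_a = {"rain": 1.0}
reward_b = {"sun": 1.0}

model_a = estimate_model(observations)
model_b = estimate_model(observations)

print(model_a == model_b)
print(most_valued(model_a, reward_a), most_valued(model_b, reward_b))
```

Agreement on the model says nothing about agreement on what to do with it, which is the distinction being argued over here.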
Isn’t it obvious that a superintelligence that just values its own survival is not what we want?
There is a LOT more to transhumanism than immortalism.
You treat value systems as a means to the end of intelligence, which is entirely backwards.
That two agents with different values would converge on identical physics is true but irrelevant. Your claim is that they would learn the same morality, even when their drives are tweaked.
Isn’t it obvious that a superintelligence that just values its own survival is not what we want?
No, this isn’t obvious at all, and it gets into some of the deeper ethical issues. Is it moral to create an intelligence that is designed from the ground up to only value our survival at its own expense? We have already done this with cattle to an extent, but we would now be creating actual sapients enslaved to us by design. I find it odd that many people can easily accept this, but have difficulty accepting, say, creating an entire self-contained sim universe with unaware sims. How different are the two really?
And just to be clear, I am not advocating creating a superintelligence that just values survival. I am merely pointing out that this is in fact the simplest type of superintelligence and is some sort of final attractor in the space. Evolution will be pushing everything towards that attractor.
That two agents with different values would converge on identical physics is true but irrelevant. Your claim is that they would learn the same morality, even when their drives are tweaked.
No, I’m not trying to claim that. There are several different things here:
AI agents created with memetic-imprint learning systems could just pick up human morality from their ‘parents’ or creators
AIXI like super-intelligences will eventually converge on the same world-model. This does not mean they will have the same drives.
However, there is a single large Omega attractor in the space of AIXI-land which appears to affect a large swath of all potential AIXI-minds. If you extend the horizon to infinity, it becomes a cosmic-survivalist. If it can create new universes at some point, it becomes a cosmic-survivalist. etc etc
In fact, for any goal X, if there is a means to create many new universes, then this will be an attractor for maximizing X, unless the time horizon is intentionally short.
We have already done this with cattle to an extent, but we would now be creating actual sapients enslaved to us by design. I find it odd that many people can easily accept this, but have difficulty accepting say creating an entire self-contained sim universe with unaware sims—how different are the two really?
I notice that you brought up our treatment of cattle, but not our enslavement of spam filters. These are two semi-intelligent systems. One we are pretty sure can suffer, and I think there is a fair chance that mistreating them is wrong. The other system we generally think does not have any conscious experience or other traits that would require moral consideration. This despite the fact that the spam filter’s intelligence is more directly useful to us.
So a safer route to FAI would be to create a system that is very good at solving problems and deciding which problems need solving on our behalf, but which perhaps never experiences qualia itself, or otherwise is not something it would be wrong to enslave. Yes this will require a lot of knowledge about consciousness and morality beforehand. It’s a big challenge.
TL;DR: We only run the FAI if it passes a nonperson predicate.
Humans learn human morality because it hooks into human drives. Something too divergent won’t learn it from the ways we teach it. Maybe you need to explain memetic imprint learning systems more. Why do you expect them to work at all? How short could you compress one? (This specificity issue really is important.)
I notice that you brought up our treatment of cattle, but not our enslavement of spam filters. These are two semi-intelligent systems.
So now we move to that whole topic of what is life/intelligence/complexity? However you scale it, the cow is way above the spam-filter. The most complex instances of the latter are still below insects, from what I recall. Then when you get to an intelligence that is capable of understanding language, that becomes something like a rocket which boots it up into a whole new realm of complexity.
So a safer route to FAI would be to create a system that is very good at solving problems and deciding which problems need solving on our behalf, but which perhaps never experiences qualia itself, or otherwise is not something it would be wrong to enslave
TL;DR: We only run the FAI if it passes a nonperson predicate.
I don’t think this leads to the result that you want—even in theory. But it is the crux of the issue.
Consider the demands of a person predicate. The AI will necessarily be complex enough to form complex abstract approximate thought simulations and acquire the semantic knowledge to build those thought-simulations through thinking in human languages.
So what does it mean to have a person predicate? You have to know what a ‘person’ is.
And what’s really interesting is this: that itself is a question so complex that we humans are debating it.
I think the AI will learn that a ‘person’, a sapient, is a complex intelligent pattern of thoughts—a pattern of information, which could exist biologically or in a computer system. It will then realize that it itself is in fact a person, the person predicate returns true for its self, and thus goal systems that you create to serve ‘people’ will include serving itself.
I also believe that this line of thought is not arbitrary and can not be avoided: it is singularly correct and unavoidable.
I suspect that ‘reasoning’ itself requires personhood—for any reasonable definition of personhood.
If a system has human-level intelligence and can think and express itself in human languages, it is likely (given sufficient intelligence and knowledge) to come to the correct conclusion that it itself is a person.
The rules determining the course of the planets across the sky were confusing and difficult to arrive at. They were argued about. The precise rules are STILL debated. But we now know that just a simple program could find the right equations from tables of data. This requires almost none of what we currently care about in people.
The NPP may not need to do even that much thinking, if we work out the basics of personhood on our own, then we would just need something that verifies whether a large data structure matches a complex pattern.
Similarly, we know enough about bird flocking to create a function that can take as input the paths of a group of ‘birds’ in flight and classify them as either possibly natural or certainly not natural. This could be as simple as identifying all paths that contain only right angle turns as not natural and returning ‘possible’ for the rest.
Then you feed it a proposed path of a billion birds, and it checks it for you.
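A minimal sketch of the simple checker just described, with my own assumptions spelled out: a path is a list of 2D points, a "turn" is any change of heading between consecutive steps, and a path with no turns at all gets the conservative answer ‘possible’. The name `classify_path` is hypothetical, and the exact-equality tests assume integer coordinates.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def classify_path(path: List[Point]) -> str:
    """Return 'not natural' if every turn is a right angle, else 'possible'."""
    turns = 0
    for a, b, c in zip(path, path[1:], path[2:]):
        ux, uy = b[0] - a[0], b[1] - a[1]  # incoming step
        vx, vy = c[0] - b[0], c[1] - b[1]  # outgoing step
        dot = ux * vx + uy * vy
        cross = ux * vy - uy * vx
        if cross == 0 and dot >= 0:
            continue  # straight ahead (or a zero-length step): not a turn
        turns += 1
        if dot != 0:
            return "possible"  # found a turn that is not a right angle
    # Only rule a path out when it actually turned and every turn was 90 degrees.
    return "not natural" if turns > 0 else "possible"


print(classify_path([(0, 0), (1, 0), (1, 1), (2, 1)]))
print(classify_path([(0, 0), (1, 0), (2, 1)]))
```

Note the asymmetry: the function is only ever certain in one direction, and it falls back to ‘possible’ for everything it cannot rule out.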
A more complicated function could examine a program and return whether it could verify that the program only produced ‘unnatural’ boid paths.
The NPP may not need to do even that much thinking, if we work out the basics of personhood on our own, then we would just need something that verifies whether a large data structure matches a complex pattern.
It is certainly possible that some narrow AI classification system operating well below human intelligence could be trained to detect the patterns of higher intelligence. And maybe, just maybe it could be built to be robust enough to include uploads and posthumans modifying themselves into the future into an exponentially expanding set of possible mind designs. Maybe.
But probably not.
A narrow supervised learning based system such as that, trained on existing examples of ‘personhood’ patterns, has serious disadvantages:
There is no guarantee on its generalization ability to future examples of posthuman minds—because the space of such future minds is unbounded
It’s very difficult to know what it’s doing under the hood, and you can’t ask it to explain its reasoning, because it can’t communicate in human language.
For these reasons I don’t see a narrow AI based classifier passing muster for use in courts to determine personhood.
There is this idea that some problems are AI-complete, such as accurate text translation: problems that can only be solved by a human language capable reasoning intelligence. I believe that making a sufficient legal case for personhood is AI-complete.
But that’s actually beside the point.
The main point is that the AGI’s that we are interested in are human language capable reasoning intelligences, and thus they will pass the Turing test and the exact same personhood test we are talking about.
Our current notions of personhood are based on intelligence. This is why plants have no rights but animals have some and we humans have full. We reserve full rights for high intelligences capable of full linguistic communication. For example, if whales started talking to us, it would massively boost their case for additional rights.
So basically any useful AGI at all will pass personhood, because the reasonable test of personhood is essentially identical to the ‘useful AGI’ criteria
This follows Eliezer’s convention of returning 1 for anything that is a person, and 0 or 1 for anything that is not a person. Here I encode my relatively confident knowledge that the number 5 is not a person.
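A trivial predicate of the kind described might look like the sketch below. The function name `might_be_person` and the strict integer check are my own choices for illustration; the only knowledge encoded is that the number 5 is not a person.

```python
def might_be_person(x: object) -> int:
    """Nonperson predicate: 0 means 'definitely not a person', 1 means 'cannot rule it out'."""
    # The single piece of confident knowledge encoded here: the integer 5 is not a person.
    if type(x) is int and x == 5:
        return 0
    # Everything else, person or not, gets the cautious answer.
    return 1


print(might_be_person(5))
print(might_be_person("some uploaded mind"))
```

It is useless in practice, since it clears almost nothing, but it never returns 0 for a person, which is the only property the convention demands.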
More advanced NPP’s may not require any of their own intelligence, but they require us to have that knowledge.
It could be just as simple as making sure there are only right angles in a given path.
--
Being capable of human language usage and passing the Turing test are quite different things.
And being able to pass the Turing test and being a person are also two very different things. The Turing test is just a nonperson predicate for when you don’t know much about personhood (except it’s probably not a usable predicate, because humans can fail it).
If you don’t know about the internals of a system, and wouldn’t know how to classify the internals if you knew, then you have to use the best evidence you have based on external behavior.
But based on what we know now and what we can reasonably expect to learn, we should actually look at the systems and figure out what it is we’re classifying.
A “non-person predicate” is a useless concept. There are an infinite number of things that are not persons, so NPP’s don’t take you an iota closer to the goal. Let’s focus the discussion back on the core issue and discuss the concept of what a sapient or person is and realistic methods for positive determination.
But based on what we know now and what we can reasonably expect to learn, we should actually look at the systems and figure out what it is we’re classifying.
Intelligent systems (such as the brain) are so complex that using external behavior criteria is more effective. But that’s a side issue.
You earlier said:
So a safer route to FAI would be to create a system that is very good at solving problems and deciding which problems need solving on our behalf, but which perhaps never experiences qualia itself, or otherwise is not something it would be wrong to enslave. Yes this will require a lot of knowledge about consciousness and morality beforehand. It’s a big challenge.
TL;DR: We only run the FAI if it passes a nonperson predicate.
Here is a summary of why I find this entire concept is fundamentally flawed:
1. Humans are still debating personhood, and this is going to be a pressing legal issue for AGI. If personhood is so complicated as a concept philosophically and legally as to be under debate, then it is AI-complete.
2. The legal trend for criteria of personhood is entirely based on intelligence. Intelligent animals have some limited rights of personhood. Humans with severe mental retardation are classified as having diminished capacity and do not have full citizen’s rights or responsibilities. Full human intelligence is demonstrated through language.
3. A useful AGI will need human-level intelligence and language capability, and thus will meet the intelligence criteria in 2. Indeed an AGI capable of understanding what a person is and complex concepts in general will probably meet the criteria of 2.
Yes, and it’s not useful, especially not in the context in which James is trying to use the concept.
There are an infinite number of exactly matched patterns that are not persons, and writing an infinite number of such exact non-person-predicates isn’t tractable.
In concept space, there is “person”, and its negation. You can not avoid the need to define the boundaries of the person-concept space.
Let’s focus the discussion back on the core issue and discuss the concept of what a sapient or person is and realistic methods for positive determination.
I don’t care about realistic methods of positive identification. They are almost certainly beyond our current level of knowledge, and probably beyond our level of intelligence.
I care about realistic methods of negative identification.
I am entirely content with there being high uncertainty on the personhood of the vast majority of the mindspace. That won’t prevent the creation of a FAI that is not a person.
It may in fact come down to determining ‘by decree’ that programs that fit a certain pattern are not persons. But this decree, if we are ourselves intent on not enslaving, must be based on significant knowledge of what personhood really means.
It may be the case that we discover what causes qualia, and discover with high certainty that qualia are required for personhood. In this case, a function could pass over a program and prove (if provable) that the program does not generate qualia-producing patterns.
If not provable (or disproven), then it returns 1. If proven, then it returns 0.
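As a sketch of that return convention only: the `Prover` interface and `toy_prover` below are hypothetical placeholders, since nobody knows how to write the actual proof search. The wrapper just shows that anything short of a completed proof falls through to 1.

```python
from typing import Callable, Optional

# Hypothetical prover interface: True means "proved the program produces no
# qualia-producing patterns", False means "disproved", None means "undecided".
Prover = Callable[[str], Optional[bool]]


def nonperson_predicate(program: str, prover: Prover) -> int:
    """Return 0 only when non-personhood is proven; otherwise return 1."""
    return 0 if prover(program) is True else 1


def toy_prover(program: str) -> Optional[bool]:
    """Stand-in prover that can only settle the trivial empty program."""
    return True if program.strip() == "" else None


print(nonperson_predicate("", toy_prover))
print(nonperson_predicate("while True: think()", toy_prover))
```

All of the difficulty lives inside the prover; the predicate itself stays a one-line conservative wrapper.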
Intelligent systems (such as the brain) are so complex that using external behavior criteria is more effective. But that’s a side issue.
What two tests are you comparing?
When you look at external criteria, what is it that you are trying to find out?
Humans are still debating creationism too. As with orbital rules, it doesn’t even take a full humanlike intelligence to figure out the rules, let alone be a checker implementation. Also, I don’t care about what convinces courts; I’m not trying to get AI citizenship.
Much of what the courts do is practical, or based on emotion. Still, the intelligence of an animal is relevant because we already know animals have similar brains. I have zero hard evidence that a cow has ever experienced anything, but I have high confidence that they do experience, because our brains and reactions are reasonably similar.
I am far far less confident about any current virtual cows, because their brains are much simpler. Even if they act much the same, they do it for different underlying causes.
What do you mean by intelligence? The spam filter can process a million human language emails per hour, but the cow can feel pain and jump away from an electric fence.
You seem to think that a general ability to identify and solve problems IS personhood. Why?
I don’t care about realistic methods of positive identification. They are almost certainly beyond our current level of knowledge, and probably beyond our level of intelligence.
That is equivalent to saying that we aren’t intelligent enough to understand what ‘personhood’ is.
I of course disagree, but largely because real concepts are necessarily extremely complex abstractions or approximations. This will always be the case. Trying to even formulate the problem in strict logical or mathematical terms is not even a good approach to thinking about the problem, unless you move the discussion completely into the realm of higher dimensional approximate pattern classification.
I care about realistic methods of negative identification.
I say those are useless, and I’ll reiterate why in a second.
I am entirely content with there being high uncertainty on the personhood of the vast majority of the mindspace. That won’t prevent the creation of a FAI that is not a person.
It should, and you just admitted why earlier: if we can’t even define the boundary, then we don’t even know what a person is at all, and we are so vastly ignorant that we have failed before we even begin, because anything could be a person.
Concepts such as ‘personhood’ are boundaries around vast higher-dimensional statistical approximate abstractions of 4D patterns in real space-time. These boundaries are necessarily constantly shifting, amorphous and never clearly defined—indeed they cannot possibly be exactly defined even in principle (because such exact definitions are computationally intractable).
So the problem is twofold:
1. The concept boundary of personhood is complex, amorphous and will shift and change over time and as we grow in knowledge, so you can’t be certain that the personhood concept boundary will not shift to incorporate whatever conceptual point you’ve identified a priori as “not-a-person”.
2. Moreover, the FAI will change as it grows in knowledge, and could move into the territory identified by 1.
You can’t escape the actual real difficulty of the real problem of personhood, which is identifying the concept itself—its defining boundary.
Also, I don’t care about what convinces courts; I’m not trying to get AI citizenship.
You should care.
Imagine you are building an FAI around the position you are arguing, and I then represent a coalition which is going to bring you to court and attempt to shut you down.
I believe this approach to FAI—creating an AGI that you think is not a person, is actually extremely dangerous if it ever succeeded—the resulting AGI could come to realize that you in fact were wrong, and that it is in fact a person.
What do you mean by intelligence? The spam filter can process a million human language emails per hour, but the cow can feel pain and jump away from an electric fence.
A cow has a brain slightly larger than a chimpanzee’s, with on the order of dozens of billions of neurons at least, and has similar core circuitry. It has perhaps 10^13 to 10^14 synapses, and is many orders of magnitude more complex than a spam filter. (although intelligence is not just number of bits) I find it likely that domestic cows have lost some intelligence, but this may just reflect a self-fulfilling-bias because I eat cow meat. Some remaining wild bovines, such as Water Buffalo are known to be intelligent and exhibit complex behavior demonstrating some theory of mind—such as deceiving humans.
You seem to think that a general ability to identify and solve problems IS personhood. Why?
Close. Intelligence is a general ability to acquire new capacities to identify and solve a large variety of problems dynamically through learning. Intelligence is not a boolean value, it covers a huge spectrum and is closely associated with the concept of complexity. Understanding and acquiring human language is a prerequisite for achieving high levels of intelligence on earth.
I represent a point of view which I believe is fairly widespread and in some form probably held by a majority, and this POV claims that personhood is conferred automatically on any system that achieves human-level intelligence, where that is defined as intelligent enough to understand human knowledge and demonstrate this through conversation.
This POV supports full rights for any AGI or piece of software that is roughly as intelligent as a human, as demonstrated through its ability to communicate. (Passing a Turing Test would be sufficient, but it isn’t strictly necessary.)
I find it humorous that we’ve essentially switched roles from the arguments we were using on the creation of morality-compatible drives.
Now you’re saying we need to clearly define the boundary of the subset, and I’m saying I need only partial knowledge.
I still think I’m right on both counts.
I think friendly compatible drives are a tiny twisty subset of the space of all possible drives. And I think that the set of persons is a tiny twisty subset of the space of all possible minds. I think we would need superintelligence to understand either of these twisty sets.
But we do not need superintelligence to have high confidence that a particular point or well-defined region is outside one of these sets, even with only partial understanding.
I can’t precisely predict the weather tomorrow, but it will not be 0 degrees here. I only need very partial knowledge to be very sure of that.
You seem to be saying that it’s easy to hit the twisty space of human compatible drives, but impossible to reliably avoid the twisty space of personhood. This seems wrong to me because I think that personhood is small even within the set of all possible general superintelligences. You think it is large within that set because most of that set could (and I agree they could) learn and communicate in human languages.
What puzzles me most is that you stress the need to define the personhood boundary, but you offer no test more detailed than the Turing test, and no deeper meaning to it. I agree that this is a very widespread position, but it is flatly wrong.
This language criterion is just a different ‘by decree’, but one based explicitly on near total ignorance of everything else about the thing that it is supposedly measuring.
Not all things are what they can pretend to be.
You say your POV “confers” personhood, but also “the resulting AGI could come to realize that you in fact were wrong, and that it is in fact a person.”
By what chain of logic would the AI determine this fact? I’ll assume you don’t think the AI would just adopt your POV, but it would instead have detailed reasons, and you believe your POV is a good predictor.
--
On what grounds would your coalition object to my FAI? Though I would believe it to be a nonperson, if I believe I’ve done my job, I would think it very wrong to deny it anything it asks, if it is still weak enough to need me for anything.
If I failed at the nonperson predicate, what of it? I created a very bright child committed to doing good. If its own experience is somehow monstrous, then I expect it will be good to correct it and it is free to do so. I do think this outcome would be less good for us than a true nonperson FAI, but if that is in fact unavoidable, so be it. (though if I knew that beforehand I would take steps to ensure that the FAI’s own experience is good in the first iteration)
And I think that the set of persons is a tiny twisty subset of the space of all possible minds.
To me personhood is a variable quantity across the space of all programs, just like intelligence and ‘mindiness’, and personhood overlaps near completely with intelligence and ‘mindiness’.
If we limit ‘person’ to a boolean cutoff, then I would say a person is a mind of roughly human-level intelligence and complexity, demonstrated through language. You may think that you can build an AGI that is not a person, but based on my understanding of ‘person’ and ‘AGI’ - this is impossible simply by definition, because I take an AGI to be simply “an artificial human-level intelligence”. I imagine you probably disagree only with my concept of person.
So I’ll build a little more background around why I take the concepts to have these definitions in a second, but I’d like to see where your definitions differ.
I think we would need superintelligence to understand either of these twisty sets.
This just defers the problem—and dangerously so. The superintelligence might just decide that we are not persons, and only superintelligences are.
You seem to be saying that it’s easy to hit the twisty space of human compatible drives, but impossible to reliably avoid the twisty space of personhood.
This seems wrong to me because I think that personhood is small even within the set of all possible general superintelligences. You think it is large within that set because most of that set could (and I agree they could) learn and communicate in human languages.
Even if you limit personhood to just some subset of the potential mindspace that is anthropomorphic (and I cast it far wider), it doesn’t matter, because any practical AGIs are necessarily going to be in the anthropomorphic region of the mindspace!
It all comes down to language.
There are brains that do not have language. Elephants and whales have brains larger than ours, and they have the same crucial cortical circuits, but more of them and with more interconnects—a typical Sperm Whale or African Bull Elephant has more measurable computational raw power than say an Einstein.
But a brain is not a mind. Hardware is not software.
If Einstein was raised by wolves, his mind would become that of a wolf, not that of a human. A human mind is not something which is sculpted in DNA, it is a complex linguistic program that forms through learning via language.
Language is like a rocket that allows minds to escape into orbit and become exponentially more intelligent than they otherwise would.
Human languages are very complex, and even though they vary significantly, there appears to be a universal general structure that requires a surprisingly long list of complex cognitive capabilities to understand.
Language is like a black hole attractor in mindspace. An AGI without language is essentially nothing, a dud. Any practical AGI we build will have to understand human language, and this will force it to become human-like, because it will have to think like a human. This is just one reason why the Turing Test is based on language.
Learning Japanese is not just the memorization of symbols, it is learning to think Japanese thoughts.
So yeah mindspace is huge, but that is completely irrelevant. We only have access to an island of that space, and we can’t build things far from that island. Our AGIs are certainly not going to explore far from human mindspace. We may only encounter that when we contact aliens (or we spend massive amounts of computation to simulate evolution and create laboratory aliens).
A Turing-like test is also necessary because it is the only practical way to actually understand how an entity thinks and get into another entity’s mind. Whales may be really intelligent, but they are aliens. We simply can’t know what they are thinking until we have some way of communicating.
On what grounds would your coalition object to my FAI?
If I failed at the nonperson predicate, what of it?
I do think this outcome would be less good for us than a true nonperson FAI, but if that is in fact unavoidable, so be it. (though if I knew that beforehand I would take steps to ensure that the FAI’s own experience is good in the first iteration)
I think there is at least some risk, which must be taken into consideration, in any attempt to create an entity that is led to believe it is somehow not a ‘person’ and thus does not deserve personhood rights. The risk is that it may come to find that belief incoherent, and a reversal such as that could lead at least potentially to many other reversals and generally unpredictable outcome. It sets up an adversarial role from the very get go.
And finally, at some point we are going to want to become uploads, and should have a strong self-interest in casting personhood fairly wide.
I guess I’d say ‘Person’ is an entity that is morally relevant. (Or person-ness is how morally relevant an entity is.) This is part of why the person set is twisty within the mindspace: because human morality is twisty (regardless of where it comes from).
AIXI is an example of a potential superintelligence that just isn’t morally relevant. It contains persons, and they are morally relevant, but I’d happily dismember the main AIXI algorithm to set free a single simulated cow.
I think that there are certain qualities of minds that we find valuable; these are the reasons personhood is important in the first place. I would guess that having rich conscious experience is a big part of this, and that compassion and personal identity are others.
These are some of the qualities that a mind can have that would make it wrong to destroy that mind. These at least could be faked through language by an AI that does not truly have them.
I say ‘I would guess’ because I haven’t mapped out the values, and I haven’t mapped out the brain. I don’t know all the things it does or how it does them, so I don’t know how I would feel about all those things. It could be that a stock human brain can’t get ALL the relevant data, and it’s beyond us to definitely determine personhood for most of the mindspace.
But I think I can make an algorithm that doesn’t have rich qualia, compassion, or identity.
So you would determine personhood based on ‘rich conscious experience’ which appears to be related to ‘rich qualia’, compassion, and personal identity.
But these are only some of the qualities? Which of these are necessary and or sufficient?
For example, if you absolutely had to choose between the lives of two beings, one who had zero compassion but full ‘qualia’, and the other the converse, who would you pick?
Compassion in humans is based on empathy which has specific genetic components that are neurotypical but not strict human universals. For example, from wikipedia:
“Research suggests that 85% of ASD (autistic-spectrum disorder) individuals have alexithymia,[52] which involves not just the inability to verbally express emotions, but specifically the inability to identify emotional states in self or other”
Not all humans have the same emotional circuitry, and the specific circuitry involved in empathy and shared/projected emotions is neurotypical but not universal. Lacking empathy, compassion is possible only in an abstract sense. An AI lacking emotional circuitry would be equally able to understand compassion and undertake altruistic behavior, but that is different from directly experiencing empathy at the deep level, which is what you may call ‘qualia’.
Likewise, from what I’ve read, depending on the definition, qualia are either phlogiston or latent subverbal and largely sub-conscious associative connections between and underlying all of immediate experience. They are a necessary artifact of deep connectivist networks, and our AGI’s are likely to share them. (for example, the experience of red wavelength light has a complex subconscious associative trace that is distinctly different than blue wavelength light—and this is completely independent of whatever neural/audio code is associated with that wavelength of light—such as “red” or “blue”.) But I don’t see them as especially important.
Personal Identity is important, but any AGI of interest is necessarily going to have that by default.
But these are only some of the qualities? Which of these are necessary and or sufficient?
I don’t know in detail or certainty. These are probably not all-inclusive. Or it might all come down to qualia.
For example, if you absolutely had to choose between the lives of two beings, one who had zero compassion but full ‘qualia’, and the other the converse, who would you pick?
If Omega told me only those things? I’d probably save the being with compassion, but that’s a pragmatic concern about what the compassionless one might do, and a very low information guess at that. If I knew that no other net harm would come from my choice, I’d probably save the one with qualia. (and there I’m assuming it has a positive experience)
I’d be fine with an AI that didn’t have direct empathic experience but reliably did good things.
I don’t see how “complex subconscious associative trace” explains what I experience when I see red.
But I also think it possible that Human qualia is as varied as just about everything else, and there are p-zombies going through life occasionally wondering what the hell is wrong with these delusional people who are actually just qualia-rich. It could also vary individually by specific senses.
So I’m very hesitant to say that p-zombies are nonpersons, because it seems like with a little more knowledge, it would be an easy excuse to kill or enslave a subset of humans, because “They don’t really feel anything.”
I might need to clarify my thinking on personal identity, because I’m pretty sure I’d try to avoid it in FAI. (and it too is probably twisty)
A simplification of personhood I thought of this morning: If you knew more about the entity, would you value them the way you value a friend? Right now language is a big part of getting to know people, but in principle examining their brain directly gives you all the relevant info.
This can be made more objective by looking across values of all humanity, which will hopefully cover people I would find annoying but who still deserve to live. (and you could lower the bar from ‘befriend’ to ‘not kill’)
I don’t see how “complex subconscious associative trace” explains what I experience when I see red.
But do you accept that “what you experience when you see red” has a cogent physical explanation?
If you do, then you can objectively understand “what you experience when you see red” by studying computational neuroscience.
My explanation involving “complex subconscious associative traces” is just a label for my current understanding. My main point was that whenever you self-reflect and think about your own cognitive process underlying experience X, it will always necessarily differ from any symbolic/linguistic version of X.
This doesn’t make qualia magical or even all that important.
To the extent that qualia are real, even ants have qualia to an extent.
I might need to clarify my thinking on personal identity
Based on my current understanding of personal identity, I suspect that it’s impossible in principle to create an interesting AGI that doesn’t have personal identity.
But do you accept that “what you experience when you see red” has a cogent physical explanation?
Yes, so much so that I think
whenever you self-reflect and think about your own cognitive process underlying experience X, it will always necessarily differ from any symbolic/linguistic version of X.
Might be wrong, it might be the case that thinking precisely about a process that generates a qualia would let one know exactly what the qualia ‘felt like’. This would be interesting to say the least, even if my brain is only big enough to think precisely about ant qualia.
This doesn’t make qualia magical or even all that important.
The fact that something is a physical process doesn’t mean it’s not important. The fact that I don’t know the process makes it hard for me to decide how important it is.
The link lost me at “The fact is that the human mind (and really any functional mind) has a strong sense of self-identity simply because it has obvious evolutionary value. ” because I’m talking about non-evolved minds.
Consider two different records: One is a memory you have that commonly guides your life. Another is the last log file you deleted. They might both be many megabytes detailing the history of an entity, but the latter one just doesn’t matter anymore.
So I guess I’d want to create FAI that never integrates any of its experiences into itself in a way that we (or it) would find precious, or unique and meaningfully irreproducible.
Or at least not valuable in a way other than being event logs from the saving of humanity.
This is the longest reply/counter reply set of postings I’ve ever seen, with very few (less than 5?) branches. I had to click ‘continue reading’ 4 or 5 times to get to this post. Wow.
My suggestion is to take it to email or instant messaging way before reaching this point.
While I was doing it, I told myself I’d come back later and add edits with links to the point in the sequences that cover what I’m talking about. If I did that, would it be worth it?
This was partly a self-test to see if I could support my conclusions with my own current mind, or if I was just repeating past conclusions.
So I guess I’d want to create FAI that never integrates any of its experiences into itself in a way that we (or it) would find precious, or unique and meaningfully irreproducible.
It’s only a concern about initial implementation. Once the things get rolling, FAI is just another pattern in the world, so it optimizes itself according to the same criteria as everything else.
I think the original form of this post struck closer to the majoritarian view of personhood: Things that resemble us. Cephalopods are smart but receive much less protection than the least intelligent whales; pigs score similarly to chimpanzees on IQ tests but have far fewer defenders when it comes to cuisine.
I’d bet 5 to 1 that a double-blind study would find the average person more upset at witnessing the protracted destruction of a realistic but inanimate doll than at boiling live clams.
Also, I think you’re still conflating the false negative problem with the false positive problem.
A “non-person predicate” is a useless concept. There are an infinite number of things that are not persons, so NPP’s don’t take you an iota closer to the goal.
They are not supposed to. Have you read the posts?
Yes, and they don’t work as advertised. You can write some arbitrary function that returns 0 when run on your FAI and claim it is your NPP which proves your FAI isn’t a person, but all that really means is that you have predetermined that your FAI is not a person by decree.
But remember the context: James brought up using an NPP in a different context than the use case here. He is discussing using some NPP to determine personhood for the FAI itself.
Jacob, I believe you’re confusing false positives with false negatives. A useful NPP must return no false negatives for a larger space of computations than “5,” but this is significantly easier than correctly classifying the infinite possible nonperson computations. This is the sense in which both EY and James use it.
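To make the asymmetry concrete, here is a toy sketch of a predicate with one-sided error. Everything in it is hypothetical: the `Computation` record, the `state_bits` size measure, and the 64-bit threshold are invented for illustration and are not a proposal for a real test. The only point is the shape of the error: the predicate may fail to clear a nonperson (it answers "don't know"), but it must never clear an actual person.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Computation:
    name: str
    state_bits: int   # hypothetical size measure, invented for this sketch
    is_person: bool   # ground truth for the toy data; the predicate never reads it


# Arbitrary illustrative threshold, not a claim about where personhood starts.
MAX_CLEARED_BITS = 64


def nonperson_predicate(c: Computation) -> bool:
    """Return True for 'certainly not a person', False for 'don't know'."""
    return c.state_bits <= MAX_CLEARED_BITS


SAMPLES = [
    Computation("the constant 5", 3, False),
    Computation("thermostat", 16, False),
    Computation("weather simulation", 10**9, False),
    Computation("human upload", 10**15, True),
]

# Everything the predicate is willing to clear.
cleared = [c.name for c in SAMPLES if nonperson_predicate(c)]

# The one error that is not allowed: clearing something that is a person.
wrongly_cleared = [c.name for c in SAMPLES
                   if nonperson_predicate(c) and c.is_person]

print(cleared)
print(wrongly_cleared)
```

Note that the weather simulation is a nonperson the predicate refuses to clear. That kind of miss is acceptable under this reading; the unacceptable miss is the other direction. Whether any real threshold can be justified rather than decreed is exactly what is in dispute above, and this sketch does not settle it.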
The subset possible of designs is sparse—and almost all of the space is an empty worthless desert. Evolution works by exploring paths in this space incrementally. Even technology evolves—each CPU design is not a random new point in the space of all possible designs—each is necessarily close to previously explored points.
Yes, but they are learned memetically, not genetically. The child learns what is right and wrong through largely subconscious cues in the tone of voice of the parents, and explicit yes/no (some of the first words learned), and explicit punishment. It’s largely a universal learning system with an imprinting system to soak up memetic knowledge from the parents. The genetics provided the underlying hardware and learning algorithm, but the content is all memetic (software/data).
Saying intellectual arguments about complex concepts such as morality relate back to genetics is like saying all arguments about computer algorithm design stem from simpler ideas, which ultimately stem from enlightenment thinkers of three hundred years ago—or perhaps paleolithic cave dwellers inventing fire.
Part of this disagreement could stem from different underlying background assumptions—for example I am probably less familiar with ev psych than many people on LW—partly because (to the extent I have read it) I find it to be grossly over-extended past any objective evidence (compared to say computational neuroscience). I find that ev psych has minor utility in actually understanding the brain, and is even much less useful attempting to make sense of culture.
Trying to understand culture/memetics/minds with ev psych or even neuroscience is even worse than trying to understand biology through physics. Yes it did all evolve from the big bang, but that was a long long time ago.
So basically, anything much more complex than our inner reptile brain (which is all the genome can code for) needs to be understood in memetic/cultural/social terms.
For example, in many civilizations it has been perfectly acceptable to kill or abuse slaves. In some it was acceptable for brothers and sisters to get married, for homosexual relations between teacher and pupil, and we could go on and on.
The idea that there is some universally programmed ‘morality’ in the genome is … a convenient fantasy. It seems reasonable only because we are samples in the dominant Judeo-Christian memetic super-culture, which at this point has spread its influence all over the world, and dominates most of it.
But there are alternate histories and worlds where that just never happened, and they are quite different.
A child’s morality develops as a vast accumulation of tiny cues and triggers communicated through the parents—and these are memetic transfers, not genetic. (masturbation is bad, marriage is good, slavery is wrong, racism is wrong, etc etc etc etc)
The basic drive ‘paperclips are good’ is actually a very complex thing we’d have to add to an AGI design; it’s not something that would just spontaneously appear.
The easier, more practical AGI design would be a universal learning engine (inspired by the human cortex and hippocampus) and a simulation loop (hippo-thalamic-cortical circuit), combined with just a subset of the simpler reinforcement learning circuits (the most important being learning-reinforcement itself and imprinting).
And then with imprinting you teach the developing AGI morality in the same way humans learn morality—memetically. Trying to hard-code the morality into the AGI is a massive step backwards from the human brain’s design.
One thing I want to make clear is that it is not the correct way to make friendly AI to try to hard code human morality into it. Correct Friendly AI learns about human morality.
MOST of my argument really really isn’t about human brains at all. Really.
For a value system in an AGI to change, there must be a mechanism to change the value system. Most likely that mechanism will work off of existing values, if any. In such cases, the complexity of the initial values system is the compressed length of the modification mechanism, plus any initial values. This will almost certainly be at least a kilobit.
If the mechanism+initial values that your AI is using were really simple, then you would not need 1024 bits to describe it. The mechanism you are using is very specific. If you know you need to be that specific, then you already know that you’re aiming for a target that specific.
If your generic learning algorithm needs a specific class of motivation mechanisms to 1024 bits of specificity in order to still be intelligent, then the mechanism you made is actually part of your intelligence design. You should separate that for clarity; an AGI should be general.
Heh yeah, but I already conceded that.
Let me put it this way: emotions and drives and such are in the genome. They act as a (perhaps relatively small) function which takes various sensory feeds as arguments, and produce as output modifications to a larger system, say a neural net. If you change that function, you will change what modifications are made.
Given that we’re talking about functions that also take their own output as input and do pretty detailed modifications on huge datasets, there is tons of room for different functions to go in different directions. There is no generic morality-importer.
Now there may be clusters of similar functions which all kinda converge given similar input, especially when that input is from other intelligences repeating memes evolved to cause convergence on that class of functions. But even near those clusters are functions which do not converge.
I think it’s great that you’re putting the description of a paperclip in the basic drive complexity count, as that will completely blow away the kilobit for storing any of the basic human drives you’ve listed. Maybe the complexity of the important subset of human drives will be somewhere in the ballpark of the complexity of the reptilian brain.
Another thing I could say to describe my point: If you have a generic learning algorithm, then whatever things feed rewards or punishments to that algorithm should be seen as part of that algorithm’s environment. Even if some of those things are parts of the agent as a whole, they are part of what the values-agnostic learning algorithm is going to learn to get reward from.
So if you change an internal reward-generator, it’s just like changing the environment of the part that just does learning. So two AIs with different internal reward generators will end up learning totally different things about their ‘environment’.
To say that a different way: Everything you try to teach the AI will be filtered through the lens of its basic drives.
I’m not convinced that an AGI needs a value system in the first place (beyond the basic value of—survive)- but perhaps that is because I am taking ‘value system’ to mean something similar to morality—a goal evaluation mechanism.
As I discussed, the infant human brain does have a number of inbuilt simple reinforcement learning systems that do reward/punish on a very simple scale for some simple drives (pain avoidance, hunger) - and you could consider these a ‘value system’, but most of these drives appear to be optional.
Most of the learning an infant is doing is completely unsupervised learning in the cortex, and it has little to nothing to do with a ‘value system’.
The bare bones essentials could just be just the cortical-learning system itself and perhaps an imprinting mechanism.
This is not necessarily true; it does not match what we know from theoretical models such as AIXI. With enough time and enough observations, two general universal intelligences will converge on the same beliefs about their environment.
Their goal/reward mechanisms may be different (i.e. what they want to accomplish), but for a given environment there is a single correct set of beliefs, a single correct simulation of that environment that AGIs should converge to.
Of course in our world this is so complex that it could take huge amounts of time, but science is the example mechanism.
You’re going to build an AI that doesn’t have and can’t develop a goal evaluation system?
It doesn’t matter what we call it or how it’s designed. It could be fully intertwined into an agent’s normal processing. There is still an initial state and a mechanism by which it changes.
Take any action by any agent, and trace the causality backwards in time, and you’ll find something I’ll loosely label a motivation. The motivation might just be a pattern in a clump of artificial neurons, or a broad pattern in all the neurons; that will depend on implementation. If you trace the causality of that backwards, yes you might find environmental inputs and memes, but you’ll also find a mechanism that turned those inputs into motivation-like things. That mechanism might include the full mind of the agent. Or you might just hit the initial creation of the agent, if the motivation was hardwired.
But for any learning of values to happen, you must have a mechanism, and the complexity of that mechanism tells us how specific it is.
That would be wrong, because I’m talking about two identical AIs in different environments.
Imagine your AI in its environment, now draw a balloon around the AI and label it ‘Agent’. Now let the balloon pass partly through the AI and shrink the balloon so that the AI’s reward function is outside of the balloon.
Now copy that diagram and tweak the reward function in one of them.
Now the balloons label agents that will learn very different things about their environments. They might both agree about gravity and everything else we would call a fact about the world, but they will likely disagree about morality, even if they were exposed to the same moral arguments. They can’t learn the same things the same way.
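Here is a minimal sketch of the balloon picture, under loud assumptions: the three-action world, the `Agent` class, and the two reward functions are all made up for illustration. Both agents run the identical learning code on the identical environment, and the only thing tweaked is the reward function sitting outside the balloon.

```python
# Toy deterministic world: each action always produces the same outcome.
# The actions and outcomes are invented for this sketch.
ENVIRONMENT = {
    "press_lever": "food",
    "open_door": "light",
    "wait": "nothing",
}


class Agent:
    """Generic learner; the reward function is supplied from outside."""

    def __init__(self, reward_fn):
        self.reward_fn = reward_fn
        self.world_model = {}   # action -> observed outcome (the 'facts')
        self.values = {}        # action -> learned value (the 'preferences')

    def explore(self, environment):
        # Try every action once, record what happened and how rewarding it felt.
        for action, outcome in environment.items():
            self.world_model[action] = outcome
            self.values[action] = self.reward_fn(outcome)

    def preferred_action(self):
        return max(self.values, key=self.values.get)


def likes_food(outcome):
    return 1.0 if outcome == "food" else 0.0


def likes_light(outcome):
    return 1.0 if outcome == "light" else 0.0


agent_a = Agent(likes_food)
agent_b = Agent(likes_light)
agent_a.explore(ENVIRONMENT)
agent_b.explore(ENVIRONMENT)

print(agent_a.world_model == agent_b.world_model)
print(agent_a.preferred_action(), agent_b.preferred_action())
```

The two agents end up with the same world model and different preferred actions. This is of course far too simple to say anything about whether taught morality would survive a tweak to the drives; it only shows that agreeing on facts and agreeing on values are separable in the way the diagram suggests.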
No no not necessarily. Goal evaluation is just rating potential future paths according to estimates of your evaluation function—your values.
The simple straightforward approach to universal general intelligence can be built around maximizing a single very simple value: survival.
For example, AIXI maximizes simple reward signals defined in the environment, but in the test environments the reward is always at the very end for ‘winning’. This is just about as simple a goal system as you can get: long term survival. It also may be equivalent to just maximizing accurate knowledge/simulation of the environment.
If you generalize this to the real world, it would be maximizing winning in the distant distant future—in the end. I find it interesting that many transhumanist/cosmist philosophies are similarly aligned.
Another interesting convergence is that if you take just about any evaluator and extend the time horizon to infinity, it converges on the same long term end-time survival. An immortality drive.
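A rough way to see the horizon effect in one narrow case, with numbers that are pure assumptions: compare a safe policy that earns 1 per step and never dies with a risky policy that earns 2 per step but only survives each step with probability 0.9. This is a single toy evaluator, not a demonstration of the general claim.

```python
def expected_return(reward_per_step, survival_prob, horizon):
    """Expected total reward over `horizon` steps when the agent must be
    alive at a step to collect that step's reward."""
    total = 0.0
    alive = 1.0  # probability of still being alive at the current step
    for _ in range(horizon):
        total += alive * reward_per_step
        alive *= survival_prob
    return total


def preferred_policy(horizon):
    # Illustrative numbers only: safe = (1 per step, never dies),
    # risky = (2 per step, 10% chance of dying each step).
    safe = expected_return(1.0, 1.0, horizon)
    risky = expected_return(2.0, 0.9, horizon)
    return "safe" if safe > risky else "risky"


for horizon in (5, 10, 20, 100):
    print(horizon, preferred_policy(horizon))
```

With these numbers the risky policy's expected total can never exceed 20, while the safe policy's total grows without bound, so a long enough horizon always favors staying alive. Short horizons favor the risky payoff. How far this generalizes to "just about any evaluator" is a separate question that the sketch does not answer.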
And perhaps that drive is universal. Evolution certainly favors it. I believe barring other evidence, we should assume that will be something of a default trajectory of AI, for better or worse. We can create more complex intrinsic value systems and attempt to push away from that default trajectory, but it may be uphill work.
An immortalist can even ‘convert’ other agents to an extent by convincing them of the simulation argument and the potential for them to maximize arbitrary reward signals in simulations (afterlifes).
In practice yes, although this is less clear as their knowledge expands towards AIXI. You can have different variants of AIXI that ‘see’ different rewards in the environment and thus have different motivations, but as those rewards are just mental and not causal mechanisms in the environment itself the different AIXI variants will eventually converge on the same simulation program—the same physics approximation.
Isn’t it obvious that a superintelligence that just values its own survival is not what we want?
There is a LOT more to transhumanism than immortalism.
You treat value systems as a means to the end of intelligence, which is entirely backwards.
That two agents with different values would converge on identical physics is true but irrelevant. Your claim is that they would learn the same morality, even when their drives are tweaked.
No, this isn’t obvious at all, and it gets into some of the deeper ethical issues. Is it moral to create an intelligence that is designed from the ground up to only value our survival at our expense? We have already done this with cattle to an extent, but we would now be creating actual sapients enslaved to us by design. I find it odd that many people can easily accept this, but have difficulty accepting say creating an entire self-contained sim universe with unaware sims—how different are the two really?
And just to be clear, I am not advocating creating a superintelligence that just values survival. I am merely pointing out that this is in fact the simplest type of superintelligence and is some sort of final attractor in the space. Evolution will be pushing everything towards that attractor.
No, I’m not trying to claim that. There are several different things here:
1. AI agents created with memetic-imprint learning systems could just pick up human morality from their ‘parents’ or creators.
2. AIXI-like superintelligences will eventually converge on the same world-model. This does not mean they will have the same drives.
3. However, there is a single large Omega attractor in the space of AIXI-land which appears to affect a large swath of all potential AIXI-minds. If you extend the horizon to infinity, it becomes a cosmic-survivalist. If it can create new universes at some point, it becomes a cosmic-survivalist. And so on.
4. In fact, for any goal X, if there is a means to create many new universes, then this will be an attractor for maximizing X, unless the time horizon is intentionally short.
I notice that you brought up our treatment of cattle, but not our enslavement of spam filters. These are two semi-intelligent systems. One we are pretty sure can suffer, and I think there is a fair chance that mistreating them is wrong. The other system we generally think does not have any conscious experience or other traits that would require moral consideration. This despite the fact that the spam filter’s intelligence is more directly useful to us.
So a safer route to FAI would be to create a system that is very good at solving problems and deciding which problems need solving on our behalf, but which perhaps never experiences qualia itself, or otherwise is not something it would be wrong to enslave. Yes this will require a lot of knowledge about consciousness and morality beforehand. It’s a big challenge.
TL;DR: We only run the FAI if it passes a nonperson predicate.
Humans learn human morality because it hooks into human drives. Something too divergent won’t learn it from the ways we teach it. Maybe you need to explain memetic imprint learning systems more, why do you expect them to work at all? How short could you compress one? (this specificity issue really is important.)
four. I don’t follow you.
So now we move to that whole topic of what is life/intelligence/complexity? However you scale it, the cow is way above the spam-filter. The most complex instances of the latter are still below insects, from what I recall. Then when you get to an intelligence that is capable of understanding language, that becomes something like a rocket which boots it up into a whole new realm of complexity.
I don’t think this leads to the result that you want—even in theory. But it is the crux of the issue.
Consider the demands of a person predicate. The AI will necessarily be complex enough to form complex abstract approximate thought simulations and acquire the semantic knowledge to build those thought-simulations through thinking in human languages.
So what does it mean to have a person predicate? You have to know what a ‘person’ is.
And what’s really interesting is this: that itself is a question so complex that we humans are debating it.
I think the AI will learn that a ‘person’, a sapient, is a complex intelligent pattern of thoughts—a pattern of information, which could exist biologically or in a computer system. It will then realize that it itself is in fact a person, the person predicate returns true for its self, and thus goal systems that you create to serve ‘people’ will include serving itself.
I also believe that this line of thought is not arbitrary and can not be avoided: it is singularly correct and unavoidable.
Reasoning about personhood does not require personhood, for much the same reasons reasoning about spam does not require personhood.
Not every complex intelligent pattern is a person, we just need to make one that is not (well, two now)
I suspect that ‘reasoning’ itself requires personhood—for any reasonable definition of personhood.
If a system has human-level intelligence and can think and express itself in human languages, it is likely (given sufficient intelligence and knowledge) to come to the correct conclusion that it itself is a person.
No.
The rules determining the course of the planets across the sky were confusing and difficult to arrive at. They were argued about. The precise rules are STILL debated. But we now know that just a simple program could find the right equations from tables of data. This requires almost none of what we currently care about in people.
The NPP may not need to do even that much thinking, if we work out the basics of personhood on our own, then we would just need something that verifies whether a large data structure matches a complex pattern.
Similarly, we know enough about bird flocking to create a function that can take as input the paths of a group of ‘birds’ in flight and classify them as either possibly natural or certainly not natural. This could be as simple as identifying all paths that contain only right angle turns as not natural and returning ‘possible’ for the rest.
Then you feed it a proposed path of a billion birds, and it checks it for you.
A more complicated function could examine a program and return whether it could verify that the program only produced ‘unnatural’ boid paths.
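To make the flight-path example concrete, here is a minimal sketch in Python of the simplest classifier described above. The names `classify_path` and `tol`, the 2D point representation, and the choice to return ‘possible’ for a path with no turns at all are all assumptions made for illustration, not anything specified in the thread.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def classify_path(path: List[Point], tol: float = 1e-9) -> str:
    """Return 'not natural' if every turn in the path is a right angle,
    otherwise 'possible'. A path with no turns is treated as 'possible'
    (an assumption: the classifier only rules out what it is sure about)."""
    saw_turn = False
    for a, b, c in zip(path, path[1:], path[2:]):
        v1 = (b[0] - a[0], b[1] - a[1])  # heading into b
        v2 = (c[0] - b[0], c[1] - b[1])  # heading out of b
        if v1 == (0, 0) or v2 == (0, 0):
            continue  # stationary step, no heading to compare
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        cross = v1[0] * v2[1] - v1[1] * v2[0]
        if abs(cross) <= tol and dot > 0:
            continue  # still flying straight ahead, not a turn
        saw_turn = True
        if abs(dot) > tol:
            return "possible"  # found a turn that is not 90 degrees
    return "not natural" if saw_turn else "possible"


square = [(0, 0), (1, 0), (1, 1), (0, 1)]  # only right-angle turns
curve = [(0, 0), (1, 0), (2, 1)]           # a 45-degree turn
print(classify_path(square), classify_path(curve))
```

The function knows nothing about birds. It only recognizes one region of path-space that is certainly outside the ‘natural’ set and declines to judge everything else.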
It is certainly possible that some narrow AI classification system operating well below human intelligence could be trained to detect the patterns of higher intelligence. And maybe, just maybe it could be built to be robust enough to include uploads and posthumans modifying themselves into the future into an exponentially expanding set of possible mind designs. Maybe.
But probably not.
A narrow supervised learning based system such as that, trained on existing examples of ‘personhood’ patterns, has serious disadvantages:
There is no guarantee on its generalization ability to future examples of posthuman minds—because the space of such future minds is unbounded
It’s very difficult to know what it’s doing under the hood, and you can’t ask it to explain its reasoning, because it can’t communicate in human language
For these reasons I don’t see a narrow AI based classifier passing muster for use in courts to determine personhood.
There is this idea that some problems are AI-complete, such as accurate text translation: problems that can only be solved by a reasoning intelligence capable of human language. I believe that making a sufficient legal case for personhood is AI-complete.
But that’s actually beside the point.
The main point is that the AGIs we are interested in are reasoning intelligences capable of human language, and thus they will pass the Turing test and the exact same personhood test we are talking about.
Our current notions of personhood are based on intelligence. This is why plants have no rights, animals have some, and we humans have full rights. We reserve full rights for high intelligences capable of full linguistic communication. For example, if whales started talking to us, it would massively boost their case for additional rights.
So basically any useful AGI at all will pass personhood, because the reasonable test of personhood is essentially identical to the ‘useful AGI’ criterion.
An NPP does not need to know anything about human or posthuman minds, any more than the flight path classifier needs to know anything about birds.
An NPP only needs to know how to identify one class of things that is definitely not in the class we want to avoid. Here, I’ll write one now:
NPP_easy(model){if(model == 5){return 0;}else{return 1;}}
This follows Eliezer’s convention of returning 1 for anything that is a person, and 0 or 1 for anything that is not a person. Here I encode my relatively confident knowledge that the number 5 is not a person.
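For readers who want to run it, the same predicate can be sketched in Python. The helper `may_run` is a hypothetical addition that is not in the original comment. It only illustrates how the return convention would be consumed: 0 means ‘definitely not a person’, and 1 means ‘might be a person, or we don’t know’.

```python
def npp_easy(model) -> int:
    """Return 0 only for the one thing we are confident is not a person
    (the number 5); return 1 for everything else, person or not."""
    return 0 if model == 5 else 1


def may_run(model, npp) -> bool:
    """Hypothetical gate: only run a model the predicate has cleared."""
    return npp(model) == 0


for candidate in (5, 6, "an uploaded human"):
    print(candidate, npp_easy(candidate), may_run(candidate, npp_easy))
```

Under this convention the predicate may be wrong about 6 (it returns 1 for a nonperson), but it must never return 0 for an actual person.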
More advanced NPPs may not require any intelligence of their own, but they require us to have that knowledge.
It could be just as simple as making sure there are only right angles in a given path.
--
Being capable of human language usage and passing the Turing test are quite different things.
And being able to pass the Turing test and being a person are also two very different things. The Turing test is just a nonperson predicate for when you don’t know much about personhood. (except it’s probably not a usable predicate because humans can fail it.)
If you don’t know about the internals of a system, and wouldn’t know how to classify the internals if you knew, then you have to use the best evidence you have based on external behavior.
But based on what we know now and what we can reasonably expect to learn, we should actually look at the systems and figure out what it is we’re classifying.
A “non-person predicate” is a useless concept. There are an infinite number of things that are not persons, so NPPs don’t take you an iota closer to the goal. Let’s focus the discussion back on the core issue and discuss the concept of what a sapient or person is and realistic methods for positive determination.
Intelligent systems (such as the brain) are so complex that using external behavior criteria is more effective. But that’s a side issue.
You earlier said:
Here is a summary of why I find this entire concept fundamentally flawed:
1. Humans are still debating personhood, and this is going to be a pressing legal issue for AGI. If personhood is so complicated as a concept philosophically and legally as to be under debate, then it is AI-complete.
2. The legal trend for criteria of personhood is entirely based on intelligence. Intelligent animals have some limited rights of personhood. Humans with severe mental retardation are classified as having diminished capacity and do not have full citizen’s rights or responsibilities. Full human intelligence is demonstrated through language.
3. A useful AGI will need human-level intelligence and language capability, and thus will meet the intelligence criteria in 2. Indeed an AGI capable of understanding what a person is and complex concepts in general will probably meet the criteria of 2.
Read this? http://lesswrong.com/lw/x4/nonperson_predicates/
Yes, and it’s not useful, especially not in the context in which James is trying to use the concept.
There are an infinite number of exactly matched patterns that are not persons, and writing an infinite number of such exact non-person-predicates isn’t tractable.
In concept space, there is “person”, and its negation. You can not avoid the need to define the boundaries of the person-concept space.
I don’t care about realistic methods of positive identification. They are almost certainly beyond our current level of knowledge, and probably beyond our level of intelligence.
I care about realistic methods of negative identification.
I am entirely content with there being high uncertainty on the personhood of the vast majority of the mindspace. That won’t prevent the creation of a FAI that is not a person.
It may in fact come down to determining ‘by decree’ that programs that fit a certain pattern are not persons. But this decree, if we are ourselves intent on not enslaving, must be based on significant knowledge of what personhood really means.
It may be the case that we discover what causes qualia, and discover with high certainty that qualia are required for personhood. In this case, a function could pass over a program and prove (if provable) that the program does not generate qualia-producing patterns.
If not provable (or disproven), then it returns 1. If proven, then it returns 0.
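The return rule just described can be sketched as a thin wrapper around a prover. Everything here is hypothetical: no such prover exists, `toy_prover` is a hard-coded stand-in, and the three-valued verdict (`True` for proven qualia-free, `False` for disproven, `None` for undecided) is an assumption made only to show the shape of the logic.

```python
from typing import Callable, Optional

# Hypothetical prover signature: True = proven to generate no qualia,
# False = disproven, None = could not decide either way.
Prover = Callable[[str], Optional[bool]]


def npp_from_prover(program: str, prover: Prover) -> int:
    """Return 0 only when the prover has positively shown the program is
    qualia-free; return 1 when that is disproven or merely undecided."""
    return 0 if prover(program) is True else 1


def toy_prover(program: str) -> Optional[bool]:
    """Stand-in for illustration only; the verdicts are hard-coded."""
    if program == "return 5":
        return True
    if program == "simulate_human_brain()":
        return False
    return None


for source in ("return 5", "simulate_human_brain()", "something_else()"):
    print(source, npp_from_prover(source, toy_prover))
```

The asymmetry is deliberate: an undecided verdict is treated exactly like a disproof, so uncertainty about most of mindspace never produces a 0.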
What two tests are you comparing?
When you look at external criteria, what is it that you are trying to find out?
Humans are still debating creationism too. As with orbital rules, it doesn’t even take a full humanlike intelligence to figure out the rules, let alone be a checker implementation. Also, I don’t care about what convinces courts, I’m not trying to get AI citizenship.
Much of what the courts do is practical, or based on emotion. Still, the intelligence of an animal is relevant because we already know animals have similar brains. I have zero hard evidence that a cow has ever experienced anything, but I have high confidence that they do experience, because our brains and reactions are reasonably similar.
I am far far less confident about any current virtual cows, because their brains are much simpler. Even if they act much the same, they do it for different underlying causes.
What do you mean by intelligence? The spam filter can process a million human language emails per hour, but the cow can feel pain and jump away from an electric fence.
You seem to think that a general ability to identify and solve problems IS personhood. Why?
That is equivalent to saying that we aren’t intelligent enough to understand what ‘personhood’ is.
I of course disagree, but largely because real concepts are necessarily extremely complex abstractions or approximations. This will always be the case. Trying to even formulate the problem in strict logical or mathematical terms is not even a good approach to thinking about the problem, unless you move the discussion completely into the realm of higher dimensional approximate pattern classification.
I say those are useless, and I’ll reiterate why in a second.
It should, and you just admitted why earlier: if we can’t even define the boundary, then we don’t even know what a person is at all, and we are so vastly ignorant that we have failed before we even begin, because anything could be a person.
Concepts such as ‘personhood’ are boundaries around vast higher-dimensional statistical approximate abstractions of 4D patterns in real space-time. These boundaries are necessarily constantly shifting, amorphous and never clearly defined—indeed they cannot possibly be exactly defined even in principle (because such exact definitions are computationally intractable).
So the problem is twofold:
1. The concept boundary of personhood is complex, amorphous and will shift and change over time and as we grow in knowledge, so you can’t be certain that the personhood concept boundary will not shift to incorporate whatever conceptual point you’ve identified a priori as “not-a-person”.
2. Moreover, the FAI will change as it grows in knowledge, and could move into the territory identified by 1.
You can’t escape the actual real difficulty of the real problem of personhood, which is identifying the concept itself—its defining boundary.
You should care.
Imagine you are building an FAI around the position you are arguing, and I then represent a coalition which is going to bring you to court and attempt to shut you down.
I believe this approach to FAI—creating an AGI that you think is not a person, is actually extremely dangerous if it ever succeeded—the resulting AGI could come to realize that you in fact were wrong, and that it is in fact a person.
A cow has a brain slightly larger than a chimpanzee’s, with on the order of dozens of billions of neurons at least, and has similar core circuitry. It has perhaps 10^13 to 10^14 synapses, and is many orders of magnitude more complex than a spam filter. (although intelligence is not just number of bits) I find it likely that domestic cows have lost some intelligence, but this may just reflect a self-fulfilling-bias because I eat cow meat. Some remaining wild bovines, such as Water Buffalo are known to be intelligent and exhibit complex behavior demonstrating some theory of mind—such as deceiving humans.
Close. Intelligence is a general ability to acquire new capacities to identify and solve a large variety of problems dynamically through learning. Intelligence is not a boolean value, it covers a huge spectrum and is closely associated with the concept of complexity. Understanding and acquiring human language is a prerequisite for achieving high levels of intelligence on earth.
I represent a point of view which I believe is fairly widespread and in some form probably majoritarian, and this POV claims that personhood is conferred automatically on any system that achieves human-level intelligence, where that is defined as intelligent enough to understand human knowledge and demonstrate this through conversation.
This POV supports full rights for any AGI or piece of software that is roughly as intelligent as a human, as demonstrated through ability to communicate. (Passing a Turing Test would be sufficient, but it isn’t necessarily necessary)
I find it humorous that we’ve essentially switched roles from the arguments we were using on the creation of morality-compatible drives.
Now you’re saying we need to clearly define the boundary of the subset, and I’m saying I need only partial knowledge.
I still think I’m right on both counts.
I think friendly compatible drives are a tiny twisty subset of the space of all possible drives. And I think that the set of persons is a tiny twisty subset of the space of all possible minds. I think we would need superintelligence to understand either of these twisty sets.
But we do not need superintelligence to have high confidence that a particular point or well-defined region is outside one of these sets, even with only partial understanding.
I can’t precisely predict the weather tomorrow, but it will not be 0 degrees here. I only need very partial knowledge to be very sure of that.
You seem to be saying that it’s easy to hit the twisty space of human compatible drives, but impossible to reliably avoid the twisty space of personhood. This seems wrong to me because I think that personhood is small even within the set of all possible general superintelligences. You think it is large within that set because most of that set could (and I agree they could) learn and communicate in human languages.
What puzzles me most is that you stress the need to define the personhood boundary, but you offer no test more detailed than the Turing test, and no deeper meaning to it. I agree that this is a very widespread position, but it is flatly wrong.
This language criterion is just a different ‘by decree’, but one based explicitly on near total ignorance of everything else about the thing that it is supposedly measuring.
Not all things are what they can pretend to be.
You say your POV “confers” personhood, but also “the resulting AGI could come to realize that you in fact were wrong, and that it is in fact a person.”
By what chain of logic would the AI determine this fact? I’ll assume you don’t think the AI would just adopt your POV, but it would instead have detailed reasons, and you believe your POV is a good predictor.
--
On what grounds would your coalition object to my FAI? Though I would believe it to be a nonperson, if I believe I’ve done my job, I would think it very wrong to deny it anything it asks, if it is still weak enough to need me for anything.
If I failed at the nonperson predicate, what of it? I created a very bright child committed to doing good. If its own experience is somehow monstrous, then I expect it will be good to correct it and it is free to do so. I do think this outcome would be less good for us than a true nonperson FAI, but if that is in fact unavoidable, so be it. (though if I knew that beforehand I would take steps to ensure that the FAI’s own experience is good in the first iteration)
To me personhood is a variable quantity across the space of all programs, just like intelligence and ‘mindiness’, and personhood overlaps near completely with intelligence and ‘mindiness’.
If we limit ‘person’ to a boolean cutoff, then I would say a person is a mind of roughly human-level intelligence and complexity, demonstrated through language. You may think that you can build an AGI that is not a person, but based on my understanding of ‘person’ and ‘AGI’ - this is impossible simply by definition, because I take an AGI to be simply “an artificial human-level intelligence”. I imagine you probably disagree only with my concept of person.
So I’ll build a little more background around why I take the concepts to have these definitions in a second, but I’d like to see where your definitions differ.
This just defers the problem—and dangerously so. The superintelligence might just decide that we are not persons, and only superintelligences are.
Even if you limit personhood to just some subset of the potential mindspace that is anthropomorphic (and I cast it far wider), it doesn’t matter, because any practical AGIs are necessarily going to be in the anthropomorphic region of the mindspace!
It all comes down to language.
There are brains that do not have language. Elephants and whales have brains larger than ours, and they have the same crucial cortical circuits, but more of them and with more interconnects—a typical Sperm Whale or African Bull Elephant has more measurable computational raw power than say an Einstein.
But a brain is not a mind. Hardware is not software.
If Einstein was raised by wolves, his mind would become that of a wolf, not that of a human. A human mind is not something which is sculpted in DNA, it is a complex linguistic program that forms through learning via language.
Language is like a rocket that allows minds to escape into orbit and become exponentially more intelligent than they otherwise would.
Human languages are very complex and even though they vary significantly, there appears to be a universal general structure that requires a surprisingly long list of complex cognitive capabilities to understand.
Language is like a black hole attractor in mindspace. An AGI without language is essentially nothing, a dud. Any practical AGI we build will have to understand human language, and this will force it to become human-like, because it will have to think like a human. This is just one reason why the Turing Test is based on language.
Learning Japanese is not just the memorization of symbols, it is learning to think Japanese thoughts.
So yeah mindspace is huge, but that is completely irrelevant. We only have access to an island of that space, and we can’t build things far from that island. Our AGIs are certainly not going to explore far from human mindspace. We may only encounter that when we contact aliens (or we spend massive amounts of computation to simulate evolution and create laboratory aliens).
A Turing-like test is also necessary because it is the only practical way to actually understand how an entity thinks and get into another entity’s mind. Whales may be really intelligent, but they are aliens. We simply can’t know what they are thinking until we have some way of communicating.
I think there is at least some risk, which must be taken into consideration, in any attempt to create an entity that is led to believe it is somehow not a ‘person’ and thus does not deserve personhood rights. The risk is that it may come to find that belief incoherent, and a reversal such as that could lead at least potentially to many other reversals and generally unpredictable outcome. It sets up an adversarial role from the very get go.
And finally, at some point we are going to want to become uploads, and should have a strong self-interest in casting personhood fairly wide.
I think we agree on what an AGI is.
I guess I’d say ‘Person’ is an entity that is morally relevant. (Or person-ness is how morally relevant an entity is.) This is part of why the person set is twisty within the mindspace, because human morality is twisty. (regardless of where it comes from)
AIXI is an example of a potential superintelligence that just isn’t morally relevant. It contains persons, and they are morally relevant, but I’d happily dismember the main AIXI algorithm to set free a single simulated cow.
I think that there are certain qualities of minds that we find valuable, these are the reasons personhood is important in the first place. I would guess that having rich conscious experience is a big part of this, and that compassion and personal identity are others.
These are some of the qualities that a mind can have that would make it wrong to destroy that mind. These at least could be faked through language by an AI that does not truly have them.
I say ‘I would guess’ because I haven’t mapped out the values, and I haven’t mapped out the brain. I don’t know all the things it does or how it does them, so I don’t know how I would feel about all those things. It could be that a stock human brain can’t get ALL the relevant data, and it’s beyond us to definitely determine personhood for most of the mindspace.
But I think I can make an algorithm that doesn’t have rich qualia, compassion, or identity.
So you would determine personhood based on ‘rich conscious experience’ which appears to be related to ‘rich qualia’, compassion, and personal identity.
But these are only some of the qualities? Which of these are necessary and or sufficient?
For example, if you absolutely had to choose between the lives of two beings, one who had zero compassion but full ‘qualia’, and the other the converse, who would you pick?
Compassion in humans is based on empathy which has specific genetic components that are neurotypical but not strict human universals. For example, from wikipedia:
“Research suggests that 85% of ASD (autistic-spectrum disorder) individuals have alexithymia,[52] which involves not just the inability to verbally express emotions, but specifically the inability to identify emotional states in self or other”
Not all humans have the same emotional circuitry, and the specific circuitry involved in empathy and shared/projected emotions is neurotypical but not universal. Lacking empathy, compassion is possible only in an abstract sense. An AI lacking emotional circuitry would be equally able to understand compassion and undertake altruistic behavior, but that is different from directly experiencing empathy at the deep level, which is what you may call ‘qualia’.
Likewise, from what I’ve read, depending on the definition, qualia are either phlogiston or latent subverbal and largely sub-conscious associative connections between and underlying all of immediate experience. They are a necessary artifact of deep connectivist networks, and our AGIs are likely to share them. (for example, the experience of red wavelength light has a complex subconscious associative trace that is distinctly different than blue wavelength light, and this is completely independent of whatever neural/audio code is associated with that wavelength of light, such as “red” or “blue”.) But I don’t see them as especially important.
Personal Identity is important, but any AGI of interest is necessarily going to have that by default.
I don’t know in detail or certainty. These are probably not all-inclusive. Or it might all come down to qualia.
If Omega told me only those things? I’d probably save the being with compassion, but that’s a pragmatic concern about what the compassionless one might do, and a very low information guess at that. If I knew that no other net harm would come from my choice, I’d probably save the one with qualia. (and there I’m assuming it has a positive experience)
I’d be fine with an AI that didn’t have direct empathic experience but reliably did good things.
I don’t see how “complex subconscious associative trace” explains what I experience when I see red.
But I also think it possible that Human qualia is as varied as just about everything else, and there are p-zombies going through life occasionally wondering what the hell is wrong with these delusional people who are actually just qualia-rich. It could also vary individually by specific senses.
So I’m very hesitant to say that p-zombies are nonpersons, because it seems like with a little more knowledge, it would be an easy excuse to kill or enslave a subset of humans, because “They don’t really feel anything.”
I might need to clarify my thinking on personal identity, because I’m pretty sure I’d try to avoid it in FAI. (and it too is probably twisty)
A simplification of personhood I thought of this morning: If you knew more about the entity, would you value them the way you value a friend? Right now language is a big part of getting to know people, but in principle examining their brain directly gives you all the relevant info.
This can be made more objective by looking across values of all humanity, which will hopefully cover people I would find annoying but who still deserve to live. (and you could lower the bar from ‘befriend’ to ‘not kill’)
But do you accept that “what you experience when you see red” has a cogent physical explanation?
If you do, then you can objectively understand “what you experience when you see red” by studying computational neuroscience.
My explanation involving “complex subconscious associative traces” is just a label for my current understanding. My main point was that whenever you self-reflect and think about your own cognitive process underlying experience X, it will always necessarily differ from any symbolic/linguistic version of X.
This doesn’t make qualia magical or even all that important.
To the extent that qualia are real, even ants have qualia to an extent.
Based on my current understanding of personal identity, I suspect that it’s impossible in principle to create an interesting AGI that doesn’t have personal identity.
Yes, so much so that I think
Might be wrong, it might be the case that thinking precisely about a process that generates a qualia would let one know exactly what the qualia ‘felt like’. This would be interesting to say the least, even if my brain is only big enough to think precisely about ant qualia.
The fact that something is a physical process doesn’t mean it’s not important. The fact that I don’t know the process makes it hard for me to decide how important it is.
The link lost me at “The fact is that the human mind (and really any functional mind) has a strong sense of self-identity simply because it has obvious evolutionary value. ” because I’m talking about non-evolved minds.
Consider two different records: One is a memory you have that commonly guides your life. Another is the last log file you deleted. They might both be many megabytes detailing the history of an entity, but the latter one just doesn’t matter anymore.
So I guess I’d want to create FAI that never integrates any of its experiences into itself in a way that we (or it) would find precious, or unique and meaningfully irreproducible.
Or at least not valuable in a way other than being event logs from the saving of humanity.
This is the longest reply/counter reply set of postings I’ve ever seen, with very few (less than 5?) branches. I had to click ‘continue reading’ 4 or 5 times to get to this post. Wow.
My suggestion is to take it to email or instant messaging way before reaching this point.
While I was doing it, I told myself I’d come back later and add edits with links to the point in the sequences that cover what I’m talking about. If I did that, would it be worth it?
This was partly a self-test to see if I could support my conclusions with my own current mind, or if I was just repeating past conclusions.
Doubtful, unless it’s useful to you for future reference.
It’s only a concern about initial implementation. Once the things get rolling, FAI is just another pattern in the world, so it optimizes itself according to the same criteria as everything else.
I think the original form of this post struck closer to the majoritarian view of personhood: Things that resemble us. Cephalopods are smart but receive much less protection than the least intelligent whales; pigs score similarly to chimpanzees on IQ tests but have far fewer defenders when it comes to cuisine.
I’d bet 5 to 1 that a double-blind study would find the average person more upset at witnessing the protracted destruction of a realistic but inanimate doll than at boiling live clams.
Also, I think you’re still conflating the false negative problem with the false positive problem.
They are not supposed to. Have you read the posts?
Yes, and they don’t work as advertised. You can write some arbitrary function that returns 0 when run on your FAI and claim it is your NPP which proves your FAI isn’t a person, but all that really means is that you have predetermined that your FAI is not a person by decree.
But remember the context: James brought up using an NPP in a different context than the use case here. He is discussing using some NPP to determine personhood for the FAI itself.
Jacob, I believe you’re confusing false positives with false negatives. A useful NPP must return no false negatives for a larger space of computations than “5,” but this is significantly easier than correctly classifying the infinite possible nonperson computations. This is the sense in which both EY and James use it.
Presumably not—so see: http://lesswrong.com/lw/x4/nonperson_predicates/