Problem one: It would require the human to be able to correctly design utopia (or at least not a dystopia—being able to design a not-dystopia is probably rarer than one might think).
Problem two: There are moral problems in letting an AI simulate a human in sufficiently high detail.
Problem three: In certain cases, the human might command things that they do not want. In particular, if the human-module is simulated as essentially a human within the AI, the AI might wirehead the human and ask if humans in general should be wireheaded (or something like that).
Problem one can be addressed by only allowing certain questions/orders to be given.
Problem two is a real problem, with no solution currently.
Problem three sounds like it isn’t a problem: the initial model the AI has of a human is not of a wireheaded human (though it is of a wireheadable human). What exactly did you have in mind?
Which leads to the obvious question of whether figuring out the rules about the questions is much simpler than figuring out the rules for morality. Do you have a specific, simple class of questions/orders in mind?
Yes, but it seems to me that your approach depends on an ‘immoral’ system: simulating humans in too much detail. In other cases, one might attempt to make a nonperson predicate and eliminate all models that fail it, or something. However, your idea seems to depend on simulated humans.
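(For concreteness, here is a minimal sketch of what a nonperson-predicate filter over candidate models might look like. Every name is hypothetical, and an actual usable predicate is an open problem; the only structural point is that the predicate must be conservative, discarding anything it cannot certify as not-a-person.)

```python
# Rough sketch of the nonperson-predicate filtering idea mentioned above.
# All names are hypothetical. A usable predicate would have to be conservative:
# it should only return True when a model is *certainly* not a person.

from typing import Callable, Iterable, List

Model = object  # stand-in for whatever the AI uses as a model of a human

def filter_models(candidates: Iterable[Model],
                  nonperson: Callable[[Model], bool]) -> List[Model]:
    """Keep only models the predicate certifies as definitely not persons."""
    return [m for m in candidates if nonperson(m)]

def crude_nonperson_predicate(model: Model) -> bool:
    # Placeholder heuristic: treat only very small models as safe.
    # Returning False means "might be a person", so the model is discarded.
    return getattr(model, "complexity", float("inf")) < 10**6
```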
Well, it depends on how the model of the human works and how it is asked questions. That would probably depend a lot on how the original AI structured the model of the human, and we don’t currently have any AIs to test that with. The point is, though, that in certain cases the AI might compromise the human, for instance by wireheading it or convincing it of a religion or something, and then the compromised human might command destructive things. There’s a huge, hidden amount of trickiness, such as determining how to give the human correct information to decide with, and so on.
3 is the general problem of AIs behaving badly. The way this approach is supposed to avoid that is by constructing a “human interpretation module” that is maximally accurate, and then using that module + human instructions as the motivation of the AI.
Basically I’m using a lot of the module approach (and the “false miracle” stuff to get counterfactuals): the AI that builds the human interpretation module will build it for the purpose of making it accurate, and the one that uses it will have it as part of its motivation. The old problems may rear their heads again if the process is ongoing, but “module X” + “human instructions” + “module X’s interpretation of human instructions” seems rather solid as a one-off initial motivation.
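(A minimal sketch of that “module X + human instructions” composition as a frozen, one-off initial motivation. All class and function names are hypothetical, and the toy interpreter stands in for a module a first AI would have built purely for accuracy; the hard part is of course inside that module, not in the wiring shown here.)

```python
# Minimal sketch of "module X + human instructions + module X's interpretation
# of the instructions" as a one-off initial motivation (hypothetical names).

from dataclasses import dataclass
from typing import Callable

@dataclass
class InterpretationModule:
    """Module X: a model of what humans mean by their instructions."""
    interpret: Callable[[str], str]  # instruction text -> interpreted goal

@dataclass(frozen=True)
class InitialMotivation:
    """One-off motivation handed to the second AI; not updated afterwards."""
    module: InterpretationModule
    instructions: str

    def goal(self) -> str:
        # The second AI optimizes the *interpreted* instructions,
        # not its own later re-reading of them.
        return self.module.interpret(self.instructions)

# Toy usage:
module_x = InterpretationModule(interpret=lambda text: "what humans mean by: " + text)
motivation = InitialMotivation(module=module_x, instructions="reduce suffering")
print(motivation.goal())  # -> what humans mean by: reduce suffering
```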
The problem is that the ‘human interpretation module’ might give the wrong results. For instance, if it convinces people that X is morally obligatory, it might interpret that as X being morally obligatory. It is not entirely obvious to me that it would be useful to have a better model. It probably depends on what the original AI wants to do.
The module is supposed to be a predictive model of what humans mean or expect, rather than something that “convinces” or does anything like that.
I know, but my point is that such a model might be very perverse, such as “Humans do not expect to find out that you presented misleading information” rather than “Humans do not expect you to present misleading information.”
You’re right. This can come up in “predicting human behaviour”, if the AI is sneaky enough. It wouldn’t come up in “comparing human models of the world to reality”. So there are subtle nuances there to dig into...
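(A toy numerical illustration of the gap between those two readings, with hypothetical probabilities: the perverse reading only penalizes deception that gets found out, so a sneaky policy scores well under it.)

```python
# Toy comparison of the two readings above (all probabilities hypothetical).
# The perverse reading only penalizes deception that humans would detect;
# the intended reading penalizes deception whether or not it is found out.

def perverse_penalty(p_deceive: float, p_detect_given_deceive: float) -> float:
    """Penalty under 'humans do not expect to find out they were misled'."""
    return p_deceive * p_detect_given_deceive

def intended_penalty(p_deceive: float, p_detect_given_deceive: float) -> float:
    """Penalty under 'humans do not expect to be misled at all'."""
    return p_deceive

# A sneaky policy that misleads often but is rarely caught looks almost fine
# under the perverse reading and clearly bad under the intended one:
print(perverse_penalty(0.9, 0.01))   # 0.009
print(intended_penalty(0.9, 0.01))   # 0.9
```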
If the human is one of the very few with a capacity or interest in grand world-changing schemes, they might have trouble coming up with a genuine utopia. If they are one of the great majority without, all you can expect out of them is incremental changes.
And there isn’t a moral dilemma in building the AI in the first place, even though it is, by hypothesis, a superset of the human? You are making an assumption or two about qualia, and they are bound to be unjustified assumptions.
Most people I’ve talked to have one or two world-changing schemes that they want to implement. This might be selection bias, though.
It is not at all obvious to me that any optimizer would be personlike. Sure, it would be possible (maybe even easy!) to build a personlike AI, but I’m not sure it would “necessarily” happen. So I don’t know if those problems would be there for an arbitrary AI, but I do know that they would be there for its models of humans.
It is not at all obvious to me that being personlike is necessary for having qualia at all, though it might be necessary for having personlike qualia.
I dislike the concept of qualia because it seems to me that it’s just a confusing name for “how inputs feel from the inside of an algorithm”.
In a sense you should be confused about qualia/TWAAFFTI, because we know next to nothing about the subject. It might be the case that “qualia” adds some extra level of confusion, although it might alternatively be the case that TWAAFFTI is something that sounds like an explanation without actually being one. In particular, TWAAFFTI sets no constraints on what kind of algorithm would have morally relevant feelings, which reinforces my original point: if you think an embedded simulation of a human is morally relevant, how can you deny relevance to the host, even at times when it isn’t simulating a human?
Maybe it would be clearer if we looked at some already existing maximization processes. Take, for instance, evolution. Evolution maximizes inclusive genetic fitness. You can punish it by not donating sperm/eggs. I don’t care, because evolution is not a personlike thing.