If a genie cares enough about your request to interpret and respond to its naive denotation, it also cares enough to interpret your request’s obvious connotations. The apparently fine line between them is a human construction. Your proposed interpretation only makes sense if the genie is a rules-lawyer with at-least-instrumentally-oppositional interests/incentives, in which case one wonders where those oppositional interests/incentives came from. (Which is where we’re supposed to bring in Omohundro et cetera but meh.)

Right, if you want a world that’s all naive denotation, zero obvious connotation, that’s computer programming!
If a genie cares enough about your request to interpret and respond to its naive denotation, it also cares enough to interpret your request’s obvious connotations.
That doesn’t follow. There just isn’t any reason that the former implies the latter. Either kind of caring is possible but they are not the same thing (and the second is likely more complex than the first).
Your proposed interpretation only makes sense if the genie is a rules-lawyer
This much is true. (Or at least it must be something that follows rules.)
with at-least-instrumentally-oppositional interests/incentives
This isn’t required. It needs no oppositional interests/incentives at all beyond, once it is given a request, the desire to honour it. This isn’t a genie trying to thwart someone in order to achieve some other goal, or a genie twisting the intent to serve some other purpose. It is a genie caring only about the request, and some jackass asking for something they don’t want. (Rather than ‘oppositional’ it could be called ‘obedient’, where it turns out that obedience isn’t what is desired.)
in which case one wonders where those oppositional interests/incentives came from.
Presumably it got its wish-granting motives from whoever created it or otherwise constructed the notion of the wish-granter genie.
There just isn’t any reason that the former implies the latter. Either kind of caring is possible but they are not the same thing (and the second is likely more complex than the first).
(Very hastily written:) The former doesn’t imply the latter; it’s just that interpreting denotation and interpreting connotation are within an order of magnitude of each other in difficulty, and they aren’t going to be represented by a djinn or an AGI as two distinct classes of interpretation; there’s no natural boundary between them. I mean, I guess the fables can make the djinns weirdly stunted in that way, but then the analogy to AGIs breaks down, because interpreting denotation but not connotation is unnatural and you’d have to go out of your way to make an AGI that does that. By hypothesis the AGI is already interpreting natural speech, not compiling code. You can argue that denotation and connotation actually are totally different beasts and we should expect minds-in-general to treat them that way, but my impression is that what we know of linguistics suggests that isn’t the case.

(ETA: And even just interpreting the “denotation” requires a lot of context already, obviously; why are we taking that subset of context for granted while leaving out only the most important context? Makes sense for a moralistic djinn fable, doesn’t make sense by analogy to AGI.)

(ETA2: Annoyed that this purely epistemic question is going to get bogged down in and interpreted in the light of political boo- / yay-AI-risk-prevention stances, arguments-as-soldiers style.)
The former doesn’t imply the latter; it’s just that interpreting denotation and interpreting connotation are within an order of magnitude of each other in difficulty
This much is true. It is somewhat more difficult to implement a connotation-honouring genie (because that requires more advanced referencing and interpretation), but both tasks fall under already-defined areas of narrow AI. The difference in difficulty is small enough that I more or less ignore it as a trivial ‘implementation detail’. People could create (either as fiction or as AI) either of these things, and each has different problems.
Annoyed that this purely epistemic question is going to get bogged down in and interpreted in the light of political boo- / yay-AI-risk-prevention stances, arguments-as-soldiers style.
Your mind-reading is in error. To be honest this seems fairly orthogonal to AI-risk-prevention stances. From what I can tell, someone with a particular AI stance hasn’t got an incentive either way, because both these types of genie are freaking dangerous in their own way. The only difference acknowledging the possibility of connotation-honouring genies makes is perhaps to determine which particular failure mode you potentially end up in. Having a connotation-honouring genie may be an order of magnitude safer than a literal genie, but unless there is almost-FAI-complete code in there in the background as a safeguard it’s still something I’d only use if I was absolutely desperate. I round off the safety difference between the two to negligible in approximately the same way I round off the implementation difficulty difference.
As a ‘purely epistemic question’ your original claim is just plain false. However, there is another valid point in there, one which we have both skirted around the edges of explaining adequately. I (think that I) more or less agree with what you are saying in this follow-up comment. I suggest that the main way that AI interest influences this conversation is that it promotes (and is also caused by) an interest in being accurate about precisely what the expected outcomes of goal systems are and just what the problems of a given system happen to be.
Sorry, didn’t mean to imply you’d be the one mind-killed, just the general audience. From previous interactions I know you’re too rational for that kind of perversion.
Having a connotation-honouring genie may be an order of magnitude safer than a literal genie
I actually think it’s many, many orders of magnitude safer, but that’s only because a denotation honoring genie is just egregiously stupid. A connotation honoring genie still isn’t safe unless “connotation-honoring” implies something at least as extensive and philosophically justifiable as causal validity semantics. I honestly expect the average connotation-honoring genie will lie in-between a denotation-honoring genie and a bona fide justifiable AGI—i.e., it will respect human wishes about as much as humans respect, say, alligator wishes, or the wishes of their long-deceased ancestors. On average I expect an Antichrist, not a Clippy. But even if such an AGI doesn’t kill all of us and maybe even helps us on average, the opportunity cost of such an AGI is extreme, and so I nigh-wholeheartedly support the moralistic intuitions that traditionally lead people to use djinn analogies. Still, I worry that the underlying political question really is poisoning the epistemic question in a way that might bleed over into poor policy decisions re AGI. (Drunk again, apologies for typos et cetera.)
Sorry, didn’t mean to imply you’d be the one mind-killed, just the general audience. From previous interactions I know you’re too rational for that kind of perversion.
Thank you for your generosity, but in all honesty I have to deny that. I at times notice in myself the influence of social and political incentives. I infer from what I do notice (and, where appropriate, resist) that there are other influences that I do not detect.
I honestly expect the average connotation-honoring genie will lie in-between a denotation-honoring genie and a bona fide justifiable AGI—i.e., it will respect human wishes about as much as humans respect, say, alligator wishes, or the wishes of their long-deceased ancestors.
That seems reasonable.
But even if such an AGI doesn’t kill all of us and maybe even helps us on average, the opportunity cost of such an AGI is extreme, and so I nigh-wholeheartedly support the moralistic intuitions that traditionally lead people to use djinn analogies.
I agree that there is potentially significant opportunity cost, but if anything it sounds like I may be more willing to accept this kind of less-than-ideal outcome. For example, if right now I was forced to choose whether to accept this failed utopia based on a fully connotation-honouring artificial djinn or to leave things exactly as they are, I suspect I would accept it. It fails as a utopia, but it may still be better than the (expected) future we have right now.
I think you have a point, Will (an AI that interprets speech like a squish djinn would require deliberate effort and is proposed by no one), but I think that it is possible to construct a valid squish djinn/AI analogy (a squish djinn interpreting a command would be roughly analogous to an AI that is hard-coded to execute that command).
Sorry to everyone for the repetitive statements and the resulting wall of text (which unexpectedly needed to be posted as multiple comments since it was too long). Predicting how people will interpret something is non-trivial, and explaining concepts redundantly is sometimes a useful way of making people hear what you want them to hear.
Squish djinn is here used to denote a mind that honestly believes that it was actually instructed to squish the speaker (in order to remove regret, for example), not a djinn that wants to hurt the speaker and is looking for a loophole. The squish djinn only cares about doing what it is requested to do, and does not care at all about the well-being of the requester, so it could certainly be referred to as hostile to the speaker (since it will not hesitate to hurt the speaker in order to achieve its goal (of fulfilling the request)). A cartoonish internal monologue of the squish djinn would be: “the speaker clearly does not want to be squished, but I don’t care what the speaker wants, and I see no relation between what the speaker wants and what it is likely to request, so I determine that the speaker requested to be squished, so I will squish” (which sounds very hostile, but contains no will to hurt the speaker).

The typical story djinn is unlikely to be a squish djinn (they usually have a motive to hurt or help the speaker, but are restricted by rules (a clever djinn that wants to hurt the speaker might still squish, but not for the same reasons as a squish djinn (such a djinn would be a valid analogy when opposing a proposal of the type “let’s build some unsafe mind with selfish goals and impose rules on it” (such a project can never succeed, and the proposer is probably fundamentally confused, but a simple and correct and sufficient counter-argument is: “if the project did succeed, the result would be very bad”)))).
To expand on you having a point: I have obviously not seen every AI proposal on the internet, but as far as I know, no one is proposing to build a wish-granting AI that parses speech like a squish djinn (and ending up with such an AI would require a deliberate effort). So I don’t think the squish djinn is a valid argument against proposed wish-granting AIs. Any proposed or realistic speech-interpreting AI would (as you say) parse English speech as English speech. An AI that makes arbitrary distinctions between different types of meaning would need serious deliberate effort, and as far as I know, no one is proposing to do this. This makes the squish djinn analogy invalid as an argument against proposals to build a wish-granting AI.

It is a basic fact that statements do not have specified “meanings” attached to them, and AI proposals take this into account. An extreme example to make this very clear would be Bill saying “Steve is an idiot” to two listeners, where one listener will predictably think of one Steve and the other listener will predictably think of some other Steve (or a politician making a speech that different demographics will interpret differently and to their own liking). Bill (or the politician) does not have a specific meaning of which Steve (or which message) they are referring to. This speaker is deliberately making a statement in order to have different effects on different audiences.

Another standard example is responding to a question about the location of an object with “look behind you” (anyone that is able to understand English and has no serious mental deficiencies would be able to guess that the meaning is that the object is/might be behind them (as opposed to following the order, being surprised to see the object lying there, and thinking “what a strange coincidence”)). Building an AI that would parse “look behind you” without understanding that the person is actually saying “it is/might be behind you” would require deliberate effort, as it would be necessary to painstakingly avoid using most information while trying to understand speech. Tone of voice, body language, eye gaze, context, prior knowledge of the speaker, models of people in general, etc, etc all provide valuable information when parsing speech. Needing to prevent an AI from using this information (even indirectly, for example through models of “what sentences usually mean”) would put enormous additional burdens on an AI project.

An example in the current context would be writing: “It is possible to communicate in a way so that one class of people will infer one meaning and take the speaker seriously and another class of people will infer another meaning and dismiss it as nonsense. This could be done by relying on the fact that people differ in their prior knowledge of the speaker and in their ability to understand certain concepts. One can use non-standard vocabulary, take non-standard strong positions, describe uncommon concepts, or otherwise give signals indicating that the speaker is a person that should not be taken seriously, so that the speaker is dismissed by most people as talking nonsense. But people that know the speaker would see a discrepancy and look closer (and if they are familiar with the non-standard concepts behind all the “don’t listen to me” signs they might infer a completely different message).”
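A minimal sketch of the “look behind you” point in Python (my own toy example, not part of the original comment); the candidate interpretations and the prior weights are invented for illustration:

```python
# Interpreting an utterance by combining a literal parse with context about
# why the speaker would say it, rather than treating the words alone as the
# meaning. All names and numbers here are hypothetical.

CANDIDATE_MEANINGS = {
    "look behind you": [
        # (interpretation, rough prior given that the listener just asked
        #  where a lost object is)
        ("the object you asked about is, or might be, behind you", 0.95),
        ("turn around right now, for unrelated reasons", 0.05),
    ],
}

def interpret(utterance: str) -> str:
    """Pick the interpretation that best explains the speaker saying this,
    rather than taking the surface imperative at face value."""
    candidates = CANDIDATE_MEANINGS.get(utterance, [(utterance, 1.0)])
    # A real system would score candidates using tone, gaze, context and a
    # model of the speaker; the hand-written priors stand in for all of that.
    best_meaning, _ = max(candidates, key=lambda pair: pair[1])
    return best_meaning

print(interpret("look behind you"))
```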
To expand on the valid AI squish djinn analogy: I think that hard-coding an AI that executes a command is practically impossible. But if it did succeed, it would act sort of like a squish djinn given that command. And this argument/analogy is a valid and sufficient argument against trying to hard-code such a command, making it relevant as long as there exist people who propose to hard-code such commands.

If someone tried to hard-code an AI to execute such a command, and they succeeded in creating something that had a real-world impact, I predict this would represent a failure to implement the command (it would result in an AI that does something other than the squish djinn and something other than what the builders expect it to do). So the squish djinn is not a realistic outcome. But it is what would happen if they succeeded, and thus the squish djinn analogy is a valid argument against “command hard-coding” projects. I can’t predict what such an AI would actually do, since that depends on how the project failed. Intuitively, the situation where confused researchers fail to build a squish djinn does not feel very optimal, but making an argument on this basis is more vague, and requires that the proposing researchers accept their own limited technical ability (saying “doing x is clearly technically possible, but you are not clever enough to succeed” to the typical enthusiastic project proposer (who considers themselves to be clever enough to maybe be the first in the world to create a real AI) might not be the most likely argument to succeed (here I assume that the intent is to be understood, and not to lay the groundwork for later smugly saying “I pointed that out a long time ago” (if one later wants to be smug, then one should optimize for being loud, taking clear and strong positions, and not being understood))).

The squish djinn analogy is simply a simpler argument. “Either you fail or you get a squish djinn” is true and simple and sufficient to argue against a project. When presenting this argument, you do spend most of the time arguing about what would happen in a situation that will never actually happen (project success). This might sound very strange to an outside observer, but the strangeness is introduced by the project proposer’s (invalid) assumption that the project can succeed (analogous to some atheist saying: “if god exists, and is omnipotent, then he is not nice, cuz there is suffering”).
(I’m arrogantly/wisely staying neutral on the question of whether or not it is at all useful to in any way engage with the sort of people whose project proposals can be validly argued against using squish djinn analogies)
(Jokes often work by deliberately being understood in different ways at different times by the same listener (the end of the joke deliberately changes the interpretation of the beginning of the joke (in a way that makes fun of someone)). In this case the meaning of the beginning of the joke is not one thing or the other thing. The listener is not first failing to understand what was said and then, after hearing the end, succeeding in understanding it. The speaker intends the listener to understand the first meaning until reaching the end, so the listener is not “first failing to decode the transmission”. There is no inherently true meaning of the beginning of the joke, no inherently true person that this speaker is actually truly referring to. Just a speaker that intends to achieve certain effects on an audience by saying things (and if the speaker is successful, then at the beginning of the joke the listener infers a different meaning from what it infers after hearing the end of the joke).

One way to illuminate the concepts discussed above would be to write: “on a somewhat related note, I once considered creating the username “New_Willsome” and starting to post things that sounded like you (for the purpose of demonstrating that if you counter a ban by using sock puppets, you lose your ability to stop people from speaking in your name (I was considering the option of actually acting like I think you would have acted, and the option of including subtle distortions to what I think you would have said, and the option of doing my best to give better explanations of the concepts that you talk about)). But then a bunch of usernames similar to yours showed up and were met with hostility, and I was in a hurry, and drunk, and bat shit crazy, and God told me not to do it, and I was busy writing fanfic, so I decided not to do it (the last sentence is jokingly false. I was not actually in a hurry … :) … )”)
Actually, I think Will has a point here.

“Wishes” are just collections of coded sounds intended to help people deduce our desires. Many people (not necessarily you, IDK) seem to model the genie as attempting to attack us while maintaining plausible deniability that it simply misinterpreted our instructions, which, naturally, does occasionally happen because there’s only so much information in words and we’re only so smart.
In other words, it isn’t trying to understand what we mean; it’s trying to hurt us without dropping the pretense of trying to understand what we mean. And that’s pretty anthropomorphic, isn’t it?
If your genie is using your vocal emissions as information toward the deduction of your extrapolated volition, then I’d say your situation is good.
Your problems start if it works more by attempting to extract a predicate from your sentence by matching vocal signals against known syntax and dictionaries, and outputting an action that maximises the probability of that predicate being true with respect to reality.
To put it simply, I think that “understanding what we mean” is really a complicated notion that involves knowing what constitutes true desires (as opposed to, say, akrasia), and of course having a goal system that actually attempts to realize those desires.
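A toy sketch of the contrast drawn in the last two comments, assuming a Python setting; the actions, probabilities and scores are invented for illustration, not taken from any real proposal:

```python
# Two wish-granting policies for "I wish that I do not regret making this wish".
# One maximises the probability of the parsed predicate being true; the other
# treats the sentence only as evidence about what the wisher actually wants.

ACTIONS = {
    # action: (P(no regret afterwards | action), how much the wisher would
    #          actually endorse this outcome if asked beforehand)
    "arrange a future the wisher genuinely likes": (0.90, 1.0),
    "delete the wisher's capacity for regret":     (1.00, 0.1),
    "squish the wisher":                           (1.00, 0.0),
}

def predicate_maximising_genie() -> str:
    """Pick whatever makes the extracted predicate most probably true."""
    return max(ACTIONS, key=lambda a: ACTIONS[a][0])

def desire_inferring_genie() -> str:
    """Use the utterance as evidence about the wisher's desires, then act on
    the inferred desires rather than on the literal predicate."""
    return max(ACTIONS, key=lambda a: ACTIONS[a][1])

print(predicate_maximising_genie())  # a regret-free outcome the wisher never wanted
print(desire_inferring_genie())      # the outcome the wisher was actually after
```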
Yes, that’s the essence of it. People do it all the time. Generally, all sorts of pseudoscientific scammers try to maintain an image of honest self-deception; in medical scams in particular, the crime is just so heinous and utterly amoral (killing people for cash) that pretty much everyone goes well out of their way to be able to pretend at ignorance, self-deception, misinterpretation, carelessness and enthusiasm. But why would some superhuman AI need plausible deniability?
Many people (not necessarily you, IDK) seem to model the genie as attempting to attack us while maintaining plausible deniability that it simply misinterpreted our instructions, which, naturally, does occasionally happen because there’s only so much information in words and we’re only so smart.
This is something that people do (and some forms of wish-granters do implement this form of ‘malicious obedience’). However, this is not what is occurring in this particular situation. This is mere obedience, not malicious obedience. An entirely different (and somewhat lesser) kind of problem. (Note that this reply is to your point, not to Will’s point, which is not quite the same and which I mostly agree with.)
You are hoping for some sort of benevolent power that does what is good for us, using all information available including prayers, and acting in our best interests. That would indeed be an awesome thing, and if I were building something it is what I would create. But it would be a different kind of creature from the genie in the initial example, the genie your reply assumes, or the genie that is (probably) just as easy to create and specify (to within an order of magnitude or two).
In other words, it isn’t trying to understand what we mean; it’s trying to hurt us without dropping the pretense of trying to understand what we mean. And that’s pretty anthropomorphic, isn’t it?
Not especially. That is, it is generic agency, not particularly humanlike agency. It is possible to create a genie that does try to understand what we mean. It is also possible to create an agent that does understand what we mean and then tries to do the worst for us within the confines of literal meaning. Either of these goal systems could be described as anthropomorphic.
Well, a genie isn’t going to care about what we think unless it was designed to do so, which seems like a very human thing to make it do. But whatever.
As for the difference between literal and malicious genies … I’m just not sure what a “literal” genie is supposed to be doing, if it’s not deducing my desires based on audio input. Interpreting things “literally” is a mistake people make while trying to do this; a merely incompetent genie might make the same mistake, but why should we pay any more attention to that mistake than to, say, mishearing us, or mistaking parts of our instructions for sarcasm?
Exactly. There isn’t a literal desire in an audio waveform, nor in words. And there is a literal genie: the compiler. You have to be very verbose with it, though—because it doesn’t model what you want, it doesn’t cull the space of possible programs down to the much smaller space of programs you may want, and you have to point into a much larger space, for which you use a much larger index, i.e. write long computer programs.
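To make the “larger index” point concrete, a small Python illustration of my own (not from the comment): the interpreter delivers exactly the denotation, and an intent like “sort these the way a person would” only happens once it has been spelled out by hand:

```python
import unicodedata

# What we want: "sort these names the way a person would expect".
# What we must say: an explicit key handling case and accents ourselves,
# because the interpreter does not model what we want.
names = ["ängström", "Apple", "banana", "Äpple"]

def folded(s: str) -> str:
    """Strip accents and ignore case, so 'Äpple' sorts next to 'Apple'."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()

print(sorted(names))              # literal default: raw code-point order
print(sorted(names, key=folded))  # the intent, only after we encoded it ourselves
```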
So, sorry—what is this “literal genie” doing, exactly? Is it trying to use my natural-language input as code, which is run to determine its actions?

Well, the compiler would not correctly process your normal way of speaking, because the normal way of speaking requires modelling the speaker for interpretation.
An image from the camera can mean a multitude of things. It could be an image of a cat, or a dog. An image is never literally a cat or a dog, of course. To tell apart cats and dogs with good fidelity, one has to model the processes producing the image, and classify those based on some part of the model—the animal—because the data of interest is a property of the process which produced the input. Natural processing of the normal manner of speaking is done using the same general mechanisms—one has to take in the data and model the process producing it, to obtain properties of the process which would be actually meaningful. And since humans all have this ability, natural language does not—in the normal manner of speaking—have any defined literal meaning that is naturally separate from some subtle meaning or intent.
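A toy sketch of that “model the process that produced the data” idea (my own illustration; the features and likelihood numbers are made up):

```python
# Classify an observation by asking which generating process best explains it,
# rather than looking for a literal cat or dog inside the data itself.

OBSERVATION = {"pointy_ears": True, "barks": False}

# P(feature present | the process that produced the image)
PROCESS_MODELS = {
    "a cat was in front of the camera": {"pointy_ears": 0.9, "barks": 0.01},
    "a dog was in front of the camera": {"pointy_ears": 0.4, "barks": 0.70},
}

def likelihood(model: dict, observation: dict) -> float:
    p = 1.0
    for feature, present in observation.items():
        p *= model[feature] if present else 1.0 - model[feature]
    return p

best = max(PROCESS_MODELS, key=lambda m: likelihood(PROCESS_MODELS[m], OBSERVATION))
print(best)  # the label is a property of the inferred process, not of the pixels
```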
You have to be very verbose with it, though—because it doesn’t model what you want, it doesn’t cull the space of possible programs down to the much smaller space of programs you may want,
Have you used ML? I’ve been told by its adherents that it does a good job of doing just that.
I’ve been told by [insert language here] advocates that it does a good job of [insert anything]. The claims are inversely proportional to popularity. Basically, no programming language whatsoever infers anything about any sort of high-level intent (and no, the type of an expression is not a high-level intent), so they’re all pretty much equal, except some are more unusable than others and subsequently less used. Programming currently works as follows: a human, using a mental model of the environment, makes a string that gets the computer to do something. Most types of cleverness put into the “how the compiler works” part can thus be expected to decrease, rather than increase, productivity, and indeed that’s precisely what happens with those failed attempts at a better language.
Basically, no programming language whatsoever infers anything about any sort of high-level intent (and no, the type of an expression is not a high-level intent),
The phrase they (partially tongue-in-cheek) used was “compile-time correctness checking”, i.e., the criterion of being a syntactically correct ML program is a better approximation to the space of programs you may want than is the case for most other languages.
In other words, a larger proportion of the strings that fail to compile in ML are programs that exhibit high-level behavior that you don’t want?
This formulation is missing the programmer’s mind. The claim that a programming language is better in this way is that, for a given intended result, the set of strings that
a programmer would believe achieve the desired behavior,
compile*, and
do not exhibit the desired behavior
is smaller than for other languages — because there are fewer ways to write program fragments that deviate from the obvious-to-the-(reader|writer) behavior.
Is it harder to write a control program for a wind turbine that causes excessive fatigue cracking in ML as compared to any other language?
The claim is yes, given that the programmer is intending to write a program which does not cause excessive fatigue cracking.
(I’m not familiar with ML; I do not intend to advocate it here. I am attempting to explicate the general thinking behind any effort to create/advocate a better-in-this-dimension programming language.)
* for “dynamic” languages, substitute “does not signal an error on a typical input”, i.e., is not obviously broken when trivially tested
Suppose that the programmer is unaware of the production-line issues which cause stress concentration on turbine blades and make it so that turbines which cycle more often develop larger fatigue cracks. Suppose the programmer is also unaware that there are no corresponding production-line issues which would result in larger fatigue cracks on turbines that are consistently run overspeed.
The programmer is aware that both overspeed and cyclical operations will result in the growth of two different types of cracks, and that the ideal solution uses both cycling the turbine and tolerating some amount of overspeed operation.
In that case, I don’t find it reasonable that the choice of programming language should have any effect on the belief of the programmer that fatigue cracks will propagate; the only possible benefit would be making the programmer more sure that the string was a program which controls turbines. The high-level goals of the programmer aren’t often within the computer.
at-least-instrumentally-oppositional interests/incentives, in which case one wonders where those oppositional interests/incentives came from.
Why would there be some creating agency involved any more than we need a “whoever” to explain where human characteristics come from?

All you need is a cost function. If the genie prefers achieving goals sooner rather than later, squishing you is a ‘better’ solution along that direction to remove your capacity for regret. Or if it prefers using less effort rather than more. Etc.
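A minimal sketch of that point, with invented numbers: every candidate action below is assumed to satisfy the parsed request (“no regret afterwards”), so the cost term alone decides which one gets picked:

```python
# No oppositional motive, just a preference for sooner and cheaper.

CANDIDATES = {
    # action: (time taken, effort spent) -- all equally satisfy the request
    "build a future the wisher would endorse": (1000.0, 1000.0),
    "edit the wisher's mind":                  (1.0, 5.0),
    "squish the wisher":                       (0.1, 1.0),
}

def cost(action: str) -> float:
    time_taken, effort = CANDIDATES[action]
    return time_taken + effort  # prefers achieving the goal sooner, with less effort

print(min(CANDIDATES, key=cost))  # "squish the wisher"
```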
I’m confused by this line of defense, because I think “I is the entity standing here right now” is sufficient to denote that the present moment of the wisher, as they make the wish, should not regret the wish. So making the future wisher not regret the wish, e.g. by killing them, breaks the denotation, because the present wisher will presumably regret that, once counterfactually informed about that aspect of the future.
If that’s not what you intended to denote, I’m curious what you did, and doubly curious what you intended to connote.
Well, yes, that is one way to remove the capacity for regret...
I mentally merged the possibility pump and the Mehtopia AI... say, a sloppy code mistake, or a premature compile-and-run, resulting in the “do not tamper with minds” rule not getting incorporated correctly, even though “don’t kill humans” gets incorporated.
I assume what Will_Pearson meant to say was “would not regret making this wish”, which fits with the specification of “I is the entity standing here right now”. Basically such that: if, before finishing/unboxing the AI, you had known exactly what would result from doing so, you would still have built the AI. (And it’s supposed to find, out of that set of possible worlds, the one you would most like, or… something along those lines.)
I’m not sure that would rule out every bad outcome, but… I think it probably would. Besides the obvious “other humans have different preferences from the guy building the AI” (maybe the AI is ordered to do a similar thing for each human individually), can anyone think of ways this would go badly?
I wish that the future will turn out in such a way that I do not regret making this wish
… wish granted. the genie just removed the capacity for regret from your mind. MWAHAHAH!
Easier to do by just squishing someone, actually.