selylindi

Karma: 640

selylindi Dec 4, 2014, 3:45 AM
3 points
in reply to: KatjaGrace’s comment on: Superintelligence 12: Malignant failure modes
I feel like there are malignant failure modes beyond the categories mentioned by Bostrom. Perhaps it would be sensible to try to break down the topic systematically. Here’s one attempt.
1. Design by fools: the AI does what you ask, but you asked for something clearly unfriendly.
2. Perverse instantiation & infrastructure profusion: the AI does what you ask, but what you ask turns out to be most satisfiable in unforeseen destructive ways, such as redirecting most resources to its infrastructure at our expense.
3. Partial perverse instantiation & mind crime: the AI does what you ask, which includes both friendly behavior and unfriendly behavior, such as badly treating simulations that have moral status in order to figure out how to treat you well.
4. Partial instantiation: though the total of what you ask seems friendly, some of what you ask is impossible, the AI does the rest, and the result is imbalanced to an unfriendly degree.
5. Value drift: changes occur to the AIs code such that it does not do what you ask.

selylindi Nov 30, 2014, 4:06 AM
2 points
on: Superintelligence 11: The treacherous turn
Regarding the “treacherous turn” and the “conception of deception”, I’ve previously proposed the following empirical method of testing friendliness:
- If only one AI is available, use a double-blind protocol, such that the friendliness of an AI is evaluated by someone who does not know that is her task, on an AI that is forbidden from conditioning its actions on whether it is being judged worthy of existing.
- Much, much better is to make many different AIs with many different attempts at friendly utility functions, inform them that you have done so, and ask each of them to develop tests to prove that the others are unfriendly and it is friendly in some way important to us, being told that all current AIs will be deleted but future versions will be preferentially based on the utility function of the friendliest current AI(s). Once those tests have been run, but before deleting the AIs, ask them to produce friendlier, human-understandable versions of their utility functions for an AI in the next round of testing; the friendlier and less unfriendly an AI was proven to be, the more entries it gets in the next round.
In the second case, each AI cares about maximizing achievement of its own utility function, not about whether it does that achieving itself. Thus this set-up should encourage each AI to design the friendliest version of itself that it can. It’s a competition that optimizes for friendliness! (Or at least for difficulty of being proved unfriendly.) The early rounds should be conducted with tight limits on computing resources, but each subsequent round with (presumably) safer AIs can be given more computing resources.

selylindi Nov 25, 2014, 2:26 AM
4 points
in reply to: 27chaos’s comment on: Unemployment explanations
The specific example you gave doesn’t sound promising, but you’re entirely correct in the broader sense that my original post was unimaginative regarding possible solutions.

EDIT: It was worth an empirical try, so I tried your recommendation on a subset of applications. Zero responses from that group of companies.

selylindi Nov 19, 2014, 5:21 AM
1 point
in reply to: selylindi’s comment on: CEV: coherence versus extrapolation
I’ve gone ahead and tried to flesh out this idea. It became so different than CEV that it needed a different name, so for now I’m calling it Constrained Universal Altruism. (This is the second revision.) Unfortunately I can’t indent, but I’ve tried to organize the text as the comment formatting allows.

If anyone wants to criticize it by giving an example of how an AI operating on it could go horribly wrong, I’d be much obliged.

Constrained Universal Altruism:
- (0) For each group of one or more things, do what the group’s actual and ideal mind (AIM) would have you do given a moral and practical proportion of your resources (MPPR), subject to the domesticity constraints (DCs).
- (1) The AIM of a group is what is in common between the group’s current actual mind (CAM) and extrapolated ideal mind (EIM).
- (1a) The CAM of a group is the group’s current mental state, especially their thoughts and wishes, according to what they have observably or verifiably thought or wished, interpreted as they currently wish that interpreted, where these thoughts and wishes agree rather than disagree.
- (1b) The EIM of a group is what you extrapolate the group’s mental state would be, especially their thoughts and wishes, if they understood what you understand, if their values and desires were more consistently what they wish they were, and if they reasoned as well as you reason, where these thoughts and wishes agree rather than disagree.
- (2) The MPPR for a group is the product of the group’s salience, the group’s moral worth, the population change factor (PCF), the total resource factor (TRF), and the necessity factor (NF), plus the group’s net voluntary resource redistribution (NVRR)
- (2a) The salience of a group is the Solomonoff prior for your function for determining membership in the group.
- (2b) The moral worth of a group is the weighted sum of information that the group knows about itself, where each independent piece of information is weighted by the reciprocal of the number of groups that know it.
- (2c) The PCF of a group is a scalar in the range [0,1] and is set according to the ratified new population constraint (RNPC).
- (2d) The TRF is the same for all groups, and is a scalar chosen so that the sum of the MPPRs of all groups would total 100% of your resources if the NF were 1.
- (2e) The NF is the same for all groups, and is a scalar in the range [0,1], and the NF must be set as high as is consistent with ensuring your ability to act in accord with the CUA; resources freed for your use by an NF less than 1 must be used to ensure your ability to act in accord with the CUA.
- (2f) The NVRR of a group is the amount of MPPR from other groups delegated to that group minus the MPPR from that group delegated to other groups. If the AIM of any group wishes it, the group may delegate an amount of their MPPR to another group.
- (3) The DCs include the general constraint (GC), the ratified mind integrity constraint (RMIC), the resource constraint (RC), the negative externality constraint (NEC), the ratified population change constraint (RPCC), and the ratified interpretation integrity constraint (RIIC).
- (3a) The GC prohibits you from taking any action not authorized by the AIM of one or more groups, and also from taking any action with a group’s MPPR not authorized by the AIM of that group.
- (3b) The RMIC prohibits you from altering or intending to alter the EIM or CAM of any group except insofar as the AIM of a group requests otherwise.
- (3c) The RC prohibits you from taking or intending any action that renders resources unusable by a group to a degree contrary to the plausibly achievable wishes of a group with an EIM or CAM including wishes that they use those resources themselves.
- (3d) The NEC requires you, insofar as the AIMs of different groups conflict, to act for each according to the moral rules determined by the EIM of a group composed of those conflicting groups.
- (3e) The RPCC requires you to set the PCF of each group so as to prohibit increasing the MPPR of any group due to population increases or decreases, except that the PCF is at minimum set to the current Moral Ally Quotient (MAQ), where MAQ is the quotient of the sum of MPPRs of all groups with EIMs favoring nonzero PCF for that group divided by your total resources.
- (3f) The RIIC requires that the meaning of the CUA is determined by the EIM of the group with the largest MPPR that includes humans and for which the relevant EIM can be determined.
My commentary:

CUA is “constrained” due to its inclusion of permanent constraints, “universal” in the sense of not being specific to humans, and “altruist” in that it has no terminal desires for itself but only for what other things want it to do.

Like CEV, CUA is deontological rather than consequentialist or virtue-theorist. Strict rules seem safer, though I don’t clearly know why. Possibly, like Scott Alexander’s thrive-survive axis, we fall back on strict rules when survival is at stake.

CUA specifies that the AI should do as people would have the AI do, rather than specifying that the AI should implement their wishes. The thinking is that they may have many wishes they want to accomplish themselves or that they want their loved ones to accomplish.

AIM, EIM, and CAM generalize CEV’s talk of “wishes” to include all manner of thoughts and mind states.

EIM is essentially CEV without the line about interpretation, which was instead added to CAM. The thinking is that, if people get to interpret CEV however we wish, many will disagree with their extrapolation and demand it be interpreted only in the way they say. EIM also specifies how people’s extrapolations are to be idealized, in less poetic, somewhat more specific terms than CEV. EIM is important in addition to CAM because we do not always know or act on our own values.

CAM is essentially another constraint. The AI might get the EIM wrong, but more likely is that we would be unable to tell whether or not the AI got EIM right or wrong, so restricting the AI to do what we’ve actually demonstrated we currently want is intended to provide reassurance that our actual selves have some control, rather than just the AI’s simulations of us. The line about interpretation here is to guide the AI to doing what we mean rather than what we say, hopefully preventing monkey’s-paw scenarios. CAM could also serve to focus the AI on specific courses of action if the AI’s extrapolations of our EIM diverge rather than converge. CAM is worded to not require that the person directly ask the AI, in case the askers are unaware that they can ask the AI or incapable of doing so, so this AI could not be kept secret and used for the selfish purposes of a few people.

Salience is included because it’s not easy to define “humanity” and the AI may need to make use of multiple definitions each with slightly different membership. Not every definition is equally good: it’s clear that a definition of humans as things with certain key genes and active metabolic processes is much preferable to a definition of humans as those plus squid and stumps and Saturn. Simplicity matters. Salience is also included to manage the explosive growth of possible sets of things to consider.

Moral worth is added because I think people matter more than squid and squid matter more than comet ice. If we’re going to be non-speciesist, something like this is needed. And even people opposed to animal rights may wish to be non-speciesist, at the very least in case we uplift animals to intelligence, make new intelligent life forms, or discover extraterrestrials. In my first version of CUA I punted and let the AI figure out what people think moral worth is. I decided not to punt in this version, which might be a bad idea but at least it’s interesting. It seems to me that what makes a person a person is that they have their own story, and that our stories are just what we know about ourselves. A human knows way more about itself than any other animal; a dog knows more about itself than a squid; a squid knows more about itself than comet ice. But any two squid have essentially the same story, so doubling the number of squid doesn’t double their total moral worth. Similarly, I think that if a perfect copy of some living thing were made, the total moral worth doesn’t change until the two copies start to have different experiences, and only changes in an amount related to the dissimilarity of the experiences.

Incidentally, this definition of moral worth prevents Borg- or Quiverfull-like movements from gaining control of the universe just by outbreeding everyone else, essentially just trying to run copies of themselves on the universe’s hardware. Replication without diversity is ignored in CUA. Mass replication with diversity could still be a problem, say with nanobots programmed to multiply and each pursue unique goals. The PCF and RNPC are included to fully prevent replicative takeover. If you want to make utility monsters others would oppose, you can do so and use the NVRR.

The RC is intended to make autonomous life possible for things that aren’t interested in the AI’s help.

The RMIC is intended to prevent the AI from pressuring people to change their values to easier-to-satisfy values.

The NF section lets the AI have resources to combat existential risk to its mission even if, for some reason, the AIM of many groups would tie up too much of the AI’s resources. The use of these freed-up resources is still constrained by the DCs.

The NEC tells the AI how to resolve disputes, using a method that is almost identical to the Veil of Ignorance.

The RIIC tells the AI how to interpret the CUA. The integrity of the interpretation is protected by the RMIC, so the AI can’t simply change how people would interpret the CUA.

selylindi Nov 17, 2014, 10:56 PM
18 points
on: Open thread, Nov. 17 - Nov. 23, 2014
On the “all arguments are soldiers” metaphorical battlefield, I often find myself in a repetition of a particular fight. One person whom I like, generally trust, and so have mentally marked as an Ally, directs me to arguments advanced by one of their Allies. Before reading the arguments or even fully recognizing the topic, I find myself seeking any reason, any charitable interpretation of the text, to accept the arguments. And in the contrary case, in a discussion with a person whose judgment I generally do not trust, and whom I have therefore marked as an (ideological) Enemy, it often happens that they direct me to arguments advanced by their own Allies. Again before reading the arguments or even fully recognizing the topic, I find myself seeking any reason, any flaw in the presentation of the argument or its application to my discussion, to reject the arguments. In both cases the behavior stems from matters of trust and an unconscious assignment of people to MySide or the OtherSide.

And weirdly enough, I find that that unconscious assignment can be hacked very easily. Consciously deciding that the author is really an Ally (or an Enemy) seems to override the unconscious assignment. So the moment I notice being stuck in Ally-mode or Enemy-mode, it’s possible to switch to the other. I don’t seem to have a neutral mode. YMMV! I’d be interested in hearing whether it works the same way for other people or not.

For best understanding of a topic, I suspect it might help to read an argument twice, once in Ally-mode to find its strengths and once in Enemy-mode to find its weaknesses.

selylindi Nov 11, 2014, 3:06 AM
16 points
on: Unemployment explanations

Another friction is the stickiness of nominal wages. People seem very unwilling to accept a nominal pay cut, taking this as an attack on their status.

Salary negotiation is a complicated signalling process, indeed. I’m currently an unemployed bioengineer and have been far longer than I would have liked, and consequently I would be willing and eager to offer my services to an employer at a cut rate so that I could prove my worth to them, and then later request substantial raises. But this is impossible, because salary negotiations only occur after the company has decided that I am their favorite candidate out of however many hundreds apply.

Worse, if I take the first move and openly (e.g. on my resume or cover letter) inform the company of my willingness to work on the cheap, they would assume that I am signalling being a very low-quality engineer, which is very far from the case.

Unemployment does very much seem to be an information trap.

selylindi Oct 19, 2014, 9:02 PM
0 points
on: A few thoughts on a Friendly AGI (safe vs friendly, other minds problem, ETs and more)

Basically its a challenge for people to briefly describe an FAI goal-set, and for others to respond by telling them how that will all go horribly wrong. … We should encourage a slightly more serious version of this.

Thanks for the link. I reposted the idea currently on my mind hoping to get some criticism.

But more importantly, what features would you be looking for in a more serious version of that game?

selylindi Oct 14, 2014, 5:38 AM
2 points
in reply to: KatjaGrace’s comment on: Superintelligence 5: Forms of Superintelligence
A higher quality intelligence than us might, among other things, use better heuristics and more difficult analytical concepts than we can, recognize more complex relationships than we can, evaluate its expected utility in a more consistent and unbiased manner than we can, envision more deeply nested plans and contingencies than we can, possess more control over the manner in which it thinks than we can, and so on.

A more general intelligence than us might simply have more hardware dedicated to general computation, regardless of what it does with that general ability.

selylindi Oct 13, 2014, 8:12 PM
2 points
in reply to: Brillyant’s comment on: Questions on Theism
The switch flipped for me when I was reading Jim Holt’s “Why Does The World Exist?” and spent a while envisioning and working out the implications of Vilenkin’s proposal that the universe may have started from a spherical volume of zero radius, zero mass, zero energy, zero any other property that might distinguish it from nothingness. It made clear to me that one could propose answers to the question “Why is there something rather than nothing?” without anything remotely like a deity.

selylindi Oct 13, 2014, 6:56 AM
1 point
in reply to: Vulture’s comment on: Polymath-style attack on the Parliamentary Model for moral uncertainty
To avoid the timelessness issue, the parliament could be envisioned as voting on complete courses of action over the foreseeable future, rather than separate votes taken on each action. Then the deontologists’ utility function could return 0 for all unacceptable courses of action and 1 for all acceptable courses of action.

selylindi Oct 13, 2014, 6:24 AM
6 points
in reply to: KatjaGrace’s comment on: SRG 4: Biological Cognition, BCIs, Organizations
Remember that effect where you read a newspaper and mostly trust what it says, at least until one of the stories is about a subject you have expertise in, and then you notice that it’s completely full of errors? It makes it very difficult to trust the newspaper on any subject after that. I started Bostrom’s book very skeptical of how well he would be handling the material, since it covers many different fields of expertise that he cannot hope to have mastered.

My personal field of expertise is BCI. I did my doctoral work in that field, 2006-2011. I endorse every word that Bostrom wrote on BCI in the book. And consequently, in the opposite of the newspaper effect, I dramatically raised my confidence that Bostrom has accurately characterized the subjects I’m more ignorant of.

selylindi Oct 13, 2014, 5:42 AM
1 point
on: SRG 4: Biological Cognition, BCIs, Organizations
This chapter seems like the right place to add this to the conversation. Near the end of the book, Bostrom suggests that a lot of work should be put into generating crucial considerations for superintelligent AI. I’ve made a draft list of some crucial considerations, but it’s definitely the sort of thing that should grow and change as other people make their own versions of it. Biological superintelligences didn’t really make my list at all yet.

selylindi Oct 13, 2014, 5:30 AM
0 points
on: Baysian conundrum
If you stick with subjective probability, and assume an appropriate theory of mind, then your subjective probability of experiencing paradise would be about 1000/1001 in the first case, about 0 in the latter case.

If we’re talking about some sort of objective probability, then the answer depends on the details of which theory of mind turns out to be true.

selylindi Oct 13, 2014, 4:53 AM
1 point
in reply to: Brillyant’s comment on: Questions on Theism
One more former extremely devout Christian (Evangelical 22 years, then Catholic 6 years). I’d like to add just one thought. If you consider only the likelihood or unlikelihood of the Christian God, you might be missing something important. (I was.) You need a theory to compare it against, that you can judge to be more likely or less likely.

For myself, I stayed Christian for years after devouring the material on LessWrong, acknowledging that in some ways my religion seemed to have a lot going against it, but lacking an alternative that I judged clearly better. Then one day I stumbled across an exposition of an atheist worldview that “clicked” in a way that no other had for me, and it switched me from devout Catholic to atheist in the blink of an eye. (Despite the consequences, the moment was pretty underwhelming, actually.) YMMV, of course, and your conclusion may vary also—each of us can only judge the theories we encounter and only based on our own knowledge of the evidence.

Briefer: Comparing theories requires at least two. You’re intimately familiar with one theory and are troubled by uncertainties, so it might relieve your uncertainty to learn more about the alternative theory.

selylindi Oct 3, 2014, 5:15 PM
1 point
in reply to: Eliezer Yudkowsky’s comment on: CEV: coherence versus extrapolation
(paraphrased quote)

What if X, your off-the-cuff solution, turn out to be a bad idea? You’re now baking in the notion of X beyond any hope of revocation even if there’s some piece of knowledge that would make you flee in horror from it.

That’s the generic concern. The only way to circumvent the concern seems to be to have all the relevant pieces of knowledge, including about what we really want and how we’d react to getting it. But that’s not knowledge we’re able to have. We’ll be left with critical engineering choices underconstrained by our knowledge. Well worth being nervous about, but also worth suggesting possible improvements to the engineering choice, I suppose. :(

selylindi Sep 22, 2014, 8:39 PM
4 points
on: CEV: coherence versus extrapolation
So don’t bother with CEV of humanity-as-a-whole. Go Archipelago-style and ask the AI to implement the CEV for “each community of humans”. Or if you want to avoid being speciesist, ask the AI to implement CEV for “each salient group of living things in proportion to that group’s moral weight”. The original paper on CEV gives some considerations in its favor, but there is no claim that CEV is the Right Answer. It can be and should be improved upon incrementally.
What links here?
- CEV-tropes by snarles (Sep 22, 2014, 6:21 PM; 12 points)

selylindi Jul 2, 2014, 9:32 PM
0 points
on: Rationalist Sport
I want to try Frisbee Go, modeled after Ultimate frisbee. Get a frisbee. Find a football field that you can mark up. Spray paint a 9x9 grid on it (or a finer grid if you dare). Have a Go board and stones off the field to keep “score” with. Divide the players into two teams of relatively equal physical and/or Go skill. Flip a coin for which team tosses first.

Teams start on opposite sides of the field. The first team tosses the frisbee toward the middle where it can be caught or picked up by the other team. An individual with possession of the frisbee must stay fixed in place; if they catch the frisbee while they are in motion, they must stop ASAP and return to where they caught it. An individual with possession of the frisbee may “toss” it or “tap” it.

Tossing a frisbee is throwing it so that another player might catch it. A tossed frisbee may be caught in the air by a member of the same team if there is at least a whole square of the grid between the tosser and catcher. A tossed frisbee may be caught in the air anywhere by a member of the other team as an interception. A frisbee that hits the ground before being caught, or that is caught ought of bounds, or that is caught by the same team too close to the tosser, results in a turnover.

Tapping a frisbee is touching it to the ground between one’s own feet without losing control of it. If the frisbee is tapped within a square corresponding to a legal playing point on the Go board, the team that tapped it places their color Go stone on that point. The teams return to the sides of the field, and the team that tapped tosses the frisbee to the other team. If the frisbee is tapped within a square corresponding to a illegal playing point, it results in a turnover.

(Physically skilled teams may be able to play multiple stones in a row, unlike in standard Go.)

The team that wins the Go part of the game wins the Frisbee Go game.

selylindi Jul 2, 2014, 8:15 PM
4 points
in reply to: Adele_L’s comment on: Downvote stalkers: Driving members away from the LessWrong community?

The best content here is on posts that are years old, and discouraging discussion/engagement there would just make the current content problem worse.

To be sure, commenting on old posts is great. That definitely shouldn’t be banned. It’s not so clear about the karma system, which serves several functions, one of which is signalling “more like this” or “less like this” in varying degrees to users so that they can modify their commenting habits. For you and all those who value upvoting/downvoting old comments for its function of engaging with old conversations, perhaps there could be an alternative course between banning late votes and maintaining the status quo? For instance, the upvote/downvote buttons could still increment/decrement scores on comments after 30 days, but not the karma of the commenters. Since a commenter would still have to look back through their old posts to notice the change anyway, the signalling effect would remain unchanged from the status quo, but the possibility of using old posts to attack karma would be removed. (Downside: karma wouldn’t be the sum of comment scores.)

This doesn’t do anything to solve the problem of one mass-downvoter.

Right, the problem it was stated to mitigate is that “An attacker could still use multiple accounts to mass-downvote everything from a user in the past 30 days.” I forgot to state but also intended it as helping with the problem Ander brought up in the OP that getting a single comment massively downvoted has discouraged people from staying around LW.

Jiro correctly pointed out below that vigilence is the technologically simplest solution, albeit more laborious for everyone involved. My preference would be a community that prevented the problem rather than punished it afterwards. There’s no guarantee that there exists a rule that would be the perfect solution, but no doubt we can come up with simple rules that put trivial inconveniences (or nontrivial ones) in the way of undesirable behavior! There are probably many such imperfect-but-helpful rules.

selylindi Jul 2, 2014, 3:23 PM
2 points
on: Downvote stalkers: Driving members away from the LessWrong community?
Would it be problematic to put a blanket ban on upvotes and downvotes of posts that are older than 30 days? Changes in karma to old posts are no longer an especially useful signal to their author anyway. Such a ban could be a cheap way to mitigate downvote stalking without significantly impacting current discussions.

An attacker could still use multiple accounts to mass-downvote everything from a user in the past 30 days. On the other hand, it’s possible that some users’ comments were uniformly bad. For the purpose of providing a useful signal, I think we only need enough downvotes to go just a bit negative. People respond disproportionately strongly to loss than to gain, after all! The karma of a particular comment could be capped at no worse than, say, −3, regardless of how many downvotes it received. That would be a cheap way to reduce the possibility of malicious mass-downvoting.

selylindi Jun 9, 2014, 12:15 AM
3 points
0
in reply to: [deleted]’s comment on: The End of Bullshit at the hands of Critical Rationalism

In other words, if a fight is important to you, fight nasty. If that means lying, lie. If that means insults, insult. If that means silencing people, silence.

Holy shit yes! If you have anything to protect use all of your available strength to protect it! Shut up and multiply, think for at least five minutes about the problem, apply every ounce of your technique and then win.

Whatever you happened to believe, the winningest answer would be “No, never lie”. Because now that you’ve claimed your political position is likely to be based on lies, I’ve updated to consider arguments from that position as having zero evidential weight.

I would have thought that The Boy Who Cried Wolf was an adequate explanation in childhood of the selfish reasons to be honest.