Sounds like you agree with both me and Ninety-Three about the descriptive claim that the Shapley Value has, in fact, been changed, and have not yet expressed any position regarding the normative claim that this is a problem?
I’m not sure what you’re trying to say.
My concern is that if Bob knows that Alice will consent to a Shapley distribution, then Bob can seize more value for himself without creating new value. I feel that a person or group shouldn’t be able to get a larger share by intentionally hobbling themselves.
If B1 and B2 structure their cartel such that each of them gets a veto over the other, then the synergies change so that A+B1 and A+B2 both generate nothing, and you need A+B1+B2 to make the $100, which means B1 and B2 each now have a Shapley value of $33.3 (up from $25).
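For concreteness, that $33.3 figure is just the permutation form of the Shapley value applied to the post-cartel game: with $N=\{A,B_1,B_2\}$ and $v(S)=\$100$ only when $S=N$ (and $0$ otherwise), a player's marginal contribution is $\$100$ exactly when they join last, which happens in $2$ of the $3!=6$ orderings, so

$$\varphi_{A}=\varphi_{B_1}=\varphi_{B_2}=\frac{2}{6}\cdot \$100\approx \$33.3$$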
Also, I wouldn’t describe the original Shapley Values as “no coordination”. With no coordination, there’s no reason the end result should involve paying any non-zero amount to both B1 and B2, since you only need one of them to assent. I think Shapley Values represent a situation that’s more like “everyone (including Alice) coordinates”.
A problem I have with Shapley Values is that they can be exploited by “being more people”.
Suppose Alice and Bob can make a joint venture with a payout of $300. Synergies:
A: $0
B: $0
A+B: $300
Shapley says they each get $150. So far, so good.
Now suppose Bob partners with Carol and they make a deal that any joint ventures require both of them to approve; they each get a veto. Now the synergies are:
A+B: $0 (Carol vetoes)
A+C: $0 (Bob vetoes)
B+C: $0 (venture requires Alice)
A+B+C: $300
Shapley now says Alice, Bob, and Carol each get $100, which means Bob+Carol are getting more total money ($200) than Bob alone was ($150), even though they are (together) making exactly the same contribution that Bob was paid $150 for making in the first example.
(Bob personally made less, but if he charges Carol a $75 finder’s fee then Bob and Carol both end up with more money than in the first example, while Alice ends up with less.)
By adding more partners to their coalition (each with veto power over the whole collective), the coalition can extract an arbitrarily large share of the value.
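If you want to check the numbers, here's a quick brute-force sketch (my own, not part of the original examples) that computes Shapley values straight from a characteristic function by averaging marginal contributions over all join orders:

```python
from itertools import permutations

def shapley(players, value):
    """Average each player's marginal contribution over every order
    in which the coalition could assemble."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = set()
        for p in order:
            before = value(coalition)
            coalition = coalition | {p}
            totals[p] += value(coalition) - before
    return {p: t / len(orders) for p, t in totals.items()}

# Example 1: only Alice+Bob together produce anything -> $150 each.
print(shapley(["A", "B"], lambda s: 300 if s == {"A", "B"} else 0))

# Example 2: Bob and Carol each hold a veto, so only the full
# coalition pays out -> $100 each.
print(shapley(["A", "B", "C"], lambda s: 300 if s == {"A", "B", "C"} else 0))
```

(The same function reproduces the $33.3 figures from the B1/B2 case above: with $100 available only to the full three-player coalition, each player gets $100/3.)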
Seems like that guy has failed to grasp the fact that some things are naturally more predictable than others. Estimating how much concrete you need to build a house is just way easier than estimating how much time you need to design and code a large novel piece of software (even if the requirements don’t change mid-project).
Is that error common? I can only recall encountering one instance of it with surety, and I only know about that particular example because it was signal-boosted by people who were mocking it.
I’m confused about how continuity poses a problem for “This sentence has truth value in [0,1)” without also posing an equal problem for “this sentence is false”, which was used as the original motivating example.
I’d intuitively expect “this sentence is false” == “this sentence has truth value 0” == “this sentence does not have a truth value in (0,1]”
On my model, the phrase “I will do X” can be either a plan, a prediction, or a promise.
A plan is what you intend to do.
A prediction is what you expect will happen. (“I intend to do my homework after dinner, but I expect I will actually be lazy and play games instead.”)
A promise is an assurance. (“You may rely upon me doing X.”)
How about this: I train on all available data, but only report performance for the lots predicted to be <$1000?
This still feels squishy to me (even after your footnote about separately tracking how many lots were predicted <$1000). You’re giving the model partial control over how the model is tested.
The only concrete abuse I can immediately come up with is that maybe it cheats like you predicted by submitting artificially high estimates for hard-to-estimate cases, but you miss it because it also cheats in the other direction by rounding down its estimates for easier-to-predict lots that are predicted to be just slightly over $1000.
But just like you say that it’s easier to notice leakage than to say exactly how (or how much) it’ll matter, I feel like we should be able to say “you’re giving the model partial control over which problems the model is evaluated on, this seems bad” without necessarily predicting how it will matter.
My instinct would be to try to move the grading closer to the model’s ultimate impact on the client’s interests. For example, if you can determine what each lot in your data set was “actually worth (to you)”, then perhaps you could calculate how much money would be made or lost if you’d submitted a given bid (taking into account whether that bid would’ve won), and then train the model to find a bidding strategy with the highest expected payout.
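To make that concrete, here's a rough sketch of the kind of scoring I have in mind. All the field names and the "would have won" condition are invented stand-ins, not anything from your actual setup:

```python
# Score a bidding rule by the profit it would have produced on historical lots,
# instead of by prediction error on a subset the model itself selected.

def historical_profit(lots, bid_rule):
    """lots: records with 'features', 'winning_bid', and 'actual_worth'.
    bid_rule: maps features to a dollar bid, or None to pass on the lot."""
    profit = 0.0
    for lot in lots:
        bid = bid_rule(lot["features"])
        if bid is None:
            continue                      # passed on this lot: no cost, no gain
        if bid > lot["winning_bid"]:      # crude stand-in for "would have won"
            profit += lot["actual_worth"] - bid
        # losing bids count as zero here; real opportunity costs are messier
    return profit
```

Then you'd pick (or train) the bid rule that maximizes this number, rather than optimizing prediction accuracy on lots the model chose to be graded on.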
But I can imagine a lot of reasons you might not actually be able to do that: maybe you don’t know the “actual worth” in your training set, maybe unsuccessful bids have a hard-to-measure opportunity cost, maybe you want the model to do something simpler so that it’s more likely to remain useful if your circumstances change.
Also, you sound like you do this for a living, so I have about 30% probability you’re going to tell me that my concerns are wrong-headed for some well-studied reason I’ve never heard of.
I think you’re still thinking in terms of something like formalized political power, whereas other people are thinking in terms of “any ability to affect the world”.
Suppose a fantastically powerful alien called Superman comes to earth, and starts running around the city of Metropolis, rescuing people and arresting criminals. He has absurd amounts of speed, strength, and durability. You might think of Superman as just being a helpful guy who doesn’t rule anything, but as a matter of capability he could demand almost anything from the rest of the world and the rest of the world couldn’t stop him. Superman is de facto ruler of Earth; he just has a light touch.
If you consider that acceptable, then you aren’t objecting to “god-like status and control”, you just have opinions about how that control should be exercised.
If you consider that UNacceptable, then you aren’t asking for Superman to behave in certain ways, you are asking for Superman to not exist (or for some other force to exist that can check him).
Most humans (probably including you) are currently a “prisoner” of a coalition of humans who will use armed force to subdue and punish you if you take any actions that the coalition (in its sole discretion) deems worthy of such punishment. Many of these coalitions (though not all of them) are called “governments”. Most humans seem to consider the existence of such coalitions to be a good thing on balance (though many would like to get rid of certain particular coalitions).
I will grant that most commenters on LessWrong probably want Superman to take a substantially more interventionist approach than he does in DC Comics (because frankly his talents are wasted stopping petty crime in one city).
Most commenters here still seem to want Superman to avoid actions that most humans would disapprove of, though.
Then we’re no longer talking about “the way humans care about their friends”, we’re inventing new hypothetical algorithms that we might like our AIs to use. Humans no longer provide an example of how that behavior could arise naturally in an evolved organism, nor a case study of how it works out for people to behave that way.
My model is that friendship is one particular strategy for alliance-formation that happened to evolve in humans. I expect this is natural in the sense of being a local optimum (in the ancestral environment), but probably not in the sense of being simple to formally define or implement.
I think friendship is substantially more complicated than “I care some about your utility function”. For instance, you probably stop valuing their utility function if they betray you (friendship can “break”). I also think the friendship algorithm includes a bunch of signalling to help with coordination (so that you understand the other person is trying to be friends), and some less-pleasant stuff like evaluations of how valuable an ally the other person is and how the friendship will affect your social standing.
Friendship also appears to include some sort of check that the other person is making friendship-related-decisions using system 1 instead of system 2--possibly as a security feature to make it harder for people to consciously exploit (with the unfortunate side-effect that we penalize system-2-thinkers even when they sincerely want to be allies), or possibly just because the signalling parts evolved for system 1 and don’t generalize properly.
(One could claim that “the true spirit of friendship” is loving someone unconditionally or something, and that might be simple, but I don’t think that’s what humans actually implement.)
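To gesture at how much structure I'm imagining, here's an entirely made-up toy sketch of the components I listed; every name and number in it is hypothetical:

```python
# Toy model: "I care some about your utility function" would just be the single
# weight `caring_weight`.  The extra machinery below is the stuff I'm claiming
# the human friendship algorithm also includes.

from dataclasses import dataclass

@dataclass
class Friendship:
    caring_weight: float = 0.0   # how much I weight your utility in my decisions
    ally_value: float = 0.5      # my (less pleasant) estimate of you as an ally
    status_effect: float = 0.0   # effect of the friendship on my social standing
    broken: bool = False         # friendships can "break"

    def on_betrayal(self):
        self.broken = True
        self.caring_weight = 0.0  # stop valuing their utility function

    def on_friendly_signal(self, looks_like_system_1: bool):
        if self.broken:
            return
        # signalling for coordination, with a check that the other person seems
        # to be running on system 1 rather than consciously optimizing
        step = 0.1 if looks_like_system_1 else 0.02
        self.caring_weight = min(1.0, self.caring_weight
                                 + step * (self.ally_value + self.status_effect))
```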
You appear to be thinking of power only in extreme terms (possibly even as an on/off binary). Like, that your values “don’t have power” unless you set up a dictatorship or something.
But “power” is being used here in a very broad sense. The personal choices you make in your own life are still a non-zero amount of power to whatever you based those choices on. If you ever try to persuade someone else to make similar choices, then you are trying to increase the amount of power held by your values. If you support laws like “no stealing” or “no murder” then you are trying to impose some of your values on other people through the use of force.
I mostly think of government as a strategy, not an end. I bet you would too, if push came to shove; e.g. you are probably stridently against murdering or enslaving a quarter of the population, even if the measure passes by a two-thirds vote. My model says almost everyone would endorse tearing down the government if it went sufficiently off the rails that keeping it around became obviously no longer a good instrumental strategy.
Like you, I endorse keeping the government around, even though I disagree with it sometimes. But I endorse that on the grounds that the government is net-positive, or at least no worse than [the best available alternative, including switching costs]. If that stopped being true, then I would no longer endorse keeping the current government. (And yes, it could become false due to a great alternative being newly-available, even if the current government didn’t get any worse in absolute terms. e.g. someone could wait until democracy is invented before they endorse replacing their monarchy.)
I’m not sure that “no one should have the power to enforce their own values” is even a coherent concept. Pick a possible future—say, disassembling the earth to build a Dyson sphere—and suppose that at least one person wants it to happen, and at least one person wants it not to happen. When the future actually arrives, it will either have happened, or not—which means at least one person “won” and at least one person “lost”. What exactly does it mean for “neither of those people had the power to enforce their value”, given that one of the values did, in fact, win? Don’t we have to say that one of them clearly had enough power to stymie the other?
You could say that society should have a bunch of people in it, and that no single person should be able to overpower everyone else combined. But that doesn’t prevent some value from being able to overpower all other values, because a value can be endorsed by multiple people!
I suppose someone could hypothetically say that they really only care about the process of government and not the result, such that they’ll accept any result as long as it is blessed by the proper process. Even if you’re willing to go to that extreme, though, that still seems like a case of wanting “your values” to have power, just where the thing you value is a particular system of government. I don’t think that having this particular value gives you any special moral high ground over people who value, say, life and happiness.
I also think that approximately no one actually has that as a terminal value.
In the context of optimization, values are anything you want (whether moral in nature or otherwise).
Any time a decision is made based on some value, you can view that value as having exercised power by controlling the outcome of that decision.
Or put more simply, the way that values have power, is that values have people who have power.
I feel like your previous comment argues against that, rather than for it. You said that people who are trapped together should be nice to each other because the cost of a conflict is very high. But now you’re suggesting that ASIs that are metaphorically trapped together would aggressively attack each other to enforce compliance with their own behavioral standards. These two conjectures do not really seem allied to me.
Separately, I am very skeptical of aliens warring against ASIs to acausally protect us. I see multiple points where this seems likely to fail:
Would aliens actually take our side against an ASI merely because we created it? If humans hear a story about an alien civilization creating a successor species, and then the successor species overthrowing its creators, I do not expect humans to automatically be on the creators’ side in this story. I expect humans will take a side mostly based on how the two species were treating each other (overthrowing abusive masters is usually portrayed as virtuous in our fiction), and that which one of them is the creator will have little weight. I do not think “everyone should be aligned with their creators” is a principle that humans would actually endorse (except by motivated reasoning, in situations where it benefits us).
Also note that humans are not aligned with the process that produced us (evolution), and approximately no humans think this is a problem.
Even if the aliens sympathize with us, would they care enough to take expensive actions about it?
Even if the aliens would war to save us, would the ASI predict that? It can only acausally save us if the ASI successfully predicts the policy. Otherwise, the war might still happen, but that doesn’t help us.
Even if the ASI predicts this, will it comply? This seems like what dath ilan would consider a “threat”, in that the aliens are punishing the ASI rather than enacting their own BATNA. It may be decision-theoretically correct to ignore the threat.
This whole premise, of us being saved at the eleventh hour by off-stage actors, seems intuitively like the sort of hypothesis that would be more likely to be produced by wishful thinking than by sober analysis, which would make me distrust it even if I couldn’t see any specific problems with it.
I don’t see why either expecting or not-expecting to meet other ASIs would make it instrumental to be nice to humans.
I have an intuition like: Minds become less idiosyncratic as they grow up.
A couple of intuition pumps:
(1) If you pick a game, and look at novice players of that game, you will often find that they have rather different “play styles”. Maybe one player really likes fireballs and another really likes crossbows. Maybe one player takes a lot of risks and another plays it safe.
Then if you look at experts of that particular game, you will tend to find that their play has become much more similar. I think “play style” is mostly the result of two things: (a) playing to your individual strengths, and (b) using your aesthetics as a tie-breaker when you can’t tell which of two moves is better. But as you become an expert, both of these things diminish: you become skilled at all areas of the game, and you also become able to discern even small differences in quality between two moves. So your “play style” is gradually eroded and becomes less and less noticeable.
(2) Imagine if a society of 3-year-olds were somehow in the process of creating AI, and they debated whether their AI would show “kindness” to stuffed animals (as an inherent preference, rather than an instrumental tool for manipulating humans). I feel like the answer to this should be “lol no”. Showing “kindness” to stuffed animals feels like something that humans correctly grow out of, as they grow up.
It seems plausible to me that something like “empathy for kittens” might be a higher-level version of this, that humans would also grow out of (just like they grow out of empathy for stuffed animals) if the humans grew up enough.
(Actually, I think most human adults still have some empathy for stuffed animals. But I think most of us wouldn’t endorse policies designed to help stuffed animals. I’m not sure exactly how to describe the relation that 3-year-olds have to stuffed animals but adults don’t.)
I sincerely think caring about kittens makes a lot more sense than caring about stuffed animals. But I’m uncertain whether that means we’ll hold onto it forever, or just that it takes more growing-up in order to grow out of it.
Paul frames this as “mostly a question about idiosyncrasies and inductive biases of minds rather than anything that can be settled by an appeal to selection dynamics.” But I’m concerned that might be a bit like debating the odds of whether your newborn human will one day come to care for stuffed animals, instead of whether they will continue to care for them after growing up. It can be very likely that they will care for a while, and also very likely that they will stop.
I strongly suspect it is possible for minds to become quite a lot more grown-up than humans currently are.
(I think Habryka may have been saying something similar to this.)
Still, I notice that I’m doing a lot of hand-waving here and I lack a gears-based model of what “growing up” actually entails.
Speaking as a developer, I would rather have a complete worked-out example as a baseline for my modifications than a box of loose parts.
I do not think that the designer mindset of unilaterally specifying neutral rules to provide a good experience for all players is especially similar to the negotiator mindset of trying to make the deal that will score you the most points.
I haven’t played Optimal Weave yet, but my player model predicts that a nontrivial fraction of players are going to try to trick each other during their first game. Also I don’t think any hidden info or trickery is required in order for rule disagreements to become an issue.
then when they go to a meetup or a con, anyone they meet will have a different version
No, that would actually be wonderful. We can learn from each other and compile our best findings.
That’s...not the strategy I would choose for playtesting multiple versions of a game. Consider:
Testers aren’t familiar with the mainline version and don’t know how their version differs from it, so can’t explain what their test condition is or how their results differ
You don’t know how their version differs either, or even whether it differs, except by getting them to teach you their full rules.
There’s a high risk they will accidentally leave out important details of the rules—even professional rulebooks often have issues, and that’s not what you’ll be getting. So interpreting whatever feedback you get will be a significant issue.
You can’t guarantee that any particular version gets tested
You can’t exclude variants that you believe are not worth testing
You can’t control how much testing is devoted to each version
Many players may invent bad rules and then blame their bad experience on your game, or simply refuse to play at all if you’re going to force them to invent rules, so you end up with a smaller and less-appreciative playerbase overall
The only real advantage I see to this strategy is that it may result in substantially more testers than asking for volunteers. But it accomplishes that by functionally deceiving your players about the fact that they’re testing variants, which isn’t a policy I endorse, either on moral or pragmatic grounds.
Most of the people that you’ve tricked into testing for you will never actually deliver any benefits to you. Even among volunteers, only a small percentage of playtesters actually deliver notable feedback (perhaps a tenth, depending on how you recruit). Among people who wouldn’t have volunteered, I imagine the percentage will be much lower.
[failed line of thought, don’t read]
Maybe limit it to bringing 1 thing with you? But notice this permits “stealing” items from other players, since “being carried” is not a persistent state.
“longer descriptions of the abilities”
I’d like that. That would be a good additional manual page, mostly generated.
If you’re imagining having a computer program generate this, I’m not sure how that could work. The purpose is not merely to be verbose, but to act as a FAQ for each specific ability, hopefully providing a direct answer to whatever question prompted them to look that ability up.
If you aren’t familiar with this practice, maybe take a look at the Dominion rulebook as an example.
I think I could take a stab at a summary.
This is going to elide most of the actual events of the story to focus on the “main conflict” that gets resolved at the end of the story. (I may try to make a more narrative-focused outline later if there’s interest, but this is already quite a long comment.)
As I see it, the main conflict (the exact nature of which doesn’t become clear until quite late) is mainly driven by two threads that develop gradually throughout the story… (major spoilers)
The first thread is Keltham’s gradual realization that the world of Golarion is pretty terrible for mortals, and is being kept that way by the power dynamics of the gods.
The key to understanding these dynamics is that certain gods (and coalitions of gods) have the capability to destroy the world. However, the gods all know (Eliezer’s take on) decision theory, so you can’t extort them by threatening to destroy the world. They’ll only compromise with you if you would honestly prefer destroying the world to the status quo, if those were your only two options. (And they have ways of checking.) So the current state of things is a compromise to ensure that everyone who could destroy the world, prefers not to.
Keltham would honestly prefer destroying Golarion (primarily because a substantial fraction of mortals currently go to hell and get tortured for eternity), so he realizes that if he can seize the ability to destroy the world, then the gods will negotiate with him to find a mutually-acceptable alternative.
Keltham speculates (though it’s only speculation) that he may have been sent to Golarion by some powerful but distant entity from the larger multiverse, as the least-expensive way of stopping something that entity objects to.
The second thread is that Nethys (god of knowledge, magic, and destruction) has the ability to see alternate versions of Golarion and to communicate with alternate versions of himself, and he’s seen several versions of this story play out already, so he knows what Keltham is up to. Nethys wants Keltham to succeed, because the new equilibrium that Keltham negotiates is better (from Nethys’ perspective) than the status quo.
However, it is absolutely imperative that Nethys does not cause Keltham to succeed, because Nethys does not prefer destroying the world to the status quo. If Keltham only succeeds because of Nethys’ interventions, the gods will treat Keltham as Nethys’ pawn, and treat Keltham’s demands as a threat from Nethys, and will refuse to negotiate.
So Nethys can only intervene in ways that all of the major gods will approve of (in retrospect). So he runs around minimizing collateral damage, nudges Keltham towards being a little friendlier in the final negotiations, and very carefully never removes any obstacle from Keltham’s path until Keltham has proven that he can overcome it on his own.
Nethys considers it likely that this whole situation was intentionally designed as some sort of game, by some unknown entity. (Partly because Keltham makes several successful predictions based on dath ilani game tropes.)
At the end of the story, Keltham uses an artifact called the starstone to turn himself into a minor god, then uses his advanced knowledge of physics (unknown to anyone else in the setting, including the gods) to create weapons capable of destroying the world, announces that that’s his BATNA, and successfully negotiates with the rest of the gods to shut down hell, stop stifling mortal technological development, and make a few inexpensive changes to improve overall mortal quality-of-life. Keltham then puts himself into long-term stasis to see if the future of this world will seem less alienating to him than the present.