You sound like you’re positing the existence of two types of people: type I people whose morality is based on “reason” and type II people whose morality is based on the “status game”. In reality, nearly everyone’s morality is based on something like the status game (see also: 1 2 3). It’s just that EAs and moral philosophers are playing the game in a tribe which awards status differently.
The true intrinsic values of most people do place a weight on the happiness of other people (that’s roughly what we call “empathy”), but this weight is very unequally distributed.
There are definitely thorny questions regarding the best way to aggregate the values of different people in TAI. But, I think that given a reasonable solution, a lower bound on the future is imagining that the AI will build a private utopia for every person, as isolated from the other “utopias” as that person wants it to be. Probably some people’s “utopias” will not be great, viewed in utilitarian terms. But, I still prefer that over paperclips (by far). And, I suspect that most people do (even if they protest it in order to play the game).
It’s just that EAs and moral philosophers are playing the game in a tribe which awards status differently.
Sure, I’ve said as much in recent comments, including this one. ETA: Related to this, I’m worried about AI disrupting “our” status game in an unpredictable and possibly dangerous way. E.g., what will happen when everyone uses AI advisors to help them play status games, including the status game of moral philosophy?
The true intrinsic values of most people do place a weight on the happiness of other people (that’s roughly what we call “empathy”), but this weight is very unequally distributed.
What do you mean by “true intrinsic values”? (I couldn’t find any previous usage of this term by you.) How do you propose finding people’s true intrinsic values?
These weights, if low enough relative to other “values”, haven’t prevented people from committing atrocities on each other in the name of morality.
There are definitely thorny questions regarding the best way to aggregate the values of different people in TAI. But, I think that given a reasonable solution, a lower bound on the future is imagining that the AI will build a private utopia for every person, as isolated from the other “utopias” as that person wants it to be.
This implies solving a version of the alignment problem that includes reasonable value aggregation between different people (or between AIs aligned to different people), but at least some researchers don’t seem to consider that part of “alignment”.
Given that playing status games and status competition between groups/tribes/status games constitute a huge part of people’s lives, I’m not sure how private utopias that are very isolated from each other would work. Also, I’m not sure if your solution would prevent people from instantiating simulations of perceived enemies / “evil people” in their utopias and punishing them, or just simulating a bunch of low status people to lord over.
Probably some people’s “utopias” will not be great, viewed in utilitarian terms. But, I still prefer that over paperclips (by far).
I concede that a utilitarian would probably find almost all “aligned” futures better than paperclips. Perhaps I should have clarified that by “parts of me” being more scared, I meant the selfish and NU-leaning parts. The utilitarian part of me is just worried about the potential waste caused by many or most “utopias” being very suboptimal in terms of value created per unit of resource consumed.
What do you mean by “true intrinsic values”? (I couldn’t find any previous usage of this term by you.) How do you propose finding people’s true intrinsic values?
I mean the values relative to which a person seems most like a rational agent, arguably formalizable along these lines.
These weights, if low enough relative to other “values”, haven’t prevented people from committing atrocities on each other in the name of morality.
Yes.
This implies solving a version of the alignment problem that includes reasonable value aggregation between different people (or between AIs aligned to different people), but at least some researchers don’t seem to consider that part of “alignment”.
Yes. I do think multi-user alignment is an important problem (and occasionally spend some time thinking about it), it just seems reasonable to solve single user alignment first. Andrew Critch is an example of a person who seems to be concerned about this.
Given that playing status games and status competition between groups/tribes/status games constitute a huge part of people’s lives, I’m not sure how private utopias that are very isolated from each other would work.
I meant that each private utopia can contain any number of people created by the AI, in addition to its “customer”. Ofc groups that can agree on a common utopia can band together as well.
Also, I’m not sure if your solution would prevent people from instantiating simulations of perceived enemies / “evil people” in their utopias and punishing them, or just simulating a bunch of low status people to lord over.
They are prevented from simulating other pre-existing people without their consent, but can simulate a bunch of low status people to lord over. Yes, this can be bad. Yes, I still prefer this (assuming my own private utopia) over paperclips. And, like I said, this is just a relatively easy to imagine lower bound, not necessarily the true optimum.
Perhaps I should have clarified that by “parts of me” being more scared, I meant the selfish and NU-leaning parts.
The selfish part, at least, doesn’t have any reason to be scared as long as you are a “customer”.
They are prevented from simulating other pre-existing people without their consent
Why do you think this will be the result of the value aggregation (or a lower bound on how good the aggregation will be)? For example, if there is a big block of people who all want to simulate person X in order to punish that person, and only X and a few other people object, why won’t the value aggregation be “nobody pre-existing except X (and Y and Z etc.) can be simulated”?
Given some assumptions about the domains of the utility functions, it is possible to do better than what I described in the previous comment. Let $X_i$ be the space of possible experience histories[1] of user $i$ and $Y$ the space of everything else the utility functions depend on (things that nobody can observe directly). Suppose that the domain of the utility functions is $Z := \prod_i X_i \times Y$. Then, we can define the “denosing[2] operator” $D_i : C(Z) \to C(Z)$ for user $i$ by
$$(D_i u)(x_i, x_{-i}, y) := \max_{x' \in \prod_{j \neq i} X_j} u(x_i, x', y)$$
Here, $x_i$ is the argument of $u$ that ranges in $X_i$, $x_{-i}$ are the arguments that range in $X_j$ for $j \neq i$, and $y$ is the argument that ranges in $Y$.
That is, $D_i$ modifies a utility function by having it “imagine” that the experiences of all users other than $i$ have been optimized, for the experiences of user $i$ and the unobservables held constant.
Let $u_i : Z \to \mathbb{R}$ be the utility function of user $i$, and $d^0 \in \mathbb{R}^n$ the initial disagreement point (everyone dying), where $n$ is the number of users. We then perform cooperative bargaining on the denosed utility functions $D_i u_i$ with disagreement point $d^0$, producing some outcome $\mu^0 \in \Delta(Z)$. Define $d^1 \in \mathbb{R}^n$ by $d^1_i := \mathbb{E}_{\mu^0}[u_i]$. Now we do another cooperative bargaining with $d^1$ as the disagreement point and the original utility functions $u_i$. This gives us the final outcome $\mu^1$.
Among other benefits, there is now much less need to remove outliers. Perhaps, instead of removing them we still want to mitigate them by applying “amplified denosing” to them which also removes the dependence on $Y$.
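To make the two-stage procedure concrete, here is a minimal toy sketch in Python. All spaces, utility numbers and helper names are invented for illustration; it restricts the bargaining to pure outcomes rather than distributions on $Z$, and it uses the Nash product, which the proposal itself does not commit to.

```python
from itertools import product

# Toy spaces (all names and numbers are invented for illustration).
X = {"alice": ["garden", "library"], "bob": ["arena", "monastery"]}
Y = ["starry sky", "ad-covered sky"]          # the "everything else" component
users = list(X)
Z = list(product(X["alice"], X["bob"], Y))    # an outcome z = (x_alice, x_bob, y)

# True utilities: mostly about one's own experience, a little about y,
# plus a small "nosy" term about the other user's experience.
def u_alice(xa, xb, y):
    return {"garden": 10, "library": 9}[xa] + (2 if y == "starry sky" else 0) + (-5 if xb == "arena" else 0)

def u_bob(xa, xb, y):
    return {"arena": 10, "monastery": 9}[xb] + (1 if y == "ad-covered sky" else 0) + (-5 if xa == "garden" else 0)

u = {"alice": lambda z: u_alice(*z), "bob": lambda z: u_bob(*z)}

def denose(i, ui):
    # (D_i u_i)(z): evaluate u_i while "imagining" the other user's experience
    # has been optimized, holding user i's own experience and y fixed.
    def dui(z):
        xa, xb, y = z
        if i == "alice":
            return max(ui((xa, xb_alt, y)) for xb_alt in X["bob"])
        return max(ui((xa_alt, xb, y)) for xa_alt in X["alice"])
    return dui

def nash_bargain(utilities, disagreement):
    # Nash bargaining restricted to pure outcomes: maximize the product of
    # gains over the disagreement point, among outcomes nobody finds worse.
    feasible = [z for z in Z if all(utilities[i](z) >= disagreement[i] for i in users)]
    def gain_product(z):
        p = 1.0
        for i in users:
            p *= utilities[i](z) - disagreement[i]
        return p
    return max(feasible, key=gain_product)

# Stage 1: bargain with the denosed utilities; d0 = "everyone dies" = 0.
d0 = {i: 0.0 for i in users}
mu0 = nash_bargain({i: denose(i, u[i]) for i in users}, d0)

# Stage 2: bargain with the true utilities, using stage-1 payoffs as the disagreement point.
d1 = {i: u[i](mu0) for i in users}
mu1 = nash_bargain(u, d1)

print("stage 1 (denosed):", mu0, {i: u[i](mu0) for i in users})
print("stage 2 (final):  ", mu1, {i: u[i](mu1) for i in users})
# Stage 1 ignores the nosy terms when fixing the disagreement point, so neither
# user can hold the other hostage with them; stage 2 still finds the win-win
# trade in which each user gives up a little own-experience utility to remove
# the thing the other finds unpalatable.
```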
For this procedure, there is a much better case that the lower bound will be met.
[1] In the standard RL formalism this is the space of action-observation sequences $(A \times O)^\omega$.
[2] From the expression “nosy preferences”, see e.g. here.
This is very interesting (and “denosing operator” is delightful).
Some thoughts:
If I understand correctly, I think there can still be a problem where user i wants an experience history such that part of the history is isomorphic to a simulation of user j suffering (i wants to fully experience j suffering in every detail).
Here a fixed $x_i$ may entail some fixed $x_j$ for (some copy of) some j.
It seems the above approach can’t then avoid leaving one of i or j badly off: If i is permitted to freely determine the experience of the embedded j copy, the disagreement point in the second bargaining will bake this in: j may be horrified to see that i wants to experience its copy suffer, but will be powerless to stop it (if i won’t budge in the bargaining).
Conversely, if the embedded j is treated as a user which i will imagine is exactly to i’s liking, but who actually gets what j wants, then the selected μ0 will be horrible for i (e.g. perhaps i wants to fully experience Hitler suffering, and instead gets to fully experience Hitler’s wildest fantasies being realized).
I don’t think it’s possible to do anything like denosing to avoid this.
It may seem like this isn’t a practical problem, since we could reasonably disallow such embedding. However, I think that’s still tricky since there’s a less exotic version of the issue: my experiences likely already are a collection of subagents’ experiences. Presumably my maximisation over $x_{\text{joe}}$ is permitted to determine all the $x_{\text{subjoe}}$.
It’s hard to see how you draw a principled line here: the ideal future for most people may easily be transhumanist to the point where today’s users are tomorrow’s subpersonalities (and beyond).
A case that may have to be ruled out separately is where i wants to become a suffering j. Depending on what I consider ‘me’, I might be entirely fine with it if ‘I’ wake up tomorrow as suffering j (if I’m done living and think j deserves to suffer). Or perhaps I want to clone myself $10^{10}$ times, and then have all copies convert themselves to suffering js after a while. [in general, it seems there has to be some mechanism to distribute resources reasonably—but it’s not entirely clear what that should be]
I think that a rigorous treatment of such issues will require some variant of IB physicalism (in which the monotonicity problem has been solved, somehow). I am cautiously optimistic that a denosing operator exists there which dodges these problems. This operator will declare both the manifesting and evaluation of the source codes of other users to be “out of scope” for a given user. Hence, a preference of i to observe the suffering of j would be “satisfied” by observing nearly anything, since the maximization can interpret anything as a simulation of j.
The “subjoe” problem is different: it is irrelevant because “subjoe” is not a user, only Joe is a user. All the transhumanist magic that happens later doesn’t change this. Users are people living during the AI launch, and only them. The status of any future (trans/post)humans is determined entirely according to the utility functions of users. Why? For two reasons: (i) the AI can only have access and stable pointers to existing people (ii) we only need the buy-in of existing people to launch the AI. If existing people want future people to be treated well, then they have nothing to worry about since this preference is part of the existing people’s utility functions.
Ah—that’s cool if IB physicalism might address this kind of thing (still on my to-read list).
Agreed that the subjoe thing isn’t directly a problem. My worry is mainly whether it’s harder to rule out i experiencing a simulation of $x_{\text{sub-}j}$ suffering, since sub-j isn’t a user. However, if you can avoid the suffering js by limiting access to information, the same should presumably work for relevant sub-js.
If existing people want future people to be treated well, then they have nothing to worry about since this preference is part of the existing people’s utility functions.
This isn’t so clear (to me at least) if:
Most, but not all current users want future people to be treated well.
Part of being “treated well” includes being involved in an ongoing bargaining process which decides the AI’s/future’s trajectory.
For instance, suppose initially 90% of people would like to have an iterated bargaining process that includes future (trans/post)humans as users, once they exist. The other 10% are only willing to accept such a situation if they maintain their bargaining power in future iterations (by whatever mechanism).
If you iterate this process, the bargaining process ends up dominated by users who won’t relinquish any power to future users. 90% of initial users might prefer drift over lock-in, but we get lock-in regardless (the disagreement point also amounting to lock-in).
Unless I’m confusing myself, this kind of thing seems like a problem. (not in terms of reaching some non-terrible lower bound, but in terms of realising potential) Wherever there’s this kind of asymmetry/degradation over bargaining iterations, I think there’s an argument for building in a way to avoid it from the start—since anything short of 100% just limits to 0 over time. [it’s by no means clear that we do want to make future people users on an equal footing to today’s people; it just seems to me that we have to do it at step zero or not at all]
Ah—that’s cool if IB physicalism might address this kind of thing
I admit that at this stage it’s unclear because physicalism brings in the monotonicity principle that creates bigger problems than what we discuss here. But maybe some variant can work.
For instance, suppose initially 90% of people would like to have an iterated bargaining process that includes future (trans/post)humans as users, once they exist. The other 10% are only willing to accept such a situation if they maintain their bargaining power in future iterations (by whatever mechanism).
Roughly speaking, in this case the 10% preserve their 10% of the power forever. I think it’s fine because I want the buy-in of this 10% and the cost seems acceptable to me. I’m also not sure there is any viable alternative which doesn’t have even bigger problems.
Sure, I’m not sure there’s a viable alternative either. This kind of approach seems promising—but I want to better understand any downsides.
My worry wasn’t about the initial 10%, but about the possibility of the process being iterated such that you end up with almost all bargaining power in the hands of power-keepers.
In retrospect, this is probably silly: if there’s a designable-by-us mechanism that better achieves what we want, the first bargaining iteration should find it. If not, then what I’m gesturing at must either be incoherent, or not endorsed by the 10% - so hard-coding it into the initial mechanism wouldn’t get the buy-in of the 10% to the extent that they understood the mechanism.
In the end, I think my concern is that we won’t get buy-in from a large majority of users: In order to accommodate some proportion with odd moral views it seems likely you’ll be throwing away huge amounts of expected value in others’ views—if I’m correctly interpreting your proposal (please correct me if I’m confused).
Is this where you’d want to apply amplified denosing? So, rather than filtering out the undesirable i, for these i you use:
$$(D_i u)(x_i, x_{-i}, y) := \max_{x' \in \prod_{j \neq i} X_j,\; y' \in Y} u(x_i, x', y')$$ [i.e. ignoring $y$ and imagining it’s optimal]
However, it’s not clear to me how we’d decide who gets strong denosing (clearly not everyone, or we don’t pick a y). E.g. if you strong-denose anyone who’s too willing to allow bargaining failure [everyone dies] you might end up filtering out altruists who worry about suffering risks. Does that make sense?
My worry wasn’t about the initial 10%, but about the possibility of the process being iterated such that you end up with almost all bargaining power in the hands of power-keepers.
I’m not sure what you mean here, but also the process is not iterated: the initial bargaining is deciding the outcome once and for all. At least that’s the mathematical ideal we’re approximating.
In the end, I think my concern is that we won’t get buy-in from a large majority of users:
In order to accommodate some proportion with odd moral views it seems likely you’ll be throwing away huge amounts of expected value in others’ views
I don’t think so? The bargaining system does advantage large groups over small groups.
In practice, I think that for the most part people don’t care much about what happens “far” from them (for some definition of “far”, not physical distance) so giving them private utopias is close to optimal from each individual perspective. Although it’s true they might pretend to care more than they do for the usual reasons, if they’re thinking in “far-mode”.
I would certainly be very concerned about any system that gives even more power to majority views. For example, what if the majority of people are disgusted by gay sex and prefer it not to happen anywhere? I would rather accept things I disapprove of happening far away from me than allow other people to control my own life.
Ofc the system also mandates win-win exchanges. For example, if Alice’s and Bob’s private utopias each contain something strongly unpalatable to the other but not strongly important to the respective customer, the bargaining outcome will remove both unpalatable things.
E.g. if you strong-denose anyone who’s too willing to allow bargaining failure [everyone dies] you might end up filtering out altruists who worry about suffering risks.
I’m fine with strong-denosing negative utilitarians who would truly stick to their guns about negative utilitarianism (but I also don’t think there are many).
Ah, I was just being an idiot on the bargaining system w.r.t. small numbers of people being able to hold it to ransom. Oops. Agreed that more majority power isn’t desirable. [re iteration, I only meant that the bargaining could become iterated if the initial bargaining result were to decide upon iteration (to include more future users). I now don’t think this is particularly significant.]
I think my remaining uncertainty (/confusion) is all related to the issue I first mentioned (embedded copy experiences). It strikes me that something like this can also happen where minds grow/merge/overlap.
This operator will declare both the manifesting and evaluation of the source codes of other users to be “out of scope” for a given user. Hence, a preference of i to observe the suffering of j would be “satisfied” by observing nearly anything, since the maximization can interpret anything as a simulation of j.
Does this avoid the problem if i’s preferences use indirection? It seems to me that a robust pointer to j may be enough: that with a robust pointer it may be possible to implicitly require something like source-code-access without explicitly referencing it. E.g. where i has a preference to “experience j suffering in circumstances where there’s strong evidence it’s actually j suffering, given that these circumstances were the outcome of this bargaining process”.
If i can’t robustly specify things like this, then I’d guess there’d be significant trouble in specifying quite a few (mutually) desirable situations involving other users too. IIUC, this would only be any problem for the denosed bargaining to find a good d1: for the second bargaining on the true utility functions there’s no need to put anything “out of scope” (right?), so win-wins are easily achieved.
I’m imagining cooperative bargaining between all users, where the disagreement point is everyone dying[1][2] (this is a natural choice assuming that if we don’t build aligned TAI we get paperclips). This guarantees that every user will receive an outcome that’s at least not worse than death.
With Nash bargaining, we can still get issues for (in)famous people that millions of people want to do unpleasant things to. Their outcome will be better than death, but maybe worse than in my claimed “lower bound”.
With Kalai-Smorodinsky bargaining things look better, since essentially we’re maximizing a minimum over all users. This should admit my lower bound, unless it is somehow disrupted by enormous asymmetries in the maximal payoffs of different users.
In either case, we might need to do some kind of outlier filtering: if e.g. literally every person on Earth is a user, then maybe some of them are utterly insane in ways that cause the Pareto frontier to collapse.
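As a tiny numerical illustration of the Nash-vs-Kalai-Smorodinsky point above (all numbers invented: one famous user against 99 users who each get a small utility boost from the famous user having an unpleasant time):

```python
import numpy as np

# Outcomes are parameterized by t in [0, 1]: t = 0 leaves the famous user F's
# private utopia untouched, t = 1 is maximal unpleasantness for F (still better
# than death). The disagreement point ("everyone dies") is normalized to 0.
m = 99                       # number of users who want to punish F
ts = np.linspace(0.0, 1.0, 100001)
u_F = 1.0 - ts               # F's utility along the Pareto frontier
u_h = 0.9 + 0.1 * ts         # each punisher's utility (they lose little if denied)

# Nash bargaining: maximize the product of gains over the disagreement point.
nash_t = ts[np.argmax(u_F * u_h ** m)]

# Kalai-Smorodinsky: take the Pareto point where each user's gain is the same
# fraction of their best feasible gain (both best gains are 1.0 here), which on
# this one-parameter frontier is the same as maximizing the minimum gain.
ks_t = ts[np.argmax(np.minimum(u_F, u_h))]

print(f"Nash: t = {nash_t:.3f}, famous user's utility = {1 - nash_t:.3f}")   # ~0.9, ~0.1
print(f"KS:   t = {ks_t:.3f}, famous user's utility = {1 - ks_t:.3f}")       # ~0.09, ~0.91
```

In this toy run the Nash solution leaves the famous user with roughly 10% of their best feasible outcome, while Kalai-Smorodinsky leaves them with roughly 90%, matching the claim that KS better preserves the per-user lower bound.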
Bargaining assumes we can access the utility function. In reality, even if we solve the value learning problem in the single user case, once you go to the multi-user case it becomes a mechanism design problem: users have incentives to lie / misrepresent their utility functions. A perfect solution might be impossible, but I proposed mitigating this by assigning each user a virtual “AI lawyer” that provides optimal input on their behalf into the bargaining system. In this case they at least have no incentive to lie to the lawyer, and the outcome will not be skewed in favor of users who are better in this game, but we don’t get the optimal bargaining solution either.
All of this assumes the TAI is based on some kind of value learning. If the first-stage TAI is based on something else, the problem might become easier or harder. Easier because the first-stage TAI will produce better solutions to the multi-user problem for the second-stage TAI. Harder because it can allow the small group of people controlling it to impose their own preferences.
For IDA-of-imitation, democratization seems like a hard problem because the mechanism by which IDA-of-imitation solves AI risk is precisely by empowering a small group of people over everyone else (since the source of AI risk comes from other people launching unaligned TAI). Adding transparency can entirely undermine safety.
For quantilized debate, adding transparency opens us to an attack vector where the AI manipulates public opinion. This significantly lowers the optimization pressure bar for manipulation, compared to manipulating the (carefully selected) judges, which might undermine the key assumption that effective dishonest strategies are harder to find than effective honest strategies.
This can be formalized by literally having the AI consider the possibility of optimizing for some unaligned utility function. This is a weird and risky approach but it works to 1st approximation.
Bargaining assumes we can access the utility function. In reality, even if we solve the value learning problem in the single user case, once you go to the multi-user case it becomes a mechanism design problem: users have incentives to lie / misrepresent their utility functions. A perfect solution might be impossible, but I proposed mitigating this by assigning each user a virtual “AI lawyer” that provides optimal input on their behalf into the bargaining system. In this case they at least have no incentive to lie to the lawyer, and the outcome will not be skewed in favor of users who are better in this game, but we don’t get the optimal bargaining solution either.
Assuming each lawyer has the same incentive to lie as its client, it has an incentive to misrepresent some preferable-to-death outcomes as “worse-than-death” (in order to force those outcomes out of the set of “feasible agreements”, in the hope of getting a more preferred outcome as the actual outcome). At equilibrium, this incentive is balanced by the marginal increase in the probability of getting “everyone dies” as the outcome (due to the feasible agreements becoming a null set) caused by the lie. So the probability of “everyone dies” in this game has to be non-zero.
(It’s the same kind of problem as in the AI race or tragedy of commons: people not taking into account the full social costs of their actions as they reach for private benefits.)
Of course in actuality everyone dying may not be a realistic consequence of failure to reach agreement, but if the real consequence is better than that, and the AI lawyers know this, they would be more willing to lie since the perceived downside of lying would be smaller, so you end up with a higher chance of no agreement.
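A minimal complete-information toy of this incentive (invented outcomes and numbers; the lawyers here simply report which outcomes beat death for their client, and the real argument needs uncertainty to get an equilibrium probability, but the unilateral incentive and the joint risk already show up):

```python
from itertools import product

# Three candidate agreements with true utilities (alice, bob); the disagreement
# point ("everyone dies") is normalized to (0, 0). All numbers are invented.
outcomes = {"compromise": (6, 6), "alice_leaning": (8, 4), "bob_leaning": (4, 8)}
DEATH = ("everyone dies", (0, 0))

def nash_outcome(acceptable_to_alice, acceptable_to_bob):
    # Nash bargaining over the outcomes both lawyers report as better than death.
    feasible = [o for o in outcomes if o in acceptable_to_alice and o in acceptable_to_bob]
    if not feasible:
        return DEATH
    best = max(feasible, key=lambda o: outcomes[o][0] * outcomes[o][1])
    return best, outcomes[best]

# A lawyer either reports truthfully (every outcome beats death) or strategically
# claims that everything except its client's preferred outcome is "worse than death".
alice_reports = {"truthful": set(outcomes), "strategic": {"alice_leaning"}}
bob_reports = {"truthful": set(outcomes), "strategic": {"bob_leaning"}}

for a, b in product(alice_reports, bob_reports):
    chosen, (ua, ub) = nash_outcome(alice_reports[a], bob_reports[b])
    print(f"alice {a:9} / bob {b:9} -> {chosen:13} payoffs ({ua}, {ub})")

# The resulting table is a game of chicken: each lawyer gains by lying
# unilaterally (8 instead of 6), but if both lie the feasible set is empty and
# the outcome is the disagreement point, i.e. everyone dies.
```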
Yes, it’s not a very satisfactory solution. Some alternative/complementary solutions:
Somehow use non-transformative AI to do my mind uploading, and then have the TAI learn by inspecting the uploads. Would be great for single-user alignment as well.
Somehow use non-transformative AI to create perfect lie detectors, and use this to enforce honesty in the mechanism. (But, is it possible to detect self-deception?)
Have the TAI learn from past data which wasn’t affected by the incentives created by the TAI. (But, is there enough information there?)
Shape the TAI’s prior about human values in order to rule out at least the most blatant lies.
Some clever mechanism design I haven’t thought of. The problem with this is, most mechanism designs rely on money, and money doesn’t seem applicable here, whereas when you don’t have money there are many impossibility theorems.
In either case, we might need to do some kind of outlier filtering: if e.g. literally every person on Earth is a user, then maybe some of them are utterly insane in ways that cause the Pareto frontier to collapse.
This seems near guaranteed to me: a non-zero amount of people will be that crazy (in our terms), so filtering will be necessary.
Then I’m curious about how we draw the line on outlier filtering. What filtering rule do we use? I don’t yet see a good principled rule (e.g. if we want to throw out people who’d collapse agreement to the disagreement point, there’s more than one way to do that).
Maybe crazy behaviour correlates with less intelligence
Depending on what we mean by ‘crazy’ I think that’s unlikely—particularly when what we care about here are highly unusual moral stances. I’d see intelligence as a multiplier, rather than something which points you in the ‘right’ direction. Outliers will be at both extremes of intelligence—and I think you’ll get a much wider moral variety on the high end.
For instance, I don’t think you’ll find many low-intelligence antinatalists—and here I mean the stronger, non-obvious claim: not simply that most people calling themselves antinatalists, or advocating for antinatalism will have fairly high intelligence, but rather that most people with such a moral stance (perhaps not articulated) will have fairly high intelligence.
Generally, I think there are many weird moral stances you might think your way into that you’d be highly unlikely to find ‘naturally’ (through e.g. absorption of cultural norms). I’d also expect creativity to positively correlate with outlier moralities. Minds that habitually throw together seven disparate concepts will find crazier notions than those which don’t get beyond three.
First, I think we want to be thinking in terms of [personal morality we’d reflectively endorse] rather than [all the base, weird, conflicting… drivers of behaviour that happen to be in our heads].
There are things most of us would wish to change about ourselves if we could. There’s no sense in baking them in for all eternity (or bargaining on their behalf), just because they happen to form part of what drives us now. [though one does have to be a bit careful here, since it’s easy to miss the upside of qualities we regard as flaws]
With this in mind, reflectively endorsed antinatalism really is a problem: yes, some people will endorse sacrificing everything just to get to a world where there’s no suffering (because there are no people).
Note that the kinds of bargaining approach Vanessa is advocating are aimed at guaranteeing a lower bound for everyone (who’s not pre-filtered out) - so you only need to include one person with a particularly weird view to fail to reach a sensible bargain. [though her most recent version should avoid this]
Yes, I still prefer this (assuming my own private utopia) over paperclips.
For a utilitarian, this doesn’t mean much. What’s much more important is something like, “How close is this outcome to an actual (global) utopia (e.g., with optimized utilitronium filling the universe), on a linear scale?” For example, my rough expectation (without having thought about it much) is that your “lower bound” outcome is about midway between paperclips and actual utopia on a logarithmic scale. In one sense, this is much better than paperclips, but in another sense (i.e., on the linear scale), it’s almost indistinguishable from paperclips, and a utilitarian would only care about the latter and therefore be nearly as disappointed by that outcome as paperclips.
I want to add a little to my stance on utilitarianism. A utilitarian superintelligence would probably kill me and everyone I love, because we are made of atoms that could be used for minds that are more hedonic[1][2][3]. Given a choice between paperclips and utilitarianism, I would still choose utilitarianism. But, if there was a utilitarian TAI project along with a half-decent chance to do something better (by my lights), I would actively oppose the utilitarian project. From my perspective, such a project is essentially enemy combatants.
One way to avoid it is by modifying utilitarianism to only place weight on currently existing people. But this is already not that far from my cooperative bargaining proposal (although still inferior to it, IMO).
Another way to avoid it is by postulating some very strong penalty on death (i.e. discontinuity of personality). But this is not trivial to do, especially without creating other problems. Moreover, from my perspective these kinds of things are hacks trying to work around the core issue, namely that I am not a utilitarian (along with the vast majority of people).
A possible counterargument is, maybe the superhedonic future minds would be sad to contemplate our murder. But, this seems too weak to change the outcome, even assuming that this version of utilitarianism mandates minds who would want to know the truth and care about it, and that this preference is counted towards “utility”.
A utilitarian superintelligence would probably kill me and everyone I love, because we are made of atoms that could be used for minds that are more hedonic
This seems like a reasonable concern about some types of hedonic utilitarianism. To be clear, I’m not aware of any formulation of utilitarianism that doesn’t have serious issues, and I’m also not aware of any formulation of any morality that doesn’t have serious issues.
But, if there was a utilitarian TAI project along with a half-decent chance to do something better (by my lights), I would actively oppose the utilitarian project. From my perspective, such a project is essentially enemy combatants.
Just to be clear, this isn’t in response to something I wrote, right? (I’m definitely not advocating any kind of “utilitarian TAI project” and would be quite scared of such a project myself.)
Moreover, from my perspective these kinds of things are hacks trying to work around the core issue, namely that I am not a utilitarian (along with the vast majority of people).
So what are you (and them) then? What would your utopia look like?
Just to be clear, this isn’t in response to something I wrote, right? (I’m definitely not advocating any kind of “utilitarian TAI project” and would be quite scared of such a project myself.)
No! Sorry if I gave that impression.
So what are you (and them) then? What would your utopia look like?
Well, I linked my toy model of partiality before. Are you asking about something more concrete?
I have low confidence about this, but my best guess personal utopia would be something like: A lot of cool and interesting things are happening. Some of them are good, some of them are bad (a world in which nothing bad ever happens would be boring). However, there is a limit on how bad something is allowed to be (for example, true death, permanent crippling of someone’s mind and eternal torture are over the line), and overall “happy endings” are more common than “unhappy endings”. Moreover, since it’s my utopia (according to my understanding of the question, we are ignoring the bargaining process and acausal cooperation here), I am among the top along those desirable dimensions which are zero-sum (e.g. play an especially important / “protagonist” role in the events to the extent that it’s impossible for everyone to play such an important role, and have high status to the extent that it’s impossible for everyone to have such high status).
First, you wrote “a part of me is actually more scared of many futures in which alignment is solved, than a future where biological life is simply wiped out by a paperclip maximizer.” So, I tried to assuage this fear for a particular class of alignment solutions.
Second… Yes, for a utilitarian this doesn’t mean “much”. But, tbh, who cares? I am not a utilitarian. The vast majority of people are not utilitarians. Maybe even literally no one is an (honest, not self-deceiving) utilitarian. From my perspective, disappointing the imaginary utilitarian is (in itself) about as upsetting as disappointing the imaginary paperclip maximizer.
Third, what I actually want from multi-user alignment is a solution that (i) is acceptable to me personally (ii) is acceptable to the vast majority of people (at least if they think through it rationally and are arguing honestly and in good faith) (iii) is acceptable to key stakeholders (iv) as much as possible, doesn’t leave any Pareto improvements on the table and (v) sufficiently Schelling-pointy to coordinate around. Here, “acceptable” means “a lot better than paperclips and not worth starting an AI race/war to get something better”.
Second… Yes, for a utilitarian this doesn’t mean “much”. But, tbh, who cares? I am not a utilitarian. The vast majority of people are not utilitarians. Maybe even literally no one is an (honest, not self-deceiving) utilitarian. From my perspective, disappointing the imaginary utilitarian is (in itself) about as upsetting as disappointing the imaginary paperclip maximizer.
I’m not a utilitarian either, because I don’t know what my values are or should be. But I do assign significant credence to the possibility that something in the vicinity of utilitarianism is the right values (for me, or period). Given my uncertainties, I want to arrange the current state of the world so that (to the extent possible), whatever I end up deciding my values are, through things like reason, deliberation, doing philosophy, the world will ultimately not turn out to be a huge disappointment according to those values. Unfortunately, your proposed solution isn’t very reassuring to this kind of view.
It’s quite possible that I (and people like me) are simply out of luck, and there’s just no feasible way to do what we want to do, but it sounds like you think I shouldn’t even want what I want, or at least that you don’t want something like this. Is it because you’re already pretty sure what your values are or should be, and therefore think there’s little chance that millennia from now you’ll end up deciding that utilitarianism (or NU, or whatever) is right after all, and regret not doing more in 2021 to push the world in the direction of [your real values, whatever they are]?
I’m moderately sure what my values are, to some approximation. More importantly, I’m even more sure that, whatever my values are, they are not so extremely different from the values of most people that I should wage some kind of war against the majority instead of trying to arrive at a reasonable compromise. And, in the unlikely event that most people (including me) will turn out to be some kind of utilitarians after all, it’s not a problem: value aggregation will then produce a universe which is pretty good for utilitarians.
I’m moderately sure what my values are, to some approximation. More importantly, I’m even more sure that, whatever my values are, they are not so extremely different from the values of most people [...]
Maybe you’re just not part of the target audience of my OP then… but from my perspective, if I determine my values through the kind of process described in the first quote, and most people determine their values through the kind of process described in the second quote, it seems quite likely that the values end up being very different.
[...] that I should wage some kind of war against the majority instead of trying to arrive at a reasonable compromise.
The kind of solution I have in mind is not “waging war” but, for example, solving metaphilosophy and building an AI that can encourage philosophical reflection in humans or enhance people’s philosophical abilities.
And, in the unlikely possibility that most people (including me) will turn out to be some kind of utilitarians after all, it’s not a problem: value aggregation will then produce a universe which is pretty good for utilitarians.
What if you turn out to be some kind of utilitarian but most people don’t (because you’re more like the first group in the OP and they’re more like the second group), or most people will eventually turn out to be some kind of utilitarian in a world without AI, but in a world with AI, this will happen?
I don’t think people determine their values through either process. I think that they already have values, which are to a large extent genetic and immutable. Instead, these processes determine what values they pretend to have for game-theory reasons. So, the big difference between the groups is which “cards” they hold and/or what strategy they pursue, not an intrinsic difference in values.
But also, if we do model values as the result of some long process of reflection, and you’re worried about the AI disrupting or insufficiently aiding this process, then this is already a single-user alignment issue and should be analyzed in that context first. The presumed differences in moralities are not the main source of the problem here.
I don’t think people determine their values through either process. I think that they already have values, which are to a large extent genetic and immutable. Instead, these processes determine what values they pretend to have for game-theory reasons. So, the big difference between the groups is which “cards” they hold and/or what strategy they pursue, not an intrinsic difference in values.
This is not a theory that’s familiar to me. Why do you think this is true? Have you written more about it somewhere or can link to a more complete explanation?
But also, if we do model values as the result of some long process of reflection, and you’re worried about the AI disrupting or insufficiently aiding this process, then this is already a single-user alignment issue and should be analyzed in that context first. The presumed differences in moralities are not the main source of the problem here.
This seems reasonable to me. (If this was meant to be an argument against something I said, there may have been another miscommunication, but I’m not sure it’s worth tracking that down.)
This is not a theory that’s familiar to me. Why do you think this is true? Have you written more about it somewhere or can link to a more complete explanation?
I’ve been considering writing about this for a while, but so far I don’t feel sufficiently motivated. So, the links I posted upwards in the thread are the best I have, plus vague gesturing in the direction of Hansonian signaling theories, Jaynes’ theory of consciousness and Yudkowsky’s belief in belief.
This comment seems to be consistent with the assumption that the outcome 1 year after the singularity is locked in forever. But the future we’re discussing here is one where humans retain autonomy (?), and in that case, they’re allowed to change their mind over time, especially if humanity has access to a superintelligent aligned AI. I think a future where we begin with highly suboptimal personal utopias and gradually transition into utilitronium is among the more plausible outcomes. Compared with other outcomes where Not Everyone Dies, anyway. Your credence may differ if you’re a moral relativist.
But the future we’re discussing here is one where humans retain autonomy (?), and in that case, they’re allowed to change their mind over time, especially if humanity has access to a superintelligent aligned AI.
What if the humans ask the aligned AI to help them be more moral, and part of what they mean by “more moral” is having fewer doubts about their current moral beliefs? This is what a “status game” view of morality seems to predict, for the humans whose status games aren’t based on “doing philosophy”, which seems to be most of them.
I don’t have any reason why this couldn’t happen. My position is something like “morality is real, probably precisely quantifiable; seems plausible that in the scenario of humans with autonomy and aligned AI, this could lead to an asymmetry where more people tend toward utilitronium over time”. (Hence why I replied, you didn’t seem to consider that possibility.) I could make up some mechanisms for this, but probably you don’t need me for that. Also seems plausible that this doesn’t happen. If it doesn’t happen, maybe the people who get to decide what happens with the rest of the universe tend toward utilitronium. But my model is widely uncertain and doesn’t rule out futures of highly suboptimal personal utopias that persist indefinitely.
I strongly believe that (1) well-being is objective, (2) well-being is quantifiable, and (3) Open Individualism is true (i.e., the concept of identity isn’t well-defined, and you’re subjectively no less continuous with the future self if any other person than your own future self).
If (1-3) are all true, then utilitronium is the optimal outcome for everyone even if they’re entirely selfish. Furthermore, I expect an AGI to figure this out, and to the extent that it’s aligned, it should communicate that if it’s asked. (I don’t think an AGI will therefore decide to do the right thing, so this is entirely compatible with everyone dying if alignment isn’t solved.)
In the scenario where people get to talk to the AGI freely and it’s aligned, two concrete mechanisms I see are (a) people just ask the AGI what is morally correct and it tells them, and (b) they get some small taste of what utilitronium would feel like, which would make it less scary. (A crucial piece is that they can rationally expect to experience this themselves in the utilitronium future.)
In the scenario where people don’t get to talk to the AGI, who knows. It’s certainly possible that we have a singleton scenario with a few people in charge of the AGI, and they decide to censor questions about ethics because they find the answers scary.
The only org I know of that works on this and shares my philosophical views is QRI. Their goal is to (a) come up with a mathematical space (probably a topological one, maybe a Hilbert space) that precisely describes the subjective experience of someone, (b) find a way to put someone in the scanner and create that space, and (c) find a property of that space that corresponds to their well-being in that moment. The flagship theory is that this property is symmetry. Their model is stronger than (1-3), but if it’s correct, you could get hard evidence on this before AGI since it would make strong testable predictions about people’s well-being (and they think it could also point to easy interventions, though I don’t understand how that works). Whether it’s feasible to do this before AGI is a different question. I’d bet against it, but I think I give it better odds than any specific alignment proposal. (And I happen to know that Mike agrees that the future is dominated by concerns about AI and thinks this is the best thing to work on.)
So, I think their research is the best bet for getting more people on board with utilitronium since it can provide evidence on (1) and (2). (Also has the nice property that it won’t work if (1) or (2) are false, so there’s low risk of outrage.) Other than that, write posts arguing for moral realism and/or for Open Individualism.
Quantifying suffering before AGI would also plausibly help with alignment, since at least you can formally specify a broad space of outcomes you don’t want, though it certainly doesn’t solve it, e.g. because of inner optimizers.
This implies solving a version of the alignment problem that includes reasonable value aggregation between different people (or between AIs aligned to different people),
We already have a solution to this: money. It’s also the only solution that satisfies some essential properties such as sybil orthogonality (especially important for posthuman/AGI societies).
at least some researchers don’t seem to consider that part of “alignment”.
It’s part of alignment. Also, it seems mostly separate from the part about “how do you even have consequentialism powerful enough to make, say, nanotech, without killing everyone as a side-effect?”, and the latter seems not too related to the former.
In reality, everyone’s morality is based on something like the status game (see also: 1 2 3)
… I really wanted to say [citation needed], but then you did provide citations, but then the citations were not compelling to me.
I’m pretty opposed to such universal claims being made about humans without pushback, because such claims always seem to me to wish-away the extremely wide variation in human psychology and the difficulty establishing anything like “all humans experience X.”
There are people who have no visual imagery, people who do not think in words, people who have no sense of continuity of self, people who have no discernible emotional response to all sorts of “emotional” stimuli, and on and on and on.
So, I’ll go with “it makes sense to model people as if every one of them is motivated by structures built atop the status game.” And I’ll go with “it seems like the status architecture is a physiological near-universal, so I have a hard time imagining what else people’s morality might be made of.” And I’ll go with “everyone I’ve ever talked to had morality that seemed to me to cash out to being statusy, except the people whose self-reports I ignored because they didn’t fit the story I was building in my head.”
But I reject the blunt universal for not even pretending that it’s interested in making itself falsifiable.
Kind of frustrating that this high karma reply to a high karma comment on my post is based on a double misunderstanding/miscommunication:
First Vanessa understood me as claiming that a significant number of people’s morality is not based on status games. I tried to clarify in an earlier comment already, but to clarify some more: that’s not my intended distinction between the two groups. Rather the distinction is that the first group “know or at least suspect that they are confused about morality, and are eager or willing to apply reason and deliberation to find out what their real values are, or to correct their moral beliefs” (they can well be doing this because of the status game that they’re playing) whereas this quoted description doesn’t apply to the second group.
Then you (Duncan) understood Vanessa as claiming that literally everyone’s morality is based on status games, when (as the subsequent discussion revealed) the intended meaning was more like “the number of people whose morality is not based on status games is a lot fewer than (Vanessa’s misunderstanding of) Wei’s claim”.
I think it’s important and valuable to separate out “what was in fact intended” (and I straightforwardly believe Vanessa’s restatement as a truer explanation of her actual position) from “what was originally said, and how would 70+ out of 100 readers tend to interpret it.”
I think we’ve cleared up what was meant. I still think it was bad that [the perfectly reasonable thing that was meant] was said in a [predictably misleading fashion].
But I think we’ve said all that needs to be said about that, too.
This is a tangent (so maybe you prefer to direct this discussion elsewhere), but: what’s with the brackets? I see you using them regularly; what do they signify?
I use them where I’m trying to convey a single noun that’s made up of many words, and I’m scared that people will lose track of the overall sentence while in the middle of the chunk. It’s an attempt to keep the overall sentence understandable. I’ve tried hyphenating such phrases and people find that more annoying.
It’s not just that the self-reports didn’t fit the story I was building, the self-reports didn’t fit the revealed preferences. Whatever people say about their morality, I haven’t seen anyone who behaves like a true utilitarian.
IMO, this is the source of all the gnashing of teeth about how much % of your salary you need to donate: the fundamental contradiction between the demands of utilitarianism and how much people are actually willing to pay for the status gain. Ofc many excuses were developed (“sure I still need to buy that coffee or those movie tickets, otherwise I won’t be productive”) but they don’t sound like the most parsimonious explanation.
This is also the source of paradoxes in population ethics and its vicinity: those abstractions are just very remote from actual human minds, so there’s no reason they should produce anything sane in edge cases. Their only true utility is as an approximate guideline for making group decisions, for sufficiently mundane scenarios. Once you get to issues with infinities it becomes clear utilitarianism is not even mathematically coherent, in general.
You’re right that there is a lot of variation in human psychology. But it’s also an accepted practice to phrase claims as universal when what you actually mean is, the exceptions are negligible for our practical purpose. For example, most people would accept “humans have 2 arms and 2 legs” as a true statement in many contexts, even though some humans have less. In this case, my claim is that the exceptions are much rarer than the OP seems to imply (i.e. most people the OP classifies as exceptions are not really exceptions).
I’m all for falsifiability, but it’s genuinely hard to do falsifiability in soft topics like this, where no theory makes very sharp predictions and collecting data is hard. Ultimately, which explanation is more reasonable is going to be at least in part an intuitive judgement call based on your own experience and reflection. So, yes, I certainly might be wrong, but what I’m describing is my current best guess.
But it’s also an accepted practice to phrase claims as universal when what you actually mean is, the exceptions are negligible for our practical purpose. For example, most people would accept “humans have 2 arms and 2 legs” as a true statement in many contexts, even though some humans have less.
The equivalent statement would be “In reality, everyone has 2 arms and 2 legs.”
Well, if the OP said something like “most people have 2 eyes but enlightened Buddhists have a third eye” and I responded with “in reality, everyone has 2 eyes”, then I think my meaning would be clear even though it’s true that some people have 1 or 0 eyes (afaik maybe there is even a rare mutation that creates a real third eye). Not adding all possible qualifiers is not the same as “not even pretending that it’s interested in making itself falsifiable”.
I think your meaning would be clear, but “everyone knows what this straightforwardly false thing that I said really meant” is insufficient for a subculture trying to be precise and accurate and converge on truth. Seems like more LWers are on your side than on mine on that question, but that’s not news. ¯\_(ツ)_/¯
It’s a strawman to pretend that “please don’t say a clearly false thing” is me insisting on “please include all possible qualifiers.” I just wish you hadn’t said a clearly false thing, is all.
Natural language is not math, it’s inherently ambiguous and it’s not realistically possible to always be precise without implicitly assuming anything about the reader’s understanding of the context. That said, it seems like I wasn’t sufficiently precise in this case, so I edited my comment. Thank you for the correction.
insufficient for a subculture trying to be precise and accurate and converge on truth
The tradeoff is with verbosity and difficulty of communication, it’s not always a straightforward Pareto improvement. So in this case I fully agree with dropping “everyone” or replacing it with a more accurate qualifier. But I disagree with a general principle that would discount ease for a person who is trained and talented in relevant ways. New habits of thought that become intuitive are improvements, checklists and other deliberative rituals that slow down thinking need merit that overcomes their considerable cost.
I haven’t seen anyone who behaves like a true utilitarian.
That looks like a No True Scotsman argument to me. Just because the extreme doesn’t exist doesn’t mean that all of the scale can be explained by status games.
What does it have to do with “No True Scotsman”? NTS is when you redefine your categories to justify your claim. I don’t think I did that anywhere.
Just because the extreme doesn’t exist doesn’t mean that all of the scale can be explained by status games.
First, I didn’t say all the scale is explained by status games, I did mention empathy as well.
Second, that by itself sure doesn’t mean much. Explaining all the evidence would require an article, or maybe a book (although I hoped the posts I linked explain some of it). My point here is that there is an enormous discrepancy between the reported morality and the revealed preferences, so believing self-reports is clearly a non-starter. How do you build an explanation not from self-reports is a different (long) story.
If you try to quantify it, humans on average probably spend over 95% (a conservative estimate) of their time and resources on non-utilitarian causes. True utilitarian behavior is extremely rare, and all other moral behaviors seem to be either elaborate status games or extended self-interest [1]. The typical human is, under any relevant quantified KPI, way closer to being completely selfish than to being a utilitarian.
[1] - Investing in your family/friends is in a way selfish, from a genes/alliances (respectively) perspective.
But, I still prefer that over paperclips (by far). And, I suspect that most people do (even if they protest it in order to play the game).
What does this even mean? If someone says they don’t want X, and they never take actions that promote X, how can it be said that they “truly” want X? It’s not their stated preference or their revealed preference!
You sound like you’re positing the existence of two type of people: type I people who have morality based on “reason” and type II people who have morality based on the “status game”. In reality,
everyone’snearly everyone’s morality is based on something like the status game (see also: 1 2 3). It’s just that EAs and moral philosophers are playing the game in a tribe which awards status differently.The true intrinsic values of most people do place a weight on the happiness of other people (that’s roughly what we call “empathy”), but this weight is very unequally distributed.
There are definitely thorny questions regarding the best way to aggregate the values of different people in TAI. But, I think that given a reasonable solution, a lower bound on the future is imagining that the AI will build a private utopia for every person, as isolated from the other “utopias” as that person wants it to be. Probably some people’s “utopias” will not be great, viewed in utilitarian terms. But, I still prefer that over paperclips (by far). And, I suspect that most people do (even if they protest it in order to play the game).
Sure, I’ve said as much in recent comments, including this one. ETA: Related to this, I’m worried about AI disrupting “our” status game in an unpredictable and possibly dangerous way. E.g., what will happen when everyone uses AI advisors to help them play status games, including the status game of moral philosophy?
What do you mean by “true intrinsic values”? (I couldn’t find any previous usage of this term by you.) How do you propose finding people’s true intrinsic values?
These weights, if low enough relative to other “values”, haven’t prevented people from committing atrocities on each other in the name of morality.
This implies solving a version of the alignment problem that includes reasonable value aggregation between different people (or between AIs aligned to different people), but at least some researchers don’t seem to consider that part of “alignment”.
Given that playing status games and status competition between groups/tribes/status games constitute a huge part of people’s lives, I’m not sure how private utopias that are very isolated from each other would work. Also, I’m not sure if your solution would prevent people from instantiating simulations of perceived enemies / “evil people” in their utopias and punishing them, or just simulating a bunch of low status people to lord over.
I concede that a utilitarian would probably find almost all “aligned” futures better than paperclips. Perhaps I should have clarified that by “parts of me” being more scared, I meant the selfish and NU-leaning parts. The utilitarian part of me is just worried about the potential waste caused by many or most “utopias” being very suboptimal in terms of value created per unit of resource consumed.
I mean the values relative to which a person seems most like a rational agent, arguably formalizable along these lines.
Yes.
Yes. I do think multi-user alignment is an important problem (and occasionally spend some time thinking about it), it just seems reasonable to solve single user alignment first. Andrew Critch is an example of a person who seems to be concerned about this.
I meant that each private utopia can contain any number of people created by the AI, in addition to its “customer”. Ofc groups that can agree on a common utopia can band together as well.
They are prevented from simulating other pre-existing people without their consent, but can simulate a bunch of low status people to lord over. Yes, this can be bad. Yes, I still prefer this (assuming my own private utopia) over paperclips. And, like I said, this is just a relatively easy to imagine lower bound, not necessarily the true optimum.
The selfish part, at least, doesn’t have any reason to be scared as long as you are a “customer”.
Why do you think this will be the result of the value aggregation (or a lower bound on how good the aggregation will be)? For example, if there is a big block of people who all want to simulate person X in order to punish that person, and only X and a few other people object, why won’t the value aggregation be “nobody pre-existing except X (and Y and Z etc.) can be simulated”?
Given some assumptions about the domains of the utility functions, it is possible to do better than what I described in the previous comment. Let $X_i$ be the space of possible experience histories[1] of user $i$ and $Y$ the space of everything else the utility functions depend on (things that nobody can observe directly). Suppose that the domain of the utility functions is $Z := \prod_i X_i \times Y$. Then, we can define the “denosing[2] operator” $D_i : C(Z) \to C(Z)$ for user $i$ by

$$(D_i u)(x_i, x_{-i}, y) := \max_{x' \in \prod_{j \neq i} X_j} u(x_i, x', y)$$

Here, $x_i$ is the argument of $u$ that ranges in $X_i$, $x_{-i}$ are the arguments that range in $X_j$ for $j \neq i$, and $y$ is the argument that ranges in $Y$.
That is, $D_i$ modifies a utility function by having it “imagine” that the experiences of all users other than $i$ have been optimized, with the experiences of user $i$ and the unobservables held constant.
Let $u_i : Z \to \mathbb{R}$ be the utility function of user $i$, and $d^0 \in \mathbb{R}^n$ the initial disagreement point (everyone dying), where $n$ is the number of users. We then perform cooperative bargaining on the denosed utility functions $D_i u_i$ with disagreement point $d^0$, producing some outcome $\mu^0 \in \Delta(Z)$. Define $d^1 \in \mathbb{R}^n$ by $d^1_i := \mathbb{E}_{\mu^0}[u_i]$. Now we do another cooperative bargaining with $d^1$ as the disagreement point and the original utility functions $u_i$. This gives us the final outcome $\mu^1$.
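To make the two-stage construction concrete, here is a minimal toy sketch (my own illustration, not part of the proposal itself): two users, tiny finite spaces $X_1$, $X_2$, $Y$, pure outcomes instead of distributions over $Z$, Nash bargaining standing in for the unspecified cooperative bargaining rule, and made-up utility functions.

```python
# Toy illustration of the denosing operator and the two-stage bargaining.
# Assumptions (mine): two users, tiny finite X1, X2, Y, pure outcomes instead
# of distributions over Z, and Nash bargaining as the cooperative rule.
from itertools import product

X1 = ["quiet", "party"]    # user 1's possible experience histories (made up)
X2 = ["garden", "arena"]   # user 2's possible experience histories (made up)
Y  = ["forests", "mines"]  # unobservable rest-of-world state (made up)

def u1(x1, x2, y):  # user 1 also cares a bit about user 2's experiences and about Y
    return (2 if x1 == "party" else 1) + (1 if x2 == "garden" else 0) + (1 if y == "forests" else 0)

def u2(x1, x2, y):  # user 2 cares a bit about user 1's experiences
    return (2 if x2 == "arena" else 1) + (1 if x1 == "quiet" else 0)

def denose(u, i):
    # D_i u: "imagine" the other user's experiences are optimized, holding
    # user i's own experiences and the unobservables fixed.
    if i == 1:
        return lambda x1, x2, y: max(u(x1, x2p, y) for x2p in X2)
    return lambda x1, x2, y: max(u(x1p, x2, y) for x1p in X1)

def nash_product(us, d, z):
    p = 1.0
    for u, di in zip(us, d):
        p *= u(*z) - di
    return p

def nash_bargain(us, d):
    # Pure outcome maximizing the Nash product over the disagreement point.
    feasible = [z for z in product(X1, X2, Y) if all(u(*z) >= di for u, di in zip(us, d))]
    return max(feasible, key=lambda z: nash_product(us, d, z))

# Stage 1: bargain on the denosed utilities, disagreement point = everyone dies (0, 0).
mu0 = nash_bargain([denose(u1, 1), denose(u2, 2)], d=(0.0, 0.0))
# Stage 2: the *true* utilities realized at mu0 become the new disagreement point,
# and we bargain again on the original (nosy) utilities.
d1 = (u1(*mu0), u2(*mu0))
mu1 = nash_bargain([u1, u2], d=d1)
print("stage-1 outcome:", mu0, " stage-2 outcome:", mu1)
```

Stage 1 bargains on the denosed utilities against the “everyone dies” point; the true utilities realized there then serve as the stage-2 disagreement point for bargaining on the original (nosy) utilities.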
Among other benefits, there is now much less need to remove outliers. Perhaps, instead of removing outliers, we could mitigate them by applying “amplified denosing”, which also removes the dependence on $Y$.
For this procedure, there is a much better case that the lower bound will be met.
In the standard RL formalism this is the space of action-observation sequences $(A \times O)^\omega$.
From the expression “nosy preferences”, see e.g. here.
This is very interesting (and “denosing operator” is delightful).
Some thoughts:
If I understand correctly, I think there can still be a problem where user i wants an experience history such that part of the history is isomorphic to a simulation of user j suffering (i wants to fully experience j suffering in every detail).
Here a fixed $x_i$ may entail some fixed $x_j$ for (some copy of) some $j$.
It seems the above approach can’t then avoid leaving one of i or j badly off:
If i is permitted to freely determine the experience of the embedded j copy, the disagreement point in the second bargaining will bake this in: j may be horrified to see that i wants to experience its copy suffer, but will be powerless to stop it (if i won’t budge in the bargaining).
Conversely, if the embedded j is treated as a user which i will imagine is exactly to i’s liking, but who actually gets what j wants, then the selected μ0 will be horrible for i (e.g. perhaps i wants to fully experience Hitler suffering, and instead gets to fully experience Hitler’s wildest fantasies being realized).
I don’t think it’s possible to do anything like denosing to avoid this.
It may seem like this isn’t a practical problem, since we could reasonably disallow such embedding. However, I think that’s still tricky since there’s a less exotic version of the issue: my experiences likely already are a collection of subagents’ experiences. Presumably my maximisation over $x_{\mathrm{Joe}}$ is permitted to determine all the $x_{\mathrm{subJoe}}$.
It’s hard to see how you draw a principled line here: the ideal future for most people may easily be transhumanist to the point where today’s users are tomorrow’s subpersonalities (and beyond).
A case that may have to be ruled out separately is where i wants to become a suffering j. Depending on what I consider ‘me’, I might be entirely fine with it if ‘I’ wake up tomorrow as suffering j (if I’m done living and think j deserves to suffer).
Or perhaps I want to clone myself $10^{10}$ times, and then have all copies convert themselves to suffering js after a while. [in general, it seems there has to be some mechanism to distribute resources reasonably—but it’s not entirely clear what that should be]
I think that a rigorous treatment of such issues will require some variant of IB physicalism (in which the monotonicity problem has been solved, somehow). I am cautiously optimistic that a denosing operator exists there which dodges these problems. This operator will declare both the manifesting and evaluation of the source codes of other users to be “out of scope” for a given user. Hence, a preference of i to observe the suffering of j would be “satisfied” by observing nearly anything, since the maximization can interpret anything as a simulation of j.
The “subJoe” problem is different: it is irrelevant because “subJoe” is not a user, only Joe is a user. All the transhumanist magic that happens later doesn’t change this. Users are the people living during the AI launch, and only them. The status of any future (trans/post)humans is determined entirely according to the utility functions of users. Why? For two reasons: (i) the AI can only have access to, and stable pointers to, existing people (ii) we only need the buy-in of existing people to launch the AI. If existing people want future people to be treated well, then they have nothing to worry about since this preference is part of the existing people’s utility functions.
Ah—that’s cool if IB physicalism might address this kind of thing (still on my to-read list).
Agreed that the subJoe thing isn’t directly a problem. My worry is mainly whether it’s harder to rule out i experiencing a simulation of $x_{\mathrm{sub}j}$-suffering, since sub-$j$ isn’t a user. However, if you can avoid the suffering js by limiting access to information, the same should presumably work for relevant sub-js.
This isn’t so clear (to me at least) if:
Most, but not all current users want future people to be treated well.
Part of being “treated well” includes being involved in an ongoing bargaining process which decides the AI’s/future’s trajectory.
For instance, suppose initially 90% of people would like to have an iterated bargaining process that includes future (trans/post)humans as users, once they exist. The other 10% are only willing to accept such a situation if they maintain their bargaining power in future iterations (by whatever mechanism).
If you iterate this process, the bargaining process ends up dominated by users who won’t relinquish any power to future users. 90% of initial users might prefer drift over lock-in, but we get lock-in regardless (the disagreement point also amounting to lock-in).
Unless I’m confusing myself, this kind of thing seems like a problem. (not in terms of reaching some non-terrible lower bound, but in terms of realising potential)
Wherever there’s this kind of asymmetry/degradation over bargaining iterations, I think there’s an argument for building in a way to avoid it from the start—since anything short of 100% just limits to 0 over time. [it’s by no means clear that we do want to make future people users on an equal footing to today’s people; it just seems to me that we have to do it at step zero or not at all]
I admit that at this stage it’s unclear because physicalism brings in the monotonicity principle that creates bigger problems than what we discuss here. But maybe some variant can work.
Roughly speaking, in this case the 10% preserve their 10% of the power forever. I think it’s fine because I want the buy-in of this 10% and the cost seems acceptable to me. I’m also not sure there is any viable alternative which doesn’t have even bigger problems.
Sure, I’m not sure there’s a viable alternative either. This kind of approach seems promising—but I want to better understand any downsides.
My worry wasn’t about the initial 10%, but about the possibility of the process being iterated such that you end up with almost all bargaining power in the hands of power-keepers.
In retrospect, this is probably silly: if there’s a designable-by-us mechanism that better achieves what we want, the first bargaining iteration should find it. If not, then what I’m gesturing at must either be incoherent, or not endorsed by the 10% - so hard-coding it into the initial mechanism wouldn’t get the buy-in of the 10% to the extent that they understood the mechanism.
In the end, I think my concern is that we won’t get buy-in from a large majority of users:
In order to accommodate some proportion with odd moral views it seems likely you’ll be throwing away huge amounts of expected value in others’ views—if I’m correctly interpreting your proposal (please correct me if I’m confused).
Is this where you’d want to apply amplified denosing?
So, rather than filtering out the undesirable i, for these i you use:
$$(D_i u)(x_i, x_{-i}, y) := \max_{x' \in \prod_{j \neq i} X_j,\; y' \in Y} u(x_i, x', y')$$ [i.e. ignoring $y$ and imagining it’s optimal]
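In terms of the earlier toy sketch (again just my own illustration, reusing the same made-up spaces), the amplified version would additionally maximize over the unobservables:

```python
def amplified_denose(u, i):
    # Amplified denosing: user i's utility also "imagines" the unobservables Y
    # are optimal, so it constrains neither the other user's experiences nor
    # the rest of the world (the y argument is effectively ignored).
    if i == 1:
        return lambda x1, x2, y: max(u(x1, x2p, yp) for x2p in X2 for yp in Y)
    return lambda x1, x2, y: max(u(x1p, x2, yp) for x1p in X1 for yp in Y)
```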
However, it’s not clear to me how we’d decide who gets strong denosing (clearly not everyone, or we don’t pick a y). E.g. if you strong-denose anyone who’s too willing to allow bargaining failure [everyone dies] you might end up filtering out altruists who worry about suffering risks.
Does that make sense?
I’m not sure what you mean here, but also the process is not iterated: the initial bargaining is deciding the outcome once and for all. At least that’s the mathematical ideal we’re approximating.
I don’t think so? The bargaining system does advantage large groups over small groups.
In practice, I think that for the most part people don’t care much about what happens “far” from them (for some definition of “far”, not physical distance) so giving them private utopias is close to optimal from each individual perspective. Although it’s true they might pretend to care more than they do for the usual reasons, if they’re thinking in “far-mode”.
I would certainly be very concerned about any system that gives even more power to majority views. For example, what if the majority of people are disgusted by gay sex and prefer it not to happen anywhere? I would rather accept things I disapprove of happening far away from me than allow other people to control my own life.
Ofc the system also mandates win-win exchanges. For example, if Alice’s and Bob’s private utopias each contain something strongly unpalatable to the other but not strongly important to the respective customer, the bargaining outcome will remove both unpalatable things.
I’m fine with strong-denosing negative utilitarians who would truly stick to their guns about negative utilitarianism (but I also don’t think there are many).
Ah, I was just being an idiot on the bargaining system w.r.t. small numbers of people being able to hold it to ransom. Oops. Agreed that more majority power isn’t desirable.
[re iteration, I only meant that the bargaining could become iterated if the initial bargaining result were to decide upon iteration (to include more future users). I now don’t think this is particularly significant.]
I think my remaining uncertainty (/confusion) is all related to the issue I first mentioned (embedded copy experiences). It strikes me that something like this can also happen where minds grow/merge/overlap.
Does this avoid the problem if i’s preferences use indirection? It seems to me that a robust pointer to j may be enough: that with a robust pointer it may be possible to implicitly require something like source-code-access without explicitly referencing it. E.g. where i has a preference to “experience j suffering in circumstances where there’s strong evidence it’s actually j suffering, given that these circumstances were the outcome of this bargaining process”.
If i can’t robustly specify things like this, then I’d guess there’d be significant trouble in specifying quite a few (mutually) desirable situations involving other users too. IIUC, this would only be any problem for the denosed bargaining to find a good $d^1$: for the second bargaining on the true utility functions there’s no need to put anything “out of scope” (right?), so win-wins are easily achieved.
I’m imagining cooperative bargaining between all users, where the disagreement point is everyone dying[1][2] (this is a natural choice assuming that if we don’t build aligned TAI we get paperclips). This guarantees that every user will receive an outcome that’s at least not worse than death.
With Nash bargaining, we can still get issues for (in)famous people that millions of people want to do unpleasant things to. Their outcome will be better than death, but maybe worse than in my claimed “lower bound”.
With Kalai-Smorodinsky bargaining things look better, since essentially we’re maximizing a minimum over all users. This should admit my lower bound, unless it is somehow disrupted by enormous asymmetries in the maximal payoffs of different users.
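A toy comparison of the two rules, with entirely made-up payoff numbers and a finite menu of outcomes standing in for the actual (convex) feasible set: users A and B gain a lot from mistreating the (in)famous user C.

```python
# Toy comparison of Nash vs. a Kalai-Smorodinsky-style rule on a finite menu
# of outcomes.  Payoffs are (A, B, C); C is the (in)famous user that A and B
# want to mistreat.  Disagreement point = everyone dies = (0, 0, 0).
# All numbers are made up for illustration.
outcomes = {
    "mistreat_C":      (30.0, 30.0, 0.5),  # barely better than death for C
    "private_utopias":  (7.0,  7.0, 7.0),
    "compromise":       (8.0,  8.0, 1.5),
}
d = (0.0, 0.0, 0.0)
ideal = tuple(max(v[i] for v in outcomes.values()) for i in range(3))  # (30, 30, 7)

def nash_pick(vs):
    # maximize the product of gains over the disagreement point
    return max(vs, key=lambda o: (vs[o][0] - d[0]) * (vs[o][1] - d[1]) * (vs[o][2] - d[2]))

def ks_like_pick(vs):
    # maximize the worst *relative* gain (gain / ideal gain) -- a discrete
    # stand-in for the Kalai-Smorodinsky solution
    return max(vs, key=lambda o: min((vs[o][i] - d[i]) / (ideal[i] - d[i]) for i in range(3)))

print("Nash picks:   ", nash_pick(outcomes))     # mistreat_C (product 450 > 343)
print("KS-like picks:", ks_like_pick(outcomes))  # private_utopias (worst relative gain 7/30)
```

Here the Nash rule leaves C barely above death, while the KS-style rule (maximizing the worst relative gain) lands on the private-utopia outcome.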
In either case, we might need to do some kind of outlier filtering: if e.g. literally every person on Earth is a user, then maybe some of them are utterly insane in ways that cause the Pareto frontier to collapse.
[EDIT: see improved solution]
Bargaining assumes we can access the utility function. In reality, even if we solve the value learning problem in the single-user case, once you go to the multi-user case it becomes a mechanism design problem: users have incentives to lie / misrepresent their utility functions. A perfect solution might be impossible, but I proposed mitigating this by assigning each user a virtual “AI lawyer” that provides optimal input on their behalf into the bargaining system. In this case they at least have no incentive to lie to the lawyer, and the outcome will not be skewed in favor of users who are better at this game, but we don’t get the optimal bargaining solution either.
All of this assumes the TAI is based on some kind of value learning. If the first-stage TAI is based on something else, the problem might become easier or harder. Easier because the first-stage TAI will produce better solutions to the multi-user problem for the second-stage TAI. Harder because it can allow the small group of people controlling it to impose their own preferences.
For IDA-of-imitation, democratization seems like a hard problem because the mechanism by which IDA-of-imitation solves AI risk is precisely by empowering a small group of people over everyone else (since the source of AI risk comes from other people launching unaligned TAI). Adding transparency can entirely undermine safety.
For quantilized debate, adding transparency opens us to an attack vector where the AI manipulates public opinion. This significantly lowers the optimization pressure bar for manipulation, compared to manipulating the (carefully selected) judges, which might undermine the key assumption that effective dishonest strategies are harder to find than effective honest strategies.
This can be formalized by literally having the AI consider the possibility of optimizing for some unaligned utility function. This is a weird and risky approach but it works to 1st approximation.
An alternative choice of disagreement point is maximizing the utility of a randomly chosen user. This has advantages and disadvantages.
Assuming each lawyer has the same incentive to lie as its client, it has an incentive to misrepresent some preferable-to-death outcomes as “worse than death” (in order to force those outcomes out of the set of “feasible agreements”, in the hope of getting a more preferred outcome as the actual one). At equilibrium, this is balanced by the marginal increase in the probability of getting “everyone dies” as the outcome (due to the feasible agreements becoming a null set) caused by the lie. So the probability of “everyone dies” in this game has to be non-zero.
(It’s the same kind of problem as in the AI race or tragedy of commons: people not taking into account the full social costs of their actions as they reach for private benefits.)
Of course in actuality everyone dying may not be a realistic consequence of failure to reach agreement, but if the real consequence is better than that, and the AI lawyers know this, they would be more willing to lie since the perceived downside of lying would be smaller, so you end up with a higher chance of no agreement.
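A minimal toy version of this incentive (my own made-up menu and payoffs, with Nash bargaining over whatever outcomes both lawyers report as better than death):

```python
# Toy version of the lying incentive (all names and payoffs made up).
# Each lawyer reports which outcomes are "better than death" for its client;
# bargaining happens only over outcomes both report as acceptable.
outcomes = {"favors_1": (3, 1), "split": (2, 2), "favors_2": (1, 3)}
DEATH = (0, 0)  # disagreement payoffs

def bargain(reported_ok_1, reported_ok_2):
    feasible = [o for o in outcomes if o in reported_ok_1 and o in reported_ok_2]
    if not feasible:
        return "everyone_dies", DEATH
    # Nash product relative to the (0, 0) disagreement point
    best = max(feasible, key=lambda o: outcomes[o][0] * outcomes[o][1])
    return best, outcomes[best]

honest = set(outcomes)                        # truthfully, everything beats death
print(bargain(honest, honest))                # -> ('split', (2, 2))
print(bargain({"favors_1"}, honest))          # lawyer 1 lies -> ('favors_1', (3, 1))
print(bargain({"favors_1"}, {"favors_2"}))    # both lie -> ('everyone_dies', (0, 0))
```

Each lawyer gains by lying unilaterally, but if both lie the feasible set is empty and we land on the disagreement point, which is why an equilibrium has to put non-zero probability on “everyone dies”.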
Yes, it’s not a very satisfactory solution. Some alternative/complementary solutions:
Somehow use non-transformative AI to do mind uploading, and then have the TAI learn by inspecting the uploads. This would be great for single-user alignment as well.
Somehow use non-transformative AI to create perfect lie detectors, and use this to enforce honesty in the mechanism. (But, is it possible to detect self-deception?)
Have the TAI learn from past data which wasn’t affected by the incentives created by the TAI. (But, is there enough information there?)
Shape the TAI’s prior about human values in order to rule out at least the most blatant lies.
Some clever mechanism design I haven’t thought of. The problem with this is that most mechanism designs rely on money, and money doesn’t seem applicable here, whereas without money there are many impossibility theorems.
This seems near guaranteed to me: a non-zero amount of people will be that crazy (in our terms), so filtering will be necessary.
Then I’m curious about how we draw the line on outlier filtering. What filtering rule do we use? I don’t yet see a good principled rule (e.g. if we want to throw out people who’d collapse agreement to the disagreement point, there’s more than one way to do that).
Depending on what we mean by ‘crazy’ I think that’s unlikely—particularly when what we care about here are highly unusual moral stances. I’d see intelligence as a multiplier, rather than something which points you in the ‘right’ direction. Outliers will be at both extremes of intelligence—and I think you’ll get a much wider moral variety on the high end.
For instance, I don’t think you’ll find many low-intelligence antinatalists—and here I mean the stronger, non-obvious claim: not simply that most people calling themselves antinatalists, or advocating for antinatalism will have fairly high intelligence, but rather that most people with such a moral stance (perhaps not articulated) will have fairly high intelligence.
Generally, I think there are many weird moral stances you might think your way into that you’d be highly unlikely to find ‘naturally’ (through e.g. absorption of cultural norms).
I’d also expect creativity to positively correlate with outlier moralities. Minds that habitually throw together seven disparate concepts will find crazier notions than those which don’t get beyond three.
First, I think we want to be thinking in terms of [personal morality we’d reflectively endorse] rather than [all the base, weird, conflicting… drivers of behaviour that happen to be in our heads].
There are things most of us would wish to change about ourselves if we could. There’s no sense in baking them in for all eternity (or bargaining on their behalf), just because they happen to form part of what drives us now. [though one does have to be a bit careful here, since it’s easy to miss the upside of qualities we regard as flaws]
With this in mind, reflectively endorsed antinatalism really is a problem: yes, some people will endorse sacrificing everything just to get to a world where there’s no suffering (because there are no people).
Note that the kinds of bargaining approach Vanessa is advocating are aimed at guaranteeing a lower bound for everyone (who’s not pre-filtered out) - so you only need to include one person with a particularly weird view to fail to reach a sensible bargain. [though her most recent version should avoid this]
For a utilitarian, this doesn’t mean much. What’s much more important is something like, “How close is this outcome to an actual (global) utopia (e.g., with optimized utilitronium filling the universe), on a linear scale?” For example, my rough expectation (without having thought about it much) is that your “lower bound” outcome is about midway between paperclips and actual utopia on a logarithmic scale. In one sense, this is much better than paperclips, but in another sense (i.e., on the linear scale), it’s almost indistinguishable from paperclips, and a utilitarian would only care about the latter and therefore be nearly as disappointed by that outcome as paperclips.
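For a sense of scale (with purely hypothetical numbers of my own): if paperclips are worth roughly $10^0$ and an optimized utopia roughly $10^{40}$ units of value, then an outcome “midway on the logarithmic scale” is worth about $10^{20}$, i.e.

$$\frac{10^{20}}{10^{40}} = 10^{-20}$$

of the utopia on the linear scale, which is why a utilitarian who cares about the linear scale would treat it as nearly indistinguishable from paperclips.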
I want to add a little to my stance on utilitarianism. A utilitarian superintelligence would probably kill me and everyone I love, because we are made of atoms that could be used for minds that are more hedonic[1][2][3]. Given a choice between paperclips and utilitarianism, I would still choose utilitarianism. But, if there was a utilitarian TAI project along with a half-decent chance to do something better (by my lights), I would actively oppose the utilitarian project. From my perspective, such a project essentially consists of enemy combatants.
One way to avoid it is by modifying utilitarianism to only place weight on currently existing people. But this is already not that far from my cooperative bargaining proposal (although still inferior to it, IMO).
Another way to avoid it is by postulating some very strong penalty on death (i.e. discontinuity of personality). But this is not trivial to do, especially without creating other problems. Moreover, from my perspective this kind of thing is a hack trying to work around the core issue, namely that I am not a utilitarian (along with the vast majority of people).
A possible counterargument is, maybe the superhedonic future minds would be sad to contemplate our murder. But, this seems too weak to change the outcome, even assuming that this version of utilitarianism mandates minds who would want to know the truth and care about it, and that this preference is counted towards “utility”.
This seems like a reasonable concern about some types of hedonic utilitarianism. To be clear, I’m not aware of any formulation of utilitarianism that doesn’t have serious issues, and I’m also not aware of any formulation of any morality that doesn’t have serious issues.
Just to be clear, this isn’t in response to something I wrote, right? (I’m definitely not advocating any kind of “utilitarian TAI project” and would be quite scared of such a project myself.)
So what are you (and they) then? What would your utopia look like?
No! Sorry, if I gave that impression.
Well, I linked my toy model of partiality before. Are you asking about something more concrete?
Yeah, I mean aside from how much you care about various other people, what concrete things do you want in your utopia?
I have low confidence about this, but my best guess personal utopia would be something like: A lot of cool and interesting things are happening. Some of them are good, some of them are bad (a world in which nothing bad ever happens would be boring). However, there is a limit on how bad something is allowed to be (for example, true death, permanent crippling of someone’s mind and eternal torture are over the line), and overall “happy endings” are more common than “unhappy endings”. Moreover, since it’s my utopia (according to my understanding of the question, we are ignoring the bargaining process and acausal cooperation here), I am among the top along those desirable dimensions which are zero-sum (e.g. play an especially important / “protagonist” role in the events to the extent that it’s impossible for everyone to play such an important role, and have high status to the extent that it’s impossible for everyone to have such high status).
First, you wrote “a part of me is actually more scared of many futures in which alignment is solved, than a future where biological life is simply wiped out by a paperclip maximizer.” So, I tried to assuage this fear for a particular class of alignment solutions.
Second… Yes, for a utilitarian this doesn’t mean “much”. But, tbh, who cares? I am not a utilitarian. The vast majority of people are not utilitarians. Maybe even literally no one is an (honest, not self-deceiving) utilitarian. From my perspective, disappointing the imaginary utilitarian is (in itself) about as upsetting as disappointing the imaginary paperclip maximizer.
Third, what I actually want from multi-user alignment is a solution that (i) is acceptable to me personally (ii) is acceptable to the vast majority of people (at least if they think through it rationally and are arguing honestly and in good faith) (iii) is acceptable to key stakeholders (iv) as much as possible, doesn’t leave any Pareto improvements on the table and (v) sufficiently Schelling-pointy to coordinate around. Here, “acceptable” means “a lot better than paperclips and not worth starting an AI race/war to get something better”.
I’m not a utilitarian either, because I don’t know what my values are or should be. But I do assign significant credence to the possibility that something in the vicinity of utilitarianism is the right values (for me, or period). Given my uncertainties, I want to arrange the current state of the world so that (to the extent possible), whatever I end up deciding my values are, through things like reason, deliberation, doing philosophy, the world will ultimately not turn out to be a huge disappointment according to those values. Unfortunately, your proposed solution isn’t very reassuring to this kind of view.
It’s quite possible that I (and people like me) are simply out of luck, and there’s just no feasible way to do what we want to do, but it sounds like you think I shouldn’t even want what I want, or at least that you don’t want something like this. Is it because you’re already pretty sure what your values are or should be, and therefore think there’s little chance that millennia from now you’ll end up deciding that utilitarianism (or NU, or whatever) is right after all, and regret not doing more in 2021 to push the world in the direction of [your real values, whatever they are]?
I’m moderately sure what my values are, to some approximation. More importantly, I’m even more sure that, whatever my values are, they are not so extremely different from the values of most people that I should wage some kind of war against the majority instead of trying to arrive at a reasonable compromise. And, in the unlikely event that most people (including me) will turn out to be some kind of utilitarians after all, it’s not a problem: value aggregation will then produce a universe which is pretty good for utilitarians.
Maybe you’re just not part of the target audience of my OP then… but from my perspective, if I determine my values through the kind of process described in the first quote, and most people determine their values through the kind of process described in the second quote, it seems quite likely that the values end up being very different.
The kind of solution I have in mind is not “waging war” but, for example, solving metaphilosophy and building an AI that can encourage philosophical reflection in humans or enhance people’s philosophical abilities.
What if you turn out to be some kind of utilitarian but most people don’t (because you’re more like the first group in the OP and they’re more like the second group), or most people will eventually turn out to be some kind of utilitarian in a world without AI, but in a world with AI, this will happen?
I don’t think people determine their values through either process. I think that they already have values, which are to a large extent genetic and immutable. Instead, these processes determine what values they pretend to have for game-theory reasons. So, the big difference between the groups is which “cards” they hold and/or what strategy they pursue, not an intrinsic difference in values.
But also, if we do model values as the result of some long process of reflection, and you’re worried about the AI disrupting or insufficiently aiding this process, then this is already a single-user alignment issue and should be analyzed in that context first. The presumed differences in moralities are not the main source of the problem here.
This is not a theory that’s familiar to me. Why do you think this is true? Have you written more about it somewhere or can link to a more complete explanation?
This seems reasonable to me. (If this was meant to be an argument against something I said, there may have been another miscommunication, but I’m not sure it’s worth tracking that down.)
I’ve been considering writing about this for a while, but so far I don’t feel sufficiently motivated. So, the links I posted upwards in the thread are the best I have, plus vague gesturing in the directions of Hansonian signaling theories, Jaynes’ theory of consciousness and Yudkowsky’s belief in belief.
Isn’t this the main thesis of “The righteous mind”?
This comment seems to be consistent with the assumption that the outcome 1 year after the singularity is locked in forever. But the future we’re discussing here is one where humans retain autonomy (?), and in that case, they’re allowed to change their mind over time, especially if humanity has access to a superintelligent aligned AI. I think a future where we begin with highly suboptimal personal utopias and gradually transition into utilitronium is among the more plausible outcomes. Compared with other outcomes where Not Everyone Dies, anyway. Your credence may differ if you’re a moral relativist.
What if the humans ask the aligned AI to help them be more moral, and part of what they mean by “more moral” is having fewer doubts about their current moral beliefs? This is what a “status game” view of morality seems to predict, for the humans whose status games aren’t based on “doing philosophy”, which seems to be most of them.
I don’t have any reason why this couldn’t happen. My position is something like “morality is real, probably precisely quantifiable; seems plausible that in the scenario of humans with autonomy and aligned AI, this could lead to an asymmetry where more people tend toward utilitronium over time”. (Hence why I replied, you didn’t seem to consider that possibility.) I could make up some mechanisms for this, but probably you don’t need me for that. Also seems plausible that this doesn’t happen. If it doesn’t happen, maybe the people who get to decide what happens with the rest of the universe tend toward utilitronium. But my model is widely uncertain and doesn’t rule out futures of highly suboptimal personal utopias that persist indefinitely.
I’m interested in your view on this, plus what we can potentially do to push the future in this direction.
I strongly believe that (1) well-being is objective, (2) well-being is quantifiable, and (3) Open Individualism is true (i.e., the concept of identity isn’t well-defined, and you’re subjectively no less continuous with the future self of any other person than with your own future self).
If (1-3) are all true, then utilitronium is the optimal outcome for everyone even if they’re entirely selfish. Furthermore, I expect an AGI to figure this out, and to the extent that it’s aligned, it should communicate that if it’s asked. (I don’t think an AGI will therefore decide to do the right thing, so this is entirely compatible with everyone dying if alignment isn’t solved.)
In the scenario where people get to talk to the AGI freely and it’s aligned, two concrete mechanisms I see are (a) people just ask the AGI what is morally correct and it tells them, and (b) they get some small taste of what utilitronium would feel like, which would make it less scary. (A crucial piece is that they can rationally expect to experience this themselves in the utilitronium future.)
In the scenario where people don’t get to talk to the AGI, who knows. It’s certainly possible that we have singleton scenario with a few people in charge of the AGI, and they decide to censor questions about ethics because they find the answers scary.
The only org I know of that works on this and shares my philosophical views is QRI. Their goal is to (a) come up with a mathematical space (probably a topological one, mb a Hilbert space) that precisely describes the subjective experience of someone, (b) find a way to put someone in the scanner and create that space, and (c) find a property of that space that corresponds to their well-being in that moment. The flagship theory is that this property is symmetry. Their model is stronger than (1-3), but if it’s correct, you could get hard evidence on this before AGI since it would make strong testable predictions about people’s well-being (and they think it could also point to easy interventions, though I don’t understand how that works). Whether it’s feasible to do this before AGI is a different question. I’d bet against it, but I think I give it better odds than any specific alignment proposal. (And I happen to know that Mike agrees that the future is dominated by concerns about AI and thinks this is the best thing to work on.)
So, I think their research is the best bet for getting more people on board with utilitronium since it can provide evidence on (1) and (2). (Also has the nice property that it won’t work if (1) or (2) are false, so there’s low risk of outrage.) Other than that, write posts arguing for moral realism and/or for Open Individualism.
Quantifying suffering before AGI would also plausibly help with alignment, since at least you can formally specify a broad space of outcomes you don’t want (though it certainly doesn’t solve alignment, e.g. because of inner optimizers).
We already have a solution to this: money. It’s also the only solution that satisfies some essential properties such as sybil orthogonality (especially important for posthuman/AGI societies).
It’s part of alignment. Also, it seems mostly separate from the part about “how do you even have consequentialism powerful enough to make, say, nanotech, without killing everyone as a side-effect?”, and the latter seems not too related to the former.
… I really wanted to say [citation needed], but then you did provide citations, but then the citations were not compelling to me.
I’m pretty opposed to such universal claims being made about humans without pushback, because such claims always seem to me to wish-away the extremely wide variation in human psychology and the difficulty establishing anything like “all humans experience X.”
There are people who have no visual imagery, people who do not think in words, people who have no sense of continuity of self, people who have no discernible emotional response to all sorts of “emotional” stimuli, and on and on and on.
So, I’ll go with “it makes sense to model people as if every one of them is motivated by structures built atop the status game.” And I’ll go with “it seems like the status architecture is a physiological near-universal, so I have a hard time imagining what else people’s morality might be made of.” And I’ll go with “everyone I’ve ever talked to had morality that seemed to me to cash out to being statusy, except the people whose self-reports I ignored because they didn’t fit the story I was building in my head.”
But I reject the blunt universal for not even pretending that it’s interested in making itself falsifiable.
Kind of frustrating that this high karma reply to a high karma comment on my post is based on a double misunderstanding/miscommunication:
First Vanessa understood me as claiming that a significant number of people’s morality is not based on status games. I tried to clarify in an earlier comment already, but to clarify some more: that’s not my intended distinction between the two groups. Rather the distinction is that the first group “know or at least suspect that they are confused about morality, and are eager or willing to apply reason and deliberation to find out what their real values are, or to correct their moral beliefs” (they can well be doing this because of the status game that they’re playing) whereas this quoted description doesn’t apply to the second group.
Then you (Duncan) understood Vanessa as claiming that literally everyone’s morality is based on status games, when (as the subsequent discussion revealed) the intended meaning was more like “the number of people whose morality is not based on status games is a lot fewer than (Vanessa’s misunderstanding of) Wei’s claim”.
I think it’s important and valuable to separate out “what was in fact intended” (and I straightforwardly believe Vanessa’s restatement as a truer explanation of her actual position) from “what was originally said, and how would 70+ out of 100 readers tend to interpret it.”
I think we’ve cleared up what was meant. I still think it was bad that [the perfectly reasonable thing that was meant] was said in a [predictably misleading fashion].
But I think we’ve said all that needs to be said about that, too.
This is a tangent (so maybe you prefer to direct this discussion elsewhere), but: what’s with the brackets? I see you using them regularly; what do they signify?
I use them where I’m trying to convey a single noun that’s made up of many words, and I’m scared that people will lose track of the overall sentence while in the middle of the chunk. It’s an attempt to keep the overall sentence understandable. I’ve tried hyphenating such phrases and people find that more annoying.
Hmm, I see, thanks.
It’s not just that the self-reports didn’t fit the story I was building, the self-reports didn’t fit the revealed preferences. Whatever people say about their morality, I haven’t seen anyone who behaves like a true utilitarian.
IMO, this is the source of all the gnashing of teeth about how much % of your salary you need to donate: the fundamental contradiction between the demands of utilitarianism and how much people are actually willing to pay for the status gain. Ofc many excuses were developed (“sure I still need to buy that coffee or those movie tickets, otherwise I won’t be productive”) but they don’t sound like the most parsimonious explanation.
This is also the source of paradoxes in population ethics and its vicinity: those abstractions are just very remote from actual human minds, so there’s no reason they should produce anything sane in edge cases. Their only true utility is as an approximate guideline for making group decisions, for sufficiently mundane scenarios. Once you get to issues with infinities it becomes clear utilitarianism is not even mathematically coherent, in general.
You’re right that there is a lot of variation in human psychology. But it’s also an accepted practice to phrase claims as universal when what you actually mean is that the exceptions are negligible for our practical purpose. For example, most people would accept “humans have 2 arms and 2 legs” as a true statement in many contexts, even though some humans have fewer. In this case, my claim is that the exceptions are much rarer than the OP seems to imply (i.e. most people the OP classifies as exceptions are not really exceptions).
I’m all for falsifiability, but it’s genuinely hard to do falsifiability in soft topics like this, where no theory makes very sharp predictions and collecting data is hard. Ultimately, which explanation is more reasonable is going to be at least in part an intuitive judgement call based on your own experience and reflection. So, yes, I certainly might be wrong, but what I’m describing is my current best guess.
The equivalent statement would be “In reality, everyone has 2 arms and 2 legs.”
Well, if the OP said something like “most people have 2 eyes but enlightened Buddhists have a third eye” and I responded with “in reality, everyone has 2 eyes” then, I think my meaning would be clear even though it’s true that some people have 1 or 0 eyes (afaik maybe there is even a rare mutation that creates a real third eye). Not adding all possible qualifiers is not the same as “not even pretending that it’s interested in making itself falsifiable”.
I think your meaning would be clear, but “everyone knows what this straightforwardly false thing that I said really meant” is insufficient for a subculture trying to be precise and accurate and converge on truth. Seems like more LWers are on your side than on mine on that question, but that’s not news. ¯\_(ツ)_/¯
It’s a strawman to pretend that “please don’t say a clearly false thing” is me insisting on “please include all possible qualifiers.” I just wish you hadn’t said a clearly false thing, is all.
Natural language is not math, it’s inherently ambiguous and it’s not realistically possible to always be precise without implicitly assuming anything about the reader’s understanding of the context. That said, it seems like I wasn’t sufficiently precise in this case, so I edited my comment. Thank you for the correction.
The tradeoff is with verbosity and difficulty of communication, it’s not always a straightforward Pareto improvement. So in this case I fully agree with dropping “everyone” or replacing it with a more accurate qualifier. But I disagree with a general principle that would discount ease for a person who is trained and talented in relevant ways. New habits of thought that become intuitive are improvements, checklists and other deliberative rituals that slow down thinking need merit that overcomes their considerable cost.
That looks like a No True Scotsman argument to me. Just because the extreme doesn’t exist doesn’t mean that all of the scale can be explained by status games.
What does it have to do with “No True Scotsman”? NTS is when you redefine your categories to justify your claim. I don’t think I did that anywhere.
First, I didn’t say all the scale is explained by status games, I did mention empathy as well.
Second, that by itself sure doesn’t mean much. Explaining all the evidence would require an article, or maybe a book (although I hoped the posts I linked explain some of it). My point here is that there is an enormous discrepancy between the reported morality and the revealed preferences, so believing self-reports is clearly a non-starter. How to build an explanation without relying on self-reports is a different (long) story.
I agree that there is an enormous discrepancy.
If you try to quantify it, humans on average probably spend over 95% (a conservative estimate) of their time and resources on non-utilitarian causes. True utilitarian behavior is extremely rare, and all other moral behaviors seem to be either elaborate status games or extended self-interest [1]. Under any relevant quantified KPI, the typical human is far closer to being completely selfish than to being a utilitarian.
[1] - Investing in your family/friends is in a way selfish, from a genes/alliances (respectively) perspective.
What does this even mean? If someone says they don’t want X, and they never take actions that promote X, how can it be said that they “truly” want X? It’s not their stated preference or their revealed preference!