I find that the day-to-day benefits in terms of suffering less already make meditation worth it. I’m not sure why rebirth would be necessary if my current life can already be made much better.
Kaj_Sotala
Interesting. I’ve generally had the position that while the Buddha was no doubt a great innovator, he was still just one guy who lived thousands of years ago. And that it’d probably be a better use of time to read all the more recent meditation teachers who can draw on the thousands of years of progress since then, and can synthesize the most valuable pieces of Buddha’s teaching with the things that have been learned since then.
Nevertheless, your challenge made me interested and I got a copy of “In the Words of the Buddha” now. It does seem interesting.
Ah that’s the one, thanks! Edited the link into the post.
At this moment my best guess is that kindness is generally nice, but there is also such a thing as too much kindness. I want my friends to be nice to me, but I also want them to be able to defend me and themselves from our potential enemies; and I need some credible signal that they could do that if necessary. (Or, as Jordan Peterson would say, “good” is not the same as “harmless”. A good person is one who could hurt you, but chooses not to.) There is “kindness” that is a choice, and “kindness” that is a strategy to survive by avoiding conflict. I suppose the former is attractive, and the latter is not. But to make it clear that your kindness is a choice, you must sometimes be visibly not-kind.
Yeah this sounds right to me.
The book Altered Traits summarized some of the research on meditators (though if I recall correctly, it also caveated this with saying that most of the research wasn’t very high-quality), e.g.:
Sticking with meditation over the years offers more benefits as meditators reach the long-term range of lifetime hours, around 1,000 to 10,000 hours. This might mean a daily meditation session, and perhaps annual retreats with further instruction lasting a week or so—all sustained over many years. The earlier effects deepen, while others emerge.
For example, in this range we see the emergence of neural and hormonal indicators of lessened stress reactivity. In addition, functional connectivity in the brain in a circuit important for emotion regulation is strengthened, and cortisol, a key hormone secreted by the adrenal gland in response to stress, lessens.
Loving-kindness and compassion practice over the long term enhance neural resonance with another person’s suffering, along with concern and a greater likelihood of actually helping. Attention, too, strengthens in many aspects with long-term practice: selective attention sharpens, the attentional blink diminishes, sustained attention becomes easier, and an alert readiness to respond increases. And long-term practitioners show enhanced ability to down-regulate the mind-wandering and self-obsessed thoughts of the default mode, as well as weakening connectivity within those circuits—signifying less self-preoccupation. These improvements often show up during meditative states, and generally tend to become traits.
Shifts in very basic biological processes, such as a slower breath rate, occur only after several thousand hours of practice. Some of these impacts seem more strongly enhanced by intensive practice on retreat than by daily practice.
lsusr’s review has more quotes.
If they successfully get their preferences (a.k.a. “needs”, etc.) acknowledged and accommodated, that is a distinct advantage relative to the base case where they are just a regular person with no special needs! This seems very obvious!
I mean, yes, but having that preference gives them no special advantage relative to not having it.
Suppose that I have a gluten intolerance and need to have gluten-free food available. It’s of course true that getting to have gluten-free food is now a distinct advantage to me, compared to a scenario where I couldn’t get gluten-free food. But the fact that I have a gluten intolerance doesn’t make me better off overall. If I get accommodated, then at best I get to the same neutral level of “can eat food without getting terrible symptoms” that everyone else is at. And more realistically, I won’t even get to that zero level but rather will sometimes accidentally eat food with gluten, will miss out on tasty foods I’d enjoy, etc., so it’d be better for me to not have the intolerance.
Likewise, if someone gets terribly upset about being told “you’re wrong”, then if that’s accommodated, at best they get to the same zero level as everyone who doesn’t get terribly upset about it. And it’s more likely that they won’t get perfectly accommodated, so not only will they gain nothing, but will also need to endure discomfort they wouldn’t need to endure if they didn’t have that sensitivity. So if they don’t already have that pre-existing sensitivity, there’s no incentive for them to develop it.
I do not think that this is the least bit warranted.
Why not? I know plenty of otherwise intelligent, creative etc. people who also have serious mental health problems.
That’s odd. (Note that while this excerpt was already present in the first edition, it was also retained in the second edition.)
I don’t have a complete model of what exactly is going on either. My current guess is that there are something like two different layers of motivation in the brain. One calculates expected utilities in a relatively unbiased manner and meditation doesn’t really affect that one much, but then there’s another layer on top of that which notices particularly high-utility (positive or negative) scenarios and gives them disproportionate weight. That second one tends to mess things up and is the one that meditation seems to weaken.
It looks to me like weakening the second thing tends to make one’s decisions purely better, making it more likely that the brain just does the correct expected utility calculations. I acknowledge that this is very weird and implausible-sounding, because why would the brain develop a second layer of motivation that just messes things up?
My strong suspicion at the moment is that it has to do with social strategies. Calculating expected utilities wrong is normally just bad, but it can be beneficial if other agents are modeling you and making decisions based on their models of you. If you end up believing that an actually impossible outcome is possible, you can of course never actually achieve that outcome. But opponents who see that you are impossible to reason with may still give in, letting you get at least somewhat closer to that outcome than if you’d been reasonable.
I have some posts with more speculation about these things here and here.
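The hypothesized two layers can be sketched as a toy calculation. The specific numbers, and the probability-flattening distortion used for the second layer, are purely illustrative assumptions of mine, not anything claimed in the posts:

```python
# Toy sketch of the hypothesized two-layer motivation model.
# All numbers and the specific distortion function are illustrative
# assumptions, chosen only to show how the layers could disagree.

def expected_utility(outcomes):
    """Layer 1: unbiased expected utility over (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)

def salience_weighted_utility(outcomes, exponent=0.5):
    """Layer 2: extreme outcomes get disproportionate weight because
    small probabilities are flattened (p ** exponent overweights rare events)."""
    return sum((p ** exponent) * u for p, u in outcomes)

# A gamble: 1% chance of a catastrophic -100, 99% chance of a mild +2.
gamble = [(0.01, -100.0), (0.99, 2.0)]

print(expected_utility(gamble))           # ≈ 0.98: layer 1 calls it slightly good
print(salience_weighted_utility(gamble))  # ≈ -8.0: layer 2 vetoes the gamble
```

The point of the sketch is just that a second layer which overweights vivid high-magnitude scenarios can reverse the sign of a decision that the unbiased layer would get right.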
Because you talk quite a bit about what will or will not fix this student’s problem, whether it is easy or hard for this student to “get over it”, how long it would take for this student to fix the problem, what this student’s experience is like, etc., etc. (Likewise the author of the quoted original post.)
These concerns all belong to a corrective view. From a selective view, all of them are irrelevant. What is relevant instead is questions like: supposing that we adopt a policy of granting such requests, what will the student body (likewise: the class, the results of the course, etc.) be like? If we adopt a policy of not granting such requests, what will said things be like? And so on.
So, you say that you thought that selective ways would be a bad way of dealing with the issue. Well, that’s a perfectly coherent view. But it needs to be argued for, not just left as a totally unstated background assumption.
Fair enough, I’ll elaborate then.
You’re thinking about the selective view in a way that focuses on people who have an incentive to make these kinds of requests. Meanwhile, I’m more focused on the people who wouldn’t have an incentive to make these kinds of requests, if not for the fact that their trauma makes it excessively costly for them not to have their requests accommodated. It’s not even the case, as your link to the Vladimir_M comment implies, that they have developed this as an unconscious strategy for getting something that makes them better off. Having this problem just makes them worse off.
If such people make requests that are low-cost for us to grant but very costly for them to have denied, then refusing those requests makes it significantly costlier and harder for them to graduate. And even if they do manage to graduate, it may make it significantly harder for them to overcome their problems later in life. So one selective effect is that the school as a whole loses people who would otherwise have been good students. But on a broader level, society as a whole also suffers as these people become increasingly worse off and less capable of contributing, when a more accommodating policy could have allowed them to thrive.
(This is a somewhat personal topic for me, as I have a close relative who had to drop out of university due to mental health problems and ultimately ended up on an early disability pension when they would have preferred to be working. And things wouldn’t have needed to go very differently for something similar to happen to me.)
So my view is that it’s bad to take an “ignore/deny” type approach, because the selective effects have an overall negative effect on the composition of both the student body of that particular school and of society in general.
Instituting a policy of making, and requiring, such adjustments, selects against the latter sort of individual. (There’s that selective view again!)
Some of my best teachers, instructors, professors, etc. have seemed very much like the latter sort.
That’s true. Though if these people have difficulty making accommodations that would make some of their students significantly better off, then that seems to me like evidence that they’re not among the best teachers, and that it might be good to select against them.
Also, this bit sounds like we are now talking about something like… what university-level policies should be like, and while that’s not an unreasonable direction to go in, I was originally thinking of this just as a discussion of the choices of this particular teacher. It seems to me that one should first get on the same page about whether his specific decision was good or bad, before trying to consider possible policy implications. (A particular choice made individually can be good or bad, while an institution-wide policy that everyone/no-one should make the same choice can have unintended effects; and one can’t talk about the policy-level tradeoffs before first knowing what the effects on the individual level are.)
For another thing, some obvious questions we could ask would be: ok, so you’re going to not ever tell this student that she’s wrong. Are you also not going to tell the other students that they’re wrong? What if they ask you “so, am I wrong about this?”, or “so, is this wrong?”? What if they ask you “so, what so-and-so just said—that’s wrong, isn’t it?” You might be tempted to describe various evasions which you could use, but now suppose that another student is (again, to take just one possible example) on the autism spectrum, and has a difficult time understanding euphemisms, indirection, etc.? Suppose that they’re not a native speaker? Suppose that they’re on the spectrum and they’re not a native speaker?
It’s easy to generate lots of “what ifs”, but a long list of them doesn’t mean they’d be hard to answer in practice. Your first two questions in particular seem to me to have obvious answers: automatically extending the policy to all students just makes sense, and if someone explicitly wants to know whether they are wrong, then they presumably won’t mind being told if they are.
On my Facebook wall where I originally shared this post, various people chimed in with their experiences from either teaching or work, and several mentioned that this wouldn’t really come up for them because they had never had a reason to tell anyone “you’re wrong” anyway. E.g. one person said:
I don’t have much experience of university teaching but I do have some, and I don’t recall ever having to tell a student “you are wrong”, or even words with similar meaning. That’s simply because no student has ever presented a claim that has simple yes or no truth value. The closest I have come to “you are wrong” is something like “it’s partially like you said, but also...” My academic field is history, if that matters.
That said, even if a student made a clearly wrong statement, I would probably state my answer as something like, “an interesting point, but it didn’t go quite like that”. I haven’t thought about this in any kind of “woke” scenario; for me it’s simply a part of good manners that a teacher shouldn’t directly embarrass a student.
And another said:
I don’t remember the last time at my job that I told anyone they were wrong—and I don’t remember when anyone last told me I was wrong. And I work with a lot of people from many walks of life.
There have been plenty of cases where a person (me or someone else) says “here’s how I see it” and the other goes “Ooh, right”. Telling the other person they’re wrong is both useless and counterproductive. [...]
Here’s how I see things happening in the places I go to for work. People skip the “that’s not right” or “that’s wrong”, because it does not add anything to the discussion.
Instead, depending on the veracity of their own claim, they might say “This is how it is” or “Have you read this piece of research that shows A and B?”. There’s no need to rub the fact that one person was wrong in anybody’s face. It is especially bad form if one tries to convince the other person to change their mind.
While not everyone agreed, many people seemed to have the position that avoiding “you’re wrong”-type language is just a net improvement, and that they’ve never run into a situation where one would need to use it (and I think at least some of those people also work in fields with people on the spectrum). This also matches my own experience, and makes me skeptical of how likely any of your scenarios are to turn out to be a problem in practice.
If anything, the systematic effects seem positive, in that applying the same policy to all students would likely also make several others feel more comfortable.
Of course, it would be silly to argue that there could never be any costs associated with this. It’s certainly likely that trying to accommodate, if not this particular request, then some other request of vaguely the same type would eventually get us into a challenging scenario. But if that happens, one can figure out the best course of action when it happens, and then possibly re-evaluate the overall approach if it does turn out to be more costly.
In any case, it would seem bad to me if this teacher, in considering whether to grant the request, concluded that he has to deny it because of some complicated hypothetical scenario that assumes other students react in very specific ways, given that it’s also very plausible that none of that happens and everything goes just fine.
In short, it is very easy to say that making some adjustment would be nearly costless. It is quite another thing to actually consider all the costs.
Fair enough, so I’ll rephrase: there seem to be very small immediate costs for this particular teacher in accommodating the request. It is of course possible that some major unexpected cost comes up, but if that happens, he can consider it and then shift his approach accordingly.
And I haven’t even gotten to the distortionary effect on your own cognition, from not just avoiding such straightforward constructions as “you’re wrong” (and similar), but automatically avoiding them! Perhaps one could make an argument that having your brain automatically flinch away from responding to a clearly wrong statement with “that’s wrong” is not as bad as it sounds… but that certainly wouldn’t be the way I’d bet.
I have a hard time seeing this as particularly problematic, given the previously mentioned view that saying “you’re wrong” just doesn’t seem to be necessary in general. “Having your brain automatically flinch away from” something sounds bad if you phrase it like that, but part of the process of acquiring any skill is to learn to automatically flinch away from performing the skill in a bad way. Similarly, if there’s little reason to ever say “you’re wrong” while there are good reasons to avoid it, then I see this less as distorting cognition and more as optimizing it (or more specifically optimizing the skill of good communication).
Yep. (Of course “ignore” can be finessed, but you’ve certainly got the gist of it.)
Okay. In that case I’m not sure why you say that my commentary is dismissing the selective ways of looking at the issue. When I wrote it, the two options I had in mind were “agree to the request” or “deny/ignore the request”. (I mean, I did “dismiss” the selective ways in the sense that I thought they’d be a bad way of dealing with the issue, but I interpret you to have meant “dismiss” in the sense of “not consider at all”.)
I strongly disagree with your evaluation of the cost, and am surprised to see you make such a claim. Surely you must know that “just” using (“slightly”) different language is far from costless?
In the general case it’s not necessarily costless. But in this particular case, at least her teacher seemed to think that it was, if not literally costless, then a very small cost. And if I were in his position, I think it would be a negligible cost for me to implement. Remembering to use different language would take a little bit of effort at first, but would then become automatic very quickly.
(I’m again assuming that the request can be incorporated while still delivering all of the teaching and grading etc. essentially unchanged. If it was the case that she couldn’t easily be corrected on mistakes in assignments or something similar, that would substantially change my position.)
Assume the cost of one accommodation to be “basically zero”. It does not follow from this that the expected total cost of all requested accommodations, conditional on the policy of “grant requested accommodations” being instituted, will be “basically zero”, or even “less than astronomical”. (Indeed, we can observe that such an optimistic prediction turns out to be manifestly false.)
I agree. I did not mean to argue for a blanket policy of granting all requested accommodations, nor did I interpret the original poster to be arguing for that.
I think that you [Kaj Sotala] are using the phrase “ordinary human desires” to refer to what I conceptualize of as non-desire “preference-likes”.
Cool. Yeah, that was what I meant.
The commentary (both the quote of the original, and your own) seems to entirely dismiss the possibility of selective ways of looking at the issue, opting to only consider corrective approaches (which indeed are unlikely to work) and (in a rather one-sided way) structural approaches.
Would the selective approach in this case be something like “ignore the request and let them drop out if they can’t handle that”, or something else?
My own view is that the analysis given in the original source is extremely bad, and the approach described therein creates horrible incentives, in ways well understood. The world absolutely is worse because the student in question declares a need for accommodation, and gets it.
I agree that there are definite incentive problems to take into account. However in this specific case, where the cost of accommodation is basically zero (just using slightly different language), I don’t think they’re an issue.
Thanks for checking the sources in that article! I hadn’t done that.
I now took a quick look at the first paper as well. While “kindness” did only have two hits, searching for “kind” also brought up this bit from the article itself:
4.1. Qualifications and limitations [...]
Several important qualifications must attend the interpretation of these findings. [...] Third, neither earning potential nor physical appearance emerged as the highest rated or ranked characteristic for either sex, even though these characteristics showed large sex differences. Both sexes ranked the characteristics “kind-understanding” and “intelligent” higher than earning power and attractiveness in all samples, suggesting that species-typical mate preferences may be more potent than sex-linked preferences
EDIT: Looks like the second paper was accessible via sci-hub; it has this:
Mate preferences were standardized across countries prior to analysis, so this and all b values can be interpreted as equivalent to Cohen’s ds. The average for women was 5.48, 95% CI = [5.46, 5.51], and the average for men was 5.11, 95% CI = [5.08, 5.14]. The smallest sex difference was in Spain, b = −0.12, and the largest sex difference was in China, b = −0.56. Furthermore, men reported a higher preference for a physically attractive ideal mate than women, on average, b = 0.27, SE = 0.03, p < .001. The average for women was 5.56, 95% CI = [5.53, 5.58], and the average for men was 5.85, 95% CI = [5.83, 5.88]. The sex difference (b) ranged from −0.07 in China to 0.50 in Brazil.
Furthermore, we found small but still-significant sex differences in reported ideal preference for kindness, intelligence, and health. However, both men and women reported higher preferences for these traits in an ideal partner than for good financial prospects or for physical attractiveness. Women reported preferences for kinder ideal mates than men, on average, b = −0.12, SE = 0.02, p < .001. The average for women was 6.23, 95% CI = [6.21, 6.26], and the average for men was 6.12, 95% CI = [6.10, 6.15]. The sex difference (b) ranged from −0.23 in the United States to 0.06 in Uganda. Women also reported preferences for greater intelligence in ideal mates, on average, b = −0.12, SE = 0.02, p < .001. The average for women was 6.03, 95% CI = [6.01, 6.05], and the average for men was 5.92, 95% CI = [5.89, 5.94]. The sex difference (b) ranged from −0.35 in China to 0.04 in Algeria.
Suffering subroutines—maybe 10-20% likely. I don’t think suffering reduces to “pre-determined response patterns for undesirable situations”, because I can think of simple algorithmic examples of that which don’t seem like suffering.
Yeah, I agree with this to be clear. Our intended claim wasn’t that just “pre-determined response patterns for undesirable situations” would be enough for suffering. Actually, there were meant to be two separate claims, which I guess we should have distinguished more clearly:
1) If evolution stumbled on pain and suffering, those might be relatively easy and natural ways to get a mind to do something. So an AGI that built other AGIs might also build them to experience pain and suffering (that it was entirely indifferent to), if that happened to be an effective motivational system.
2) If this did happen, then there’s also some speculation suggesting that an AI that wanted to stay in charge might not want to give its worker AGIs much in the way of things that looked like positive emotions, but would have a reason to give them things that looked like negative emotions. That would then tilt the balance of pleasure vs. pain in the post-AGI world much more heavily in favor of (emotional) pain.
Now, the second claim is much more speculative, and I don’t even know if I’d consider it a particularly likely scenario (probably not); we put it in since much of the paper was generally listing various possibilities for what might happen. But the first claim—that since all the biological minds we know of seem to run on something like pain and pleasure, we should put substantial probability on AGI architectures also ending up with something like that—seems much stronger to me.
I’m not sure if I’ve had the same global insight as lsusr has, but I feel like I’ve had local experiences taking me more in that direction. My experience has been that the thing that’s being shed is more accurately described as “rationalization” than “desire”.
E.g. in Fabricated Options, Duncan talks about situations where all the options available to people have downsides they don’t like. Some people then think that there should be an option with only upsides, refusing to accept that there might not be any such thing. If you stop doing that, you lose the desire for reality to be something other than what it is. And then you can actually achieve your desires better, since you see what reality is actually like. This does also require you to acknowledge that you have to let go of some of your original “get me only the upsides” desires—but those were the kinds of desires that were always impossible to achieve anyway.
You still keep most of your ordinary human desires, though. I’ve also seen various advanced meditation teachers say—and this matches my experience—that your natural personality (which includes all of your desires) starts to shine brighter, since you also lose the belief that “my personality should be something other than what it is”. That doesn’t mean you can’t still work on anger management or whatever, just that you come to see it for what it really is rather than as something you’d want to see it as.
But there are also various approaches within Buddhism, some being more actively anti-desire (“renunciation”) than others. What makes things confusing is that some teachers do say that you should also let go of the things we’d usually call “desires”, conflating those with the rationalization-type desires. Given that lsusr says you understood him perfectly, maybe he subscribes to one of those schools? That’s unclear to me from his post.
Ingram has actively hunted for any jhana junkies for twenty years and hasn’t found any.
That seems like the opposite of what he wrote in MCTB?
Just to drive this point home, an important feature of concentration practices is that they are not liberating in and of themselves. Even the highest of these states ends. The afterglow from them does not last long. Regular life and reality might even seem like an assault when that afterglow has worn off. However, jhana junkies abound in all traditions and even outside traditions, and many have no idea that this is what they have become. I have a friend who has been lost in the formless realms for over twenty years, attaining them again and again in practice, rationalizing that he is doing Dzogchen practice when he is just staying in the fourth through sixth jhanas, and further rationalizing that the last two formless realms are “emptiness”, and that he is enlightened. This story, or a version of it, repeats countless times. It is a true dharma tragedy.
Unfortunately, as another good friend of mine rightly pointed out, it is almost impossible to reach such people after a while. They get trapped in temporary attainments so exquisite that they have no idea they are in prison, nor do they take at all kindly to suggestions that this may be so, particularly if their identity has become bound up in their false notion that they are a realized being. Chronic jhana junkies are fairly easy to spot, even though they often imagine that they are not. We are all presumably able to take responsibility for our choices in life, so if people want to be jhana junkies, that’s their choice, and the jhanas clearly beat most things one could become addicted to. However, when people don’t realize that this is what they have become and pretend that what they are doing has anything to do with insight practices, that’s a truly lost opportunity to put those attainments into the service of achieving actual realization and true freedom.
You may find Superintelligence as a Cause or Cure for Risks of Astronomical Suffering of interest; among other things, it discusses s-risks that might come about from having unaligned AGI.
Superintelligence is related to three categories of suffering risk: suffering subroutines (Tomasik 2017), mind crime (Bostrom 2014) and flawed realization (Bostrom 2013).
5.1 Suffering subroutines
Humans have evolved to be capable of suffering, and while the question of which other animals are conscious or capable of suffering is controversial, pain analogues are present in a wide variety of animals. The U.S. National Research Council’s Committee on Recognition and Alleviation of Pain in Laboratory Animals (2004) argues that, based on the state of existing evidence, at least all vertebrates should be considered capable of experiencing pain.
Pain seems to have evolved because it has a functional purpose in guiding behavior: evolution having found it suggests that pain might be the simplest solution for achieving its purpose. A superintelligence which was building subagents, such as worker robots or disembodied cognitive agents, might then also construct them in such a way that they were capable of feeling pain—and thus possibly suffering (Metzinger 2015)—if that was the most efficient way of making them behave in a way that achieved the superintelligence’s goals.
Humans have also evolved to experience empathy towards each other, but the evolutionary reasons which cause humans to have empathy (Singer 1981) may not be relevant for a superintelligent singleton which had no game-theoretical reason to empathize with others. In such a case, a superintelligence which had no disincentive to create suffering but did have an incentive to create whatever furthered its goals, could create vast populations of agents which sometimes suffered while carrying out the superintelligence’s goals. Because of the ruling superintelligence’s indifference towards suffering, the amount of suffering experienced by this population could be vastly higher than it would be in e.g. an advanced human civilization, where humans had an interest in helping out their fellow humans.
Depending on the functional purpose of positive mental states such as happiness, the subagents might or might not be built to experience them. For example, Fredrickson (1998) suggests that positive and negative emotions have differing functions. Negative emotions bias an individual’s thoughts and actions towards some relatively specific response that has been evolutionarily adaptive: fear causes an urge to escape, anger causes an urge to attack, disgust an urge to be rid of the disgusting thing, and so on. In contrast, positive emotions bias thought-action tendencies in a much less specific direction. For example, joy creates an urge to play and be playful, but “play” includes a very wide range of behaviors, including physical, social, intellectual, and artistic play. All of these behaviors have the effect of developing the individual’s skills in whatever the domain. The overall effect of experiencing positive emotions is to build an individual’s resources—be those resources physical, intellectual, or social.
To the extent that this hypothesis were true, a superintelligence might design its subagents in such a way that they had pre-determined response patterns for undesirable situations, so exhibited negative emotions. However, if it was constructing a kind of a command economy in which it desired to remain in control, it might not put a high value on any subagent accumulating individual resources. Intellectual resources would be valued to the extent that they contributed to the subagent doing its job, but physical and social resources could be irrelevant, if the subagents were provided with whatever resources necessary for doing their tasks. In such a case, the end result could be a world whose inhabitants experienced very little if any in the way of positive emotions, but did experience negative emotions. [...]
5.2 Mind crime
A superintelligence might run simulations of sentient beings for a variety of purposes. Bostrom (2014, p. 152) discusses the specific possibility of an AI creating simulations of human beings which were detailed enough to be conscious. These simulations could then be placed in a variety of situations in order to study things such as human psychology and sociology, and be destroyed afterwards.
The AI could also run simulations that modeled the evolutionary history of life on Earth in order to obtain various kinds of scientific information, or to help estimate the likely location of the “Great Filter” (Hanson 1998) and whether it should expect to encounter other intelligent civilizations. This could repeat the wild-animal suffering (Tomasik 2015, Dorado 2015) experienced in Earth’s evolutionary history. The AI could also create and mistreat, or threaten to mistreat, various minds as a way to blackmail other agents. [...]
5.3 Flawed realization
A superintelligence with human-aligned values might aim to convert the resources in its reach into clusters of utopia, and seek to colonize the universe in order to maximize the value of the world (Bostrom 2003a), filling the universe with new minds and valuable experiences and resources. At the same time, if the superintelligence had the wrong goals, this could result in a universe filled by vast amounts of disvalue.
While some mistakes in value loading may result in a superintelligence whose goal is completely unlike what people value, certain mistakes could result in flawed realization (Bostrom 2013). In this outcome, the superintelligence’s goal gets human values mostly right, in the sense of sharing many similarities with what we value, but also contains a flaw that drastically changes the intended outcome.
For example, value-extrapolation (Yudkowsky 2004) and value-learning (Soares 2016, Sotala 2016) approaches attempt to learn human values in order to create a world that is in accordance with those values.
There have been occasions in history when circumstances that cause suffering have been defended by appealing to values which seem pointless to modern sensibilities, but which were nonetheless a part of the prevailing values at the time. In Victorian London, the use of anesthesia in childbirth was opposed on the grounds that being under the partial influence of anesthetics may cause “improper” and “lascivious” sexual dreams (Farr 1980), with this being considered more important to avoid than the pain of childbirth.
A flawed value-loading process might give disproportionate weight to historical, existing, or incorrectly extrapolated future values whose realization then becomes more important than the avoidance of suffering. Besides merely considering the avoidance of suffering less important than the enabling of other values, a flawed process might also tap into various human tendencies for endorsing or celebrating cruelty (see the discussion in section 4), or outright glorifying suffering. Small changes to a recipe for utopia may lead to a future with much more suffering than one shaped by a superintelligence whose goals were completely different from ours.
When you make a claim like “misaligned AIs kill literally everyone”, then reasonable people will be like “but will they?” and you should be in a position where you can defend this claim.
I think most reasonable people will round off “some humans may be kept as brain scans that may have arbitrary cruelties done to them” to be equivalent to “everyone will be killed (or worse)” and not care about this particular point, seeing it as nitpicking that would not make the scenario any less horrible even if it were true.
Also, it’s a normal part of reasoning to interpret people’s answers in a way that makes sense. If you have a typo or a missing word in your prompt, but your intended meaning is clear, ChatGPT will go with what you obviously meant rather than with what you wrote.
For some of these puzzles, it wouldn’t normally make much sense to ask about the simplified version because the answer is so obvious. Why would you ask “how can the man and the goat cross the river with the boat” if the answer was just “by taking the boat”? So it’s not unreasonable for ChatGPT to assume that you really meant the standard form of the puzzle, and just forgot to describe it in full. Its tendency to pattern-match in that way is the same thing that gives it the ability to ignore people’s typos.
I guess you could reap some benefits from it even without actually believing in Christianity, but those seem much smaller than the ones you can get out of meditation. Also, I think some of the happiness benefit Christians get is from being in a supportive community, and I already have non-religious communities that make me feel happy.