I mean this completely seriously: now that MIRI has changed to the Death With Dignity strategy, is there anything that I or anyone on LW can do to help with said strategy, other than pursue independent alignment research? Not that pursuing alignment research is the wrong thing to do, just that you might have better ideas.
I’ve always thought that something in the context of mental health would be nice.
The idea that humanity is doomed is psychologically hard to deal with. From what I can tell, there is a pretty wide range in how people respond to it: some seem to do just fine, but others seem to be pretty badly harmed (including myself, not that this is about me; e.g., this post literally brought me to tears). So, yeah, some sort of guidance for how to deal with it would be nice.
Plus it’d serve the purpose of increasing productivity and mitigating the risk of burnout for AI researchers, thus increasing humanity’s supposedly slim chances of “making it”. This seems pretty nontrivial to me. AI researchers deal with this stuff on a daily basis. I don’t know much about what sort of mental states are common for them, but I’d guess that something like 10-40% of them are hurt pretty badly, in which case better mental health guidance would yield pretty substantial improvements in productivity. Unfortunately, I suspect it’s quite a difficult thing to “figure out”, and for that reason I doubt it’s worth sinking too many resources into.
Said this above; there are a ton of excellent mental health resources developed by and for climate change activists dealing with a massive, existential, depressing problem at which we are currently severely failing, and which is getting extremely acute and mentally draining. A lot of us (I am with Scientist Rebellion) are literally, purposefully going to jail for this at this point, and getting physically beaten and threatened by cops, and sued in courts, and we have front-line insight into how seriously, crazily fucking dire it has gotten, how close we are to irreversible collapse, how much is already being destroyed, and how many are already being killed right now in front of our eyes. At this point, we could hit collapse in 10 years; and what we need to do to actually avoid collapse, not just delay it, is so far beyond what we are currently doing that it is fucking ludicrous. And yet, we have gotten much better at reducing burnout in our members and staying sane. I already mentioned Solnit’s “Hope in the Dark” as well as “Active Hope” as books I personally found helpful. At this point, we are dedicating whole task forces, events, videos, trainings and methodologies to this, and it is affecting everything about how we do the things we do, because we have realised that people getting suicidally depressed to the point of paralysis was becoming a primary threat to our activism. And it is working.
A related question a lot of us have been purposefully asking ourselves: within what needs to be done to save us, which activities can you do that will make an impact, that you find most rewarding, and that are most sustainable for you? When you compare actions that harmed you with actions that energised you, what differentiates them, and is that something you can purposefully engineer? How would your activism have to change for you to be able to continue it effectively for the long haul, at minimum confidently for another decade? What can you personally do to keep other people in the movement from giving up?
Once you reframe mental health as a fundamental resource and solvable problem, this suddenly changes a lot.
We have specifically found that optimism is neither necessary nor even beneficial for finding the kind of hope that drives action.
I mean, I’d like to see a market in dignity certificates, to take care of generating additional dignity in a distributed and market-oriented fashion?
Do you have any ideas for how to go about measuring dignity?
(although measuring impact on alignment to that degree might be about as difficult as actually solving alignment).
Only if you need to be really accurate, which I don’t think you necessarily do.
Two questions:
Do you have skills relevant to building websites, marketing, running events, movement building or ops?
How good are you at generating potential downsides for any given project?
High charisma/extroversion, not much else I can think of that’s relevant there. (Other than generally being a fast learner at that type of thing.)
Not something I’ve done before.
High charisma/extroversion seems useful for movement building. Do you have any experience in programming or AI?
Do you want to give it a go? Let’s suppose you were organising a conference on AI safety. Can you name 5 or 6 ways that the conference could end up being net-negative?
>Do you have any experience in programming or AI?
Programming yes, and I’d say I’m a skilled amateur, though I need to just do more programming. AI experience, not so much, other than reading (a large amount of) LW.
>Let’s suppose you were organising a conference on AI safety. Can you name 5 or 6 ways that the conference could end up being net-negative?
The conference involves someone talking about an extremely taboo topic (eugenics, say) as part of their plan to save the world from AI; the conference is covered in major news outlets as “AI Safety has an X problem” or something along those lines, and leading AI researchers are distracted from their work by the ensuing twitter storm.
One of the main speakers at the event is very good at diverting money towards him/herself through raw charisma and ends up pulling funding for projects/compute away from other, more promising work; later it turns out that their project actually accelerated the development of an unaligned AI.
The conference on AI safety doesn’t involve the people actually trying to build an AGI, and only involves people who are already committed to and educated about AI alignment. The organizers and conference attendees are reassured by the consensus of “alignment is the most pressing problem we’re facing, and we need to take any steps necessary that don’t hurt us in the long run to fix it,” while that attitude isn’t representative of the audience the organizers actually want to reach. The organizers make future decisions based on the belief that “leading AI researchers are already concerned about alignment to the degree we want them to be”, which ends up being wrong, when they should have been more focused on reaching those leading AI researchers.
The conference is just a waste of time, and the attendees could have been doing better things with the time/resources spent attending.
There’s a bus crash on the way to the event, and several key researchers die, setting back progress by years.
Similar to #2, the conference convinces researchers that [any of the wrong ways to approach “death with dignity” mentioned in this post] is the best way to try to solve x-risk from AGI, and resources are put towards plans that, if they fail, will fail catastrophically.
“If we manage to create an AI smarter than us, won’t it be more moral?” or any AGI-related fallacy disproved in the Sequences is spouted as common wisdom, and people are convinced.
Cool, so I’d suggest looking into movement-building (obviously take with a grain of salt given how little we’ve talked). It’s probably good to try to develop some AI knowledge as well so that people will take you more seriously, but it’s not like you’d need that before you start.
You did pretty well in terms of generating ways it could be net-negative. That makes me more confident that you would be able to have a net-positive impact.
I guess it’d also be nice to have some degree of organisational skill, but honestly, if there isn’t anyone else doing AI safety movement-building in your area, all you have to be is not completely terrible, so long as you are aware of your limits and avoid organising anything that would go beyond them.
What about Hail Mary strategies that were previously discarded due to being too risky? I can think of a couple off the top of my head. A cornered rat should always fight.
Do they perchance have significant downsides if they fail? Just wildly guessing, here. I’m a lot more cheerful about Hail Mary strategies that don’t explode when the passes fail, and take out the timelines that still had hope in them after all.
As a Hail Mary strategy, how about making a 100% effort to get elected in a small democratic voting district?
And, if that works, making a 100% effort to get elected in bigger and bigger districts—until all democratic countries support the [a stronger humanity can be reached by a systematic investigation of our surroundings, cooperation in the production of private and public goods, which includes not creating powerful aliens]-party?
Yes, yes, politics is horrible. BUT. What if you could do this within 8 years? AND, you test it by only trying one or two districts, one or two months each? So, in total it would cost at most four months.
Downsides? Political corruption is the biggest one. But I believe your approach to politics would be a continuation of what you do now, so if you succeeded it would only be by strengthening the existing EA/Humanitarian/Skeptical/Transhumanist/Libertarian movements.
There may be a huge downside for you personally, as you may have to engage in some appropriate signalling to make people vote for your party. But maybe it isn’t necessary. And if the whole thing doesn’t work, it would only be for four months, tops.
Yeah, most of them do. I have some hope for the strategy-cluster that uses widespread propaganda[1] as a coordination mechanism.
Given the whole “brilliant elites” thing, and the fecundity of rationalist memes among such people, I think it’s possible to shift the world to a better Nash equilibrium.
Making more rationalists is all well and good, but let’s not shy away from no holds barred memetic warfare.
Is it not obvious to you that this constitutes dying with less dignity, or is it obvious but you disagree that death with dignity is the correct way to go?
Dignity exists within human minds. If human-descended minds go extinct, dignity doesn’t matter. Nature grades us upon what happens, not how hard we try. There is no goal I hold greater than the preservation of humanity.
Did you read the OP? The post identifies dignity with reductions in existential risk, and it talks a bunch about the ‘let’s violate ethical injunctions willy-nilly’ strategy.
The post assumes that there are no ethics-violating strategies that will work. I understand that people can just-world-fallacy their way into thinking that they will be saved if only they sacrifice their deontology. What I’m saying is that deontology-violating strategies should be adopted if they offer, say, +1e-5 odds of success.
One of Eliezer’s points is that most people’s judgements about adding 1e-5 odds (I assume you mean log odds and not additive probability?) are wrong, and even systematically have the wrong sign.
The post talks about how most people are unable to evaluate these odds accurately, and how thinking you’ve found a loophole is itself a sign that you are one of those people.
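To make the distinction concrete (a minimal illustration, not from the post, assuming natural-log odds and a purely hypothetical baseline of $p = 10^{-6}$): adding $\delta = 10^{-5}$ to the log odds multiplies the odds by $e^{\delta} \approx 1.00001$, so $p$ rises to roughly $1.00001 \times 10^{-6}$, an absolute gain near $10^{-11}$. Adding $10^{-5}$ to the probability itself would instead take $p$ to about $1.1 \times 10^{-5}$, an elevenfold jump. The two readings of “+1e-5 odds” differ by about six orders of magnitude in what they claim.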
Pretty telling IMHO to see such massive herding on the downvotes here, for such an obviously-correct point. Disappointing!
Coordination on what, exactly?
Coordination (cartelization) so that AI capabilities are not a race to the bottom
Coordination to indefinitely halt semiconductor supply chains
Coordination to shun and sanction those who research AI capabilities (compare: coordination against embyronic human gene editing)
Coordination to deliberately turn Moore’s Law back a few years (yes, I’m serious)
And do you think if you try that, you’ll succeed, and that the world will then be saved?
These are all strategies to buy time, so that alignment efforts may have more exposure to miracle-risk.
And what do you think are the chances that those strategies work, or that the world lives after you hypothetically buy three or six more years that way?
I’m not well calibrated on sub 1% probabilities. Yeah, the odds are low.
There are other classes of Hail Mary. Picture a pair of researchers, one of whom controls an electrode wired to the pleasure centers of the other. Imagine they have free access to methamphetamine and LSD. I don’t think research output is anywhere near where it could be.
So—just to be very clear here—the plan is that you do the bad thing, and then almost certainly everybody dies anyways even if that works?
I think at that level you want to exhale, step back, and not injure the reputations of the people who are gathering resources, finding out what they can, and watching closely for the first signs of a positive miracle. The surviving worlds aren’t the ones with unethical plans that seem like they couldn’t possibly work even on the open face of things; the real surviving worlds are only injured by people who imagine that throwing away their ethics surely means they must be buying something positive.
Fine. What do you think about the human-augmentation cluster of strategies? I recall you thought along very similar lines circa 2001.
I don’t think we’ll have time, but I’d favor getting started anyways. Seems a bit more dignified.
Great! If I recall correctly, you wanted genetically optimized kids to be gestated and trained.
I suspect that akrasia is a much bigger problem than most people think, and that to be truly effective, one must outsource part of one’s reward function. There could be massive gains.
What do you think about the setup I outlined, where a pair of researchers exist such that one controls an electrode embedded in the other’s reward center? Think Focus from Vinge’s A Deepness in the Sky.
(I predict that would help with AI safety, in that it would swiftly provide useful examples of reward hacking and misaligned incentives)
I think memetically ‘optimized’ kids (and adults?) might be an interesting alternative to explore. That is, more scalable and better education for the ‘consequentialists’ (I have no clue how to teach people who are not ‘consequentialist’; hopefully someone else can teach those) may get human thought-enhancement results earlier and make them available to more people. There has been some work in this space and some successes, but I think that in general the “memetics experts” and the “education experts” haven’t been cooperating as much as they should. I think it would seem dignified to me to try bridging this gap. If this is indeed dignified, that would be good, because I’m currently in the early stages of a project trying to bridge it.
The better version of reward hacking I can think of is inducing a state of jhana (basically a pleasure button) in alignment researchers. For example, use Neuralink to record the brain-process of ~1000 people going through the jhanas at multiple time-steps, average them in a meaningful way, and induce those brainwaves in other people.
The effect is that people are satiated with the feeling of happiness (like being satiated with food/water), and are more effective as a result.
The “electrode in the reward center” setup has been proven to work in humans, whereas jhanas may not transfer over Neuralink.
Deep brain stimulation is FDA-approved in humans, meaning less (though nonzero) regulatory fuckery will be required.
Happiness is not pleasure; wanting is not liking. We are after reinforcement.
Could you link the proven part?
Jhanas seem much healthier, though I’m pretty confused imagining your setup, so I don’t have much confidence. Say it works, gets past the problems of generalizing reward (e.g., the brain only rewarding specific parts of research and not others), and we set aside the downward-spiral effects of people hacking themselves; then we hopefully have people who look forward to doing certain parts of research.
If you model humans as multi-agents, it’s making a certain type of agent (the “do research” one) have a stronger say in what actions get done. This is not as robust as getting all the agents to agree and not fight each other. I believe jhana gets part of that done because some sub-agents are pursuing the feeling of happiness and you can get that any time.
https://en.wikipedia.org/wiki/Brain_stimulation_reward
https://doi.org/10.1126/science.140.3565.394
https://sci-hub.hkvisa.net/10.1126/science.140.3565.394
>In our earliest work with a single lever it was noted that while the subject would lever-press at a steady rate for stimulation to various brain sites, the current could be turned off entirely and he would continue lever-pressing at the same rate (for as many as 2000 responses) until told to stop.
>It is of interest that the introduction of an attractive tray of food produced no break in responding, although the subject had been without food for 7 hours, was noted to glance repeatedly at the tray, and later indicated that he knew he could have stopped to eat if he wished. Even under these conditions he continued to respond without change in rate after the current was turned off, until finally instructed to stop, at which point he ate heartily.
Is the average human life experientially negative, such that buying three more years of existence for the planet is ethically net-negative?
People’s revealed choice in tenaciously staying alive and keeping others alive suggests otherwise. This everyday observation trumps all philosophical argument that fire does not burn, water is not wet, and bears do not shit in the woods.
I’m not immediately convinced (I think you need another ingredient).
Imagine a kind of orthogonality thesis but with experiential valence on one axis and ‘staying aliveness’ on the other. I think it goes through (one existence proof for the experientially-horrible-but-high-staying-aliveness quadrant might be the complex of torturer+torturee).
Another ingredient you need to posit for this argument to go through is that, as humans are constituted, experiential valence is causally correlated with behaviour in a way such that negative experiential valence reliably causes not-staying-aliveness. I think we do probably have this ingredient, but it’s not entirely clear cut to me.
Unlike jayterwahl, I don’t consider experiential valence, which I take to mean mental sensations of pleasure and pain in the immediate moment, as of great importance in itself. It may be a sign that I am doing well or badly at life, but like the score on a test, it is only a proxy for what matters. People also have promises to keep, and miles to go before they sleep.
I think many of the things that you might want to do in order to slow down tech development are things that will dramatically worsen human experiences, or reduce the number of them. Making a trade like that in order to purchase the whole future seems like it’s worth considering; making a trade like that in order to purchase three more years seems much more obviously not worth it.
I will note that I’m still a little confused about Butlerian Jihad style approaches (where you smash all the computers, or restrict them to the capability available in 1999 or w/e); if I remember correctly Eliezer has called that a ‘straightforward loss’, which seems correct from a ‘cosmic endowment’ perspective but not from a ‘counting up from ~10 remaining years’ perspective.
My guess is that the main response is “look, if you can coordinate to smash all of the computers, you can probably coordinate on the less destructive-to-potential task of just not building AGI, and the difficulty is primarily in coordinating at all instead of the coordination target.”
Suppose they don’t? I have at least one that, AFAICT, doesn’t do anything worse than take researchers/resources away from AI alignment in most bad ends, and even in the worst-case scenario “just” generates a paperclipper anyway. Which, to be clear, is bad, but not any worse than the current timeline.
(Namely, actual literal time travel and outcome pumps. There is some reason to believe that an outcome pump with a sufficiently short time horizon is easier to safely get hypercompute out of than an AGI, and that a “time machine” that moves an electron back a microsecond is at least energetically within bounds of near-term technology.
You are welcome to complain that time travel is completely incoherent if you like; I’m not exactly convinced myself. But so far, the laws of physics have avoided actually banning CTCs outright.)
Do you have a pointer for this? Traversable wormholes tend to require massive amounts of energy[1] (as in, amounts of energy that are easier to state in c^2 units).
Note: this isn’t strictly hypercompute. Finite speed of light means that you can only address a finite number of bits within a fixed time, and your critical path is limited by the timescale of the CTC.
That being said, figuring out the final state of a 1TB-state-vector[2] FSM would itself be very useful. Just not strictly hypercomputation.
[1] Or negative energy density. Or massive amounts of negative energy density.
[2] Ballpark. Roundtrip to 1TB of RAM in 1us is doable.
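As a rough sanity check on that ballpark (nothing here beyond the speed of light; the specific figures are only illustrative): $c \cdot 1\,\mu\mathrm{s} \approx 3 \times 10^{8}\,\mathrm{m/s} \times 10^{-6}\,\mathrm{s} = 300\,\mathrm{m}$, so within one microsecond a signal can complete a round trip to anything inside a ~150 m radius, and 1 TB of commodity RAM fits comfortably within that distance of a processor.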
Never even THINK ABOUT trying a hail mary if it also comes with an increased chance of s-risk. I’d much rather just die.
Speaking of which, one thing we should be doing is keeping a lookout for opportunities to reduce s-risk (with dignity) … I haven’t yet been convinced that s-risk reduction is intractable.
The most obvious way to reduce s-risk would be to increase x-risk, but somehow that doesn’t sound very appealing...
This is an example of what EY is talking about, I think—as far as I can tell, all the obvious things one would do to reduce s-risk via increasing x-risk are the sort of supervillain schemes that are more likely to increase s-risk than decrease it once secondary effects, unintended consequences, etc. are taken into account. This is partly why I put the “with dignity” qualifier in. (The other reason is that I’m not a utilitarian and don’t think our decision about whether to do supervillain schemes should come down to whether we think the astronomical long-term consequences are slightly more likely to be positive than negative.)
Suppose, for example, that you’re going to try to build an AGI anyway. You could just not try to train it to care about human values, hoping that it would destroy the world, rather than creating some kind of crazy mind-control dystopia.
I submit that, if your model of the universe is that AGI will, by default, be a huge x-risk and/or a huge s-risk, then the “supervillain” step in that process is deciding to build it in the first place, not the later choice to skip trying to “align” it. You lost your dignity at the first step, and won’t lose any more at the second.
Also, I kind of hate to say it, but sometimes the stuff about “secondary effects and unintended consequences” sounds more like “I’m looking for reasons not to break widely-loved deontological rules, regardless of my professed ethical system, because I am uncomfortable with breaking those rules” than like actual caution. It’s very easy to stop looking for more effects in either direction when you reach the conclusion you want.
I mean, yes, those deontological rules are useful time-tested heuristics. Yes, a lot of the time the likely consequences of violating them will be bad in clearly foreseeable ways. Yes, you are imperfect and should also be very, very nervous about consequences you do not foresee. But all of that can also act as convenient cover for switching from being an actual utilitarian to being an actual deontologist, without ever saying as much.
Personally, I’m neither. And I also don’t believe that intelligence, in any actually achievable quantity, is a magic wand that automatically lets you either destroy the world or take over and torture everybody. And I very much doubt that ML-as-presently-practiced, without serious structural innovations and running on physically realizable computers, will get all that smart anyway. So I don’t really have an incentive to get all supervillainy to begin with. And I wouldn’t be good at it anyhow.
… but if faced with a choice between a certainty of destroying the world, and a certainty of every living being being tortured for eternity, even I would go with the “destroy” option.
I think we are on the same page here. I would recommend not creating AGI at all in that situation, but I agree that creating a completely unaligned one is better than creating an s-risky one. https://arbital.com/p/hyperexistential_separation/
I can imagine a plausible scenario in which WW3 is a great thing, because both sides brick each other’s datacenters and bomb each other’s semiconductor fabs. Also, all the tech talent will be spent trying to hack the other side and will not be spent training bigger and bigger language models.
I imagine that WW3 would be an incredibly strong pressure, akin to WW2, which causes governments to finally sit up and take notice of AI.
And then spend several trillion dollars running Manhattan Project Two: Manhattan Harder, racing each other to be the first to get AI.
And then we die even faster, and instead of being converted into paperclips, we’re converted into tiny American/Chinese flags.
Missed opportunity to call it Manhattan Project Two: The Bronx.
That only gives you a brief delay on a timeline which could, depending on the horizons you adopt, be billions of years long. If you really wanted to reduce s-risk in an absolute sense, you’d have to try to sterilize the planet, not set back semiconductor manufacturing by a decade. This, I think, is a project which should give one pause.
The downvotes on my comment reflect a threat we all need to be extremely mindful of: people who are so terrified of death that they’d rather flip the coin on condemning us all to hell, than die. They’ll only grow ever more desperate & willing to resort to more hideously reckless hail marys as we draw closer.
Upvoting you because I think this is an important point to be made, even if I’m unsure how much I agree with it. We need people pushing back against potentially deeply unethical schemes, even if said schemes also have the potential to be extremely valuable (not that I’ve seen very many of those at all; most proposed supervillain schemes would pretty obviously be a Bad Idea™). Having the dialogue is valuable, and it’s disappointing to see unpopular thoughts downvoted here.