I found the site a few months ago via a link from an AI-themed forum. I read the Sequences and developed the belief that this was a place for people who think in ways similar to me. I work as a nuclear engineer. When I entered the workforce, I was surprised to find that there weren’t people as disposed toward logic as I was. I thought perhaps there wasn’t really a community of similar people, and I had largely stopped looking.
This seems like a good place for me to learn, for the time being. Whether or not this is a place for me to develop community remains to be seen. The format seems to promote people presenting well-formed ideas. This seems valuable, but I am also interested in finding a space to explore ideas which are not well-formed. It isn’t clear to me that this is intended to be such a space. This may simply be due to my ignorance of the mechanics around here. That said, this thread seems to be inviting poorly formed ideas and I aim to oblige.
There seem to be some writings around here which speak of instrumental rationality, or “Rationality Is Systematized Winning”. However, this raises the question: “At what scale?” My (perhaps naive) impression is that if you execute instrumental rationality with an objective function at the personal scale it might yield the decision that one should go work in finance and accrue a pile of utility. But if you apply instrumental rationality to an objective function at the societal scale it might yield the decision to give all your spare resources to the most effective organizations you can find. It seems to me that the focus on rationality is important but doesn’t resolve the broader question of “In service of what?”, which actually seems to be an important selector of who participates in this community. I don’t see much value in pursuing Machiavellian rationality, and my impression is that most here don’t either. I am interested in finding additional work that explores the implications of global-scale objective functions.
On a related topic, I am looking to explore how to determine the right scale of the objective function for revenge (or social correction if you prefer a smaller scope). My intuition is that revenge was developed as a mechanism to perform tribal level optimizations. In a situation where there has been a social transgression, and redressing that transgression would be personally costly but societally beneficial, what is the correct balance between personal interest and societal interest?
My current estimate of P(doom) in the next 15 years is 5%. That is, high enough to be concerned, but not high enough to cash out my retirement. I am curious about anyone harboring a P(doom) > 50%. This would seem to be high enough to support drastic actions. What work has been done to develop rational approaches to such a high P(doom)?
This idea is quite poorly formed, but I am interested in exploring how to promote encapsulation, specialization, and reuse of components via the cost function in an artificial neural network. This comes out of the intuition that actions (things described by verbs, or transforms) may be a primitive in human mental architecture and are one of the mechanisms by which analogical connections are searched. I am interested in seeing if continuous mechanisms could be defined to promote the development of a collection of transforms which could be applied usefully across multiple different domains. Relatedly, I am also interested in what an architecture/cost function would need to look like to promote retaining multiple representations of a concept with differing levels of specificity/complexity.
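To make this slightly less hand-wavy, here is a minimal sketch of the kind of continuous mechanism I have in mind (purely illustrative; the module layout, soft routing, and penalty terms are my own assumptions, not an established method): a small shared library of transform modules, with auxiliary penalties on the routing weights that push each input to commit to a few transforms (encapsulation/specialization) while keeping every transform in use across the batch (reuse).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransformLibrary(nn.Module):
    """A shared pool of candidate 'transforms' (verb-like residual blocks) with soft routing."""
    def __init__(self, dim: int, n_transforms: int = 8):
        super().__init__()
        self.transforms = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
             for _ in range(n_transforms)]
        )
        self.router = nn.Linear(dim, n_transforms)

    def forward(self, x):
        weights = F.softmax(self.router(x), dim=-1)                    # (batch, n_transforms)
        outputs = torch.stack([t(x) for t in self.transforms], dim=1)  # (batch, n_transforms, dim)
        mixed = (weights.unsqueeze(-1) * outputs).sum(dim=1)           # (batch, dim)
        return x + mixed, weights

def reuse_penalties(weights, specialize_coef: float = 0.01, reuse_coef: float = 0.01):
    """Auxiliary terms added to the task loss.

    Encapsulation/specialization: low per-example routing entropy, so each input
    commits to a small number of transforms. Reuse: high entropy of the
    batch-averaged routing, so no transform goes permanently unused.
    """
    eps = 1e-9
    per_example_entropy = -(weights * (weights + eps).log()).sum(dim=-1).mean()
    mean_usage = weights.mean(dim=0)
    usage_entropy = -(mean_usage * (mean_usage + eps).log()).sum()
    return specialize_coef * per_example_entropy - reuse_coef * usage_entropy

# Usage sketch: total_loss = task_loss + reuse_penalties(routing_weights)
```

The hope would be that training the same small library across multiple domains is a crude, differentiable stand-in for reusable “verbs”; I have no evidence that it yields interpretable transforms, and the multiple-representations question would need something else entirely.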
My current estimate of P(doom) in the next 15 years is 5%. That is, high enough to be concerned, but not high enough to cash out my retirement. I am curious about anyone harboring a P(doom) > 50%. This would seem to be high enough to support drastic actions. What work has been done to develop rational approaches to such a high P(doom)?
I mean, what do you think we’ve been doing all along?
I’m at like 90% in 20 years, but I’m not claiming even one significant digit on that figure. My drastic actions have been to get depressed enough to be unwilling to work in a job as stressful as my last one. I don’t want to be that miserable if we’ve only got a few years left. I don’t think I’m being sufficiently rational about it, no. It would be more dignified to make lots of money and donate it to the organization with the best chance of stopping or at least delaying our impending doom. I couldn’t tell you which one that is at the moment though.
Some are starting to take more drastic actions. Whether those actions will be effective remains to be seen.
In my view, technical alignment is not keeping up with capabilities advancement. We have no alignment tech robust enough to even possibly survive the likely intelligence explosion scenario, and it’s not likely to be developed in time. Corporate incentive structures and dysfunction make the labs insufficiently cautious. Even without an intelligence explosion, we also have no plans for the likely social upheaval from rapid job loss. The default outcome is that human life becomes worthless, because that’s already effectively the case in economies where human labor isn’t needed.
Our best chance at this point is probably government intervention to put the liability back on reckless AI labs for the risks they’re imposing on the rest of us, if not an outright moratorium on massive training runs.
Gladstone has an Action Plan. There’s also https://www.narrowpath.co/.
I mean, what do you think we’ve been doing all along?
So, the short answer is that I am actually just ignorant about this. I’m reading here to learn more but I certainly haven’t ingested a sufficient history of relevant works. I’m happy to prioritize any recommendations that others have found insightful or thought provoking, especially from the point of view of a novice.
I can answer the specific question “what do I think” in a bit more detail. The answer should be understood to represent the viewpoint of someone who is new to the discussion and has only been exposed to an algorithmically influenced, self-selected slice of the information.
I watched the Lex Fridman interview of Eliezer Yudkowsky, and around 3:06 Lex asks what advice Eliezer would give to young people. Eliezer’s initial answer is something to the effect of “Don’t expect a long future.” I interpreted that answer largely as trying to evoke a sense of reverence for the seriousness of the problem. When pushed a bit further on the question, the answer Eliezer gives is “…I hardly know how to fight myself at this point.” I interpreted this to mean that the space of possible actions being searched appears intractable from the perspective of a dedicated researcher. This, I believe, is largely the source of my question. Current approaches appear to be losing the race, so what other avenues are being explored?
I read the “Thomas Kwa’s MIRI research experience” discussion and there was a statement to the effect that MIRI does not want Nate’s mindset to be known to frontier AI labs. I interpreted this to mean that the most likely course being explored at MIRI is to build a good AI to preempt or stop a bad AI. This strikes me as plausible because my intuition is that the LLM architectures being employed are largely inefficient for developing AGI. However, the compute scaling seems to work well enough that it may win the race before other competing ideas come to fruition.
An example of an alternative approach that I read was “Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible” which seems like an avenue worth exploring, but well outside of my areas of expertise. The approach shares a characteristic with my inference of MIRI’s approach in that both appear to be pursuing highly technical avenues which would not scale meaningfully at this stage by adding helpers from the general public.
The forms of approaches that I expected to see but haven’t seen too much of thus far are those similar to the one that you linked about STOP AI. That is, approaches that would scale with the addition of approximately average people. I expected that this type of approach might take the form of disrupting model training by various means or coopting the organizations involved with an aim toward redirection or delay. My lack of exposure to such information supports a few competing models: (1) drastic actions aren’t being pursued at large scales, (2) actions are being pursued covertly, or (3) I am focusing my attention in the wrong places.
Our best chance at this point is probably government intervention to put the liability back on reckless AI labs for the risks they’re imposing on the rest of us, if not an outright moratorium on massive training runs.
Government action strikes me as a very reasonable approach for people estimating long time scales or relatively lower probabilities. However, it seems to be a less reasonable approach if time scales are short or probabilities are high. I presume that your high P(doom) already accounts for your estimation of the probability of government action being successful. Does your high P(doom) imply that you expect these to be too slow, or too ineffective? I interpret a high P(doom) as meaning that the current set of actions you have thought of is unlikely to be successful, and therefore additional exploration of actions is necessary. I would expect this would include the admission of ideas which would have previously been pruned because they come with negative consequences.
The forms of approaches that I expected to see but haven’t seen too much of thus far are those similar to the one that you linked about STOP AI. That is, approaches that would scale with the addition of approximately average people.
Besides STOP AI, there’s also the less extreme PauseAI. They’re interested in things like lobbying, protests, lawsuits, etc.
I presume that your high P(doom) already accounts for your estimation of the probability of government action being successful. Does your high P(doom) imply that you expect these to be too slow, or too ineffective?
Yep, most of my hope is on our civilization’s coordination mechanisms kicking in in time. Most of the world’s problems seem to be failures to coordinate, but that’s not the same as saying we can’t coordinate. Failures are more salient, but that’s a cognitive bias. We’ve achieved a remarkable level of stability, in the light of recent history. But rationalists can see more clearly than most just how mad the world still is. Most of the public and most of our leaders fail to grasp some of the very basics of epistemology.
We used to think the public wouldn’t get it (because most people are insufficiently sane), but they actually seem appropriately suspicious of AI. We used to think a technical solution was our only realistic option, but progress there has not kept up with more powerful computers brute-forcing AI. In desperation, we asked for more time. We were pleasantly surprised at how well the message was received, but it doesn’t look like the slowdown is actually happening yet.
As a software engineer, I’ve worked in tech companies. Relatively big ones, even. I’ve seen the pressures and dysfunction. I strongly suspected that they’re not taking safety and security seriously enough to actually make a difference, and reports from insiders only confirm that narrative. If those are the institutions calling the shots when we achieve AGI, we’re dead. We desperately need more regulation to force them to behave or stop. I fear that what regulations we do get won’t be enough, but they might.
Other hopes are around a technical breakthrough that advances alignment more than capabilities, or the AI labs somehow failing in their project to produce AGI (despite the considerable resources they’ve already amassed), perhaps due to a breakdown in the scaling laws or some unrelated disaster that makes the projects too expensive to continue.
However, it seems to be a less reasonable approach if time scales are short or probabilities are high.
I have a massive level of uncertainty around AGI timelines, but there’s an uncomfortably large amount of probability mass on the possibility that, through some breakthrough or secret project, AGI was achieved yesterday and it just hasn’t caught up with me yet. We’re out of buffer. But we might still have decades before things get bad. We might be able to coordinate in time, with government intervention.
I would expect this would include the admission of ideas which would have previously been pruned because they come with negative consequences.
Yep, most of my hope is on our civilization’s coordination mechanisms kicking in in time. Most of the world’s problems seem to be failures to coordinate, but that’s not the same as saying we can’t coordinate.
This is where most of my anticipated success paths lie as well.
Other hopes are around a technical breakthrough that advances alignment more than capabilities…
I do not really understand how technical advance in alignment realistically becomes a success path. I anticipate that in order for improved alignment to be useful, it would need to be present in essentially all AI agents or it would need to be present in the most powerful AI agent such that the aligned agent could dominate other unaligned AI agents. I don’t expect uniformity of adoption and I don’t necessarily expect alignment to correlate with agent capability. By my estimation, this success path rests on the probability that the organization with the most capable AI agent is also specifically interested in ensuring alignment of that agent. I expect these goals to interfere with each other to some degree such that this confluence is unlikely. Are your expectations different?
I have a massive level of uncertainty around AGI timelines, but there’s an uncomfortably large amount of probability mass on the possibility that, through some breakthrough or secret project, AGI was achieved yesterday and it just hasn’t caught up with me yet.
I have not been thinking deeply in the direction of a superintelligent AGI having been achieved already. It certainly seems possible. It would invalidate most of the things I have thus far thought of as plausible mitigation measures.
What ideas are those?
Assuming a superintelligent AGI does not already exist, I would expect someone with a high P(doom) to be considering options of the form:
Use a smart but not self-improving AI agent to antagonize the world with the goal of making advanced societies believe that AGI is a bad idea and precipitating effective government actions. You could call this the Ozymandias approach.
Identify key resources involved in AI development and work to restrict those resources. For truly desperate individuals this might look like the Metcalf attack, but a tamer approach might be something more along the lines of investing in a grid operator and pushing to increase delivery fees to data centers.
I haven’t pursued these thoughts in any serious way because my estimation of the threat isn’t as high as yours. I think it is likely we are unintentionally heading toward the Ozymandias approach anyhow.
Use a smart but not self-improving AI agent to antagonize the world with the goal of making advanced societies believe that AGI is a bad idea and precipitating effective government actions. You could call this the Ozymandias approach.
ChaosGPT already exists. It’s incompetent to the point of being comical at the moment, but maybe more powerful analogues will appear and wreak havoc. Considering the current prevalence of malware, it might be more surprising if something like this didn’t happen.
We’ve already seen developments that could have been considered AI “warning shots” in the past. So far, they haven’t been enough to stop capabilities advancement. Why would the next one be any different? We’re already living in a world with literal wars killing people right now, and crazy terrorists with various ideologies. It’s surprising what people get used to. How bad would a warning shot have to be to shock the world into action given that background noise? Or would we be desensitized by then by the smaller warning shots leading up to it? Boiling the frog, so to speak. I honestly don’t know. And by the time a warning shot gets that bad, can we act in time to survive the next one?
Intentionally causing earlier warning shots would be evil, illegal, destructive, and undignified. Even “purely” economic damage at sufficient scale is going to literally kill people. Our best chance is civilization stepping up and coordinating. That means regulations and treaties, and only then the threat of violence to enforce the laws and impose the global consensus on any remaining rogue nations. That looks like the police and the army, not terrorists and hackers.
I do not really understand how technical advance in alignment realistically becomes a success path. I anticipate that in order for improved alignment to be useful, it would need to be present in essentially all AI agents or it would need to be present in the most powerful AI agent such that the aligned agent could dominate other unaligned AI agents.
The instrumental convergence of goals implies that a powerful AI would almost certainly act to prevent any rivals from emerging, whether aligned or not. In the intelligence explosion scenario, progress would be rapid enough that the first mover achieves a decisive strategic advantage over the entire world. If we find an alignment solution robust enough to survive the intelligence explosion, it will set up guardrails to prevent most catastrophes, including the emergence of unaligned AGIs.
I don’t expect uniformity of adoption and I don’t necessarily expect alignment to correlate with agent capability. By my estimation, this success path rests on the probability that the organization with the most capable AI agent is also specifically interested in ensuring alignment of that agent. I expect these goals to interfere with each other to some degree such that this confluence is unlikely. Are your expectations different?
Alignment and capabilities don’t necessarily correlate, and that accounts for a lot of why my p(doom) is so high. But more aligned agents are, in principle, more useful, so rational organizations should be motivated to pursue aligned AGI, not just AGI. Unfortunately, alignment research seems barely tractable, capabilities can be brute-forced (and look valuable in the short term), and, corporate incentive structures being what they are, what we’re seeing in practice is a reckless amount of risk-taking. Regulation could alter the incentives to balance the externality with appropriate costs.
We have already identified some key resources involved in AI development that could be restricted. The economic bottlenecks are mainly around high energy requirements and chip manufacturing.
Energy is probably too connected to the rest of the economy to be a good regulatory lever, but the U.S. power grid can’t currently handle the scale of the data centers the AI labs want for model training. That might buy us a little time. Big tech is already talking about buying small modular nuclear reactors to power the next generation of data centers. Those probably won’t be ready until the early 2030s. Unfortunately, that also creates pressures to move training to China or the Middle East where energy is cheaper, but where governments are less concerned about human rights.
A recent hurricane flooding high-purity quartz mines made headlines because chip producers require that quartz for the crucibles used in making silicon wafers. Lower purity means accidental doping of the silicon crystal, which means lower chip yields per wafer, at best. Those mines aren’t the only source, but they seem to be the best one. There might also be ways to utilize lower-purity materials, but that might take time to develop and would require a lot more energy, which is already a bottleneck.
The very cutting-edge chips required for AI training runs require some delicate and expensive extreme-ultraviolet lithography machines to manufacture. They literally have to plasmify tin droplets with a pulsed laser to reach those frequencies. ASML Holdings is currently the only company that sells these systems, and machines that advanced have their own supply chains. They have very few customers, and (last I checked) only TSMC was really using them successfully at scale. There are a lot of potential policy levers in this space, at least for now.
On a related topic, I am looking to explore how to determine the right scale of the objective function for revenge (or social correction if you prefer a smaller scope). My intuition is that revenge was developed as a mechanism to perform tribal level optimizations. In a situation where there has been a social transgression, and redressing that transgression would be personally costly but societally beneficial, what is the correct balance between personal interest and societal interest?
This is a question for game theory. In the trade-up from total anarchy to feudalism, a family who will avenge you is a great deterrent to have. It could even save your life. Revenge is thus a good thing. A moral duty, even. Yes, really. For a smaller scope, being quick to anger and vindictive will make others reluctant to mess with you.
Unfortunately, this also tends to result in endless blood feuds as families get revenge for the revenge for the revenge, at least until one side gets powerful enough to massacre the other. In the smaller scope, maybe you exhaust yourself or risk getting killed fighting duels to protect your honor.
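To make the deterrence logic concrete, here is a toy payoff sketch (all of the numbers and names are invented for illustration; a real treatment would use repeated games and reputation):

```python
# Toy one-shot model: a would-be transgressor acts only if their expected payoff
# is positive, which depends on how credibly the victim's side will retaliate.
GAIN_TO_TRANSGRESSOR = 5       # benefit of an unpunished transgression
HARM_TO_VICTIM = 8             # cost the transgression imposes on the victim
REVENGE_COST_TO_AVENGER = 3    # personal cost of carrying out revenge
REVENGE_COST_TO_TRANSGRESSOR = 10

def transgressor_ev(p_revenge: float) -> float:
    return GAIN_TO_TRANSGRESSOR - p_revenge * REVENGE_COST_TO_TRANSGRESSOR

def victim_ev(p_revenge: float) -> float:
    if transgressor_ev(p_revenge) > 0:   # not deterred: transgression happens
        return -HARM_TO_VICTIM - p_revenge * REVENGE_COST_TO_AVENGER
    return 0.0                           # deterred: nothing happens

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"P(revenge)={p:.2f}  transgressor EV={transgressor_ev(p):+.2f}  "
          f"victim EV={victim_ev(p):+.2f}")

# With these numbers, a credible commitment to revenge (P(revenge) >= 0.5) deters
# the transgression entirely and leaves the would-be victim better off than never
# retaliating, even though actually carrying out revenge is a pure loss in any
# single interaction -- the value is in the deterrence, not the act.
```

Blood feuds are what you get when both sides run that commitment strategy against each other and every act of revenge counts as a fresh transgression.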
We’ve found that it works better to have a central authority monopolize violence instead of relying on vengeance, and courts to settle disputes instead of duels. But the instincts for anger and revenge and taking offense are still there. Societies with the better alternatives now consider such instincts bad.
Unfortunately, this kind of improved dispute resolution isn’t available at the largest and smallest scales. There is no central authority to resolve disputes between nations, or at least none powerful enough to prevent all wars. We still rely on the principle of vengeance (second strike) to deter nuclear wars. This is not an ideal situation to be in. At the smaller scale, poor inner-city street kids join gangs that could avenge them, use social media to show off weapons they’re not legally allowed to have, and have a lot of anger and bluster, all to try to protect themselves in a system that can’t or won’t do that for them.
So, to answer the original question, the optimal balance really depends on your social context.
I am also interested in finding a space to explore ideas which are not well-formed. It isn’t clear to me that this is intended to be such a space. This may simply be due to my ignorance of the mechanics around here.
For ideas that are not well-formed, you can write a Quick Take (found by clicking on your profile name in the top right corner) or start a dialogue if you want to develop the idea together with someone (found in the same corner).
at the personal scale it might yield the decision that one should go work in finance and accrue a pile of utility. But if you apply instrumental rationality to an objective function at the societal scale it might yield the decision to give all your spare resources to the most effective organizations you can find.
Yes. And yes. See You Need More Money for the former, Effective Altruism for the latter, and Earning to give for a combination of the two.
As for which to focus on, well, Rationality doesn’t decide for you what your utility function is. That’s on you. (Surprise! You want what you want.)
My take is that maybe you put on your own oxygen mask first, and then maybe pay a tithe to the most effective orgs you can find. If you get so rich that even that’s not enough, why not invest in causes that benefit you personally, but society as well? (Medical research, for example.)
I also don’t feel the need to aid potential future enemies just because they happen to be human. (And feel even less obligation for animals.) Folks may legitimately differ on what level counts as having taken care of themselves first. I don’t feel like I’m there yet. Some are probably worse off than me and yet giving a lot more. But neglecting one’s own need is probably not very “effective” either.
I’m interested in knowing which AI forum you came from.
I believe it was the Singularity subreddit in this case. I was more or less passing through while searching for places to learn more about principles of ANNs for AGI.