1. Problems of Legal Regulation
1.1. The adoption of such laws is a long path.
Usually it is a centuries-long path: court decisions → actual enforcement of those decisions → substantive law → procedures → codes → declarations, then conventions → codes.
Humanity does not have this much time; it is worth focusing on real results that people can actually see. It might be necessary to build simulations to understand which behavior counts as irresponsible.
Where is the line between defining what is socially dangerous and cataloguing the ways to escape responsibility?
As a legal analogy, I would like to draw attention to the criminal case of Tornado Cash.
https://uitspraken.rechtspraak.nl/details?id=ECLI:NL:RBOBR:2024:2069
The developer created and kept improving an unstoppable program that may have changed the structure of public transactions forever. Look at where the line was drawn there. Could a similar framework be devised for the projection of existential risks?
1.2. There is a difference between substantive law and the law actually applied on the ground, especially in countries built on mysticism and manipulation. In each country the median group of voters creates its own irrational picture of the world, so there is no need to worry about goals floating.
There are enough people in the world living in information bubbles different from yours that you can be sure there are actors whose values are opposite to your own.
1.3. Such actors' research can be serious even while their worldview is simplified and absurd. At the same time, their resources can be extensive enough for their technical workers to perform their duties properly.
2. The Impossibility of Ideological Influence
2.1. There is no way to ideologically influence all people, and all systems, simultaneously.
2.2. If I understand you correctly, more than ten countries can spend huge sums on creating AI to accelerate the solution of scientific problems. Many of these countries are constantly struggling for their integrity and security, solving national issues, re-electing leaders, extracting benefits, and fulfilling the sacred desires of populations, classes, and other speculative or even conspiratorial theories, usually with dozens of such theories layered on top of one another.
2.3. Humanity stands on the brink of a new search for the philosopher's stone, and people are ready to spend enormous resources on it. For example, quantum decryption of old Satoshi wallets plus genome decoding can create the illusion that AGI could satisfy the main alchemical desires of any transhumanist and offer the chance to defeat death within the lifetime of this generation or the next two. Why should a hypothetical billionaire and/or state leader refuse this?
Or, as proposed here, the creation of a new super-high-IQ population. Again, do not forget that some of these beliefs can be antagonistic.
Even now, from the perspective of AI, predicting the weather in 2100 is in some ways easier than predicting it for 2040. Currently there are about 3-4 countries that could create Wasteland-type weather, and they come into partial confrontation roughly every five years. Each confrontation is a tick toward Wasteland with a probability of 1-5%. If this continues, the probability of Wasteland-type weather by 2040 (about three more such cycles) will be:
1 − 0.99^3 ≈ 0.0297
1 − 0.95^3 ≈ 0.1426
By 2100, if nothing changes (about fifteen such cycles):
1 − 0.99^15 ≈ 0.1399
1 − 0.95^15 ≈ 0.5367
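For readers who want to reproduce these figures, here is a minimal sketch of the calculation in Python, assuming independent confrontations with a constant per-event probability (the cycle counts of 3 and 15 follow from the five-year assumption above):

```python
# Cumulative probability of at least one catastrophic event, assuming
# independent confrontations with a constant per-event probability.
def cumulative_risk(p_per_event: float, n_events: int) -> float:
    return 1 - (1 - p_per_event) ** n_events

for p in (0.01, 0.05):
    # ~3 five-year cycles by 2040, ~15 by 2100 (assumptions from the text above)
    print(f"p={p:.2f}: by 2040 ≈ {cumulative_risk(p, 3):.4f}, "
          f"by 2100 ≈ {cumulative_risk(p, 15):.4f}")
```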
(A year ago my predictions were more pessimistic, since I was in an information field that presented arguments for the Wasteland scenario in the style of “we’ll go to heaven, and the rest will just die.” I have since switched that media off =) so as to be less alarmist. It now seems that this will be tied more to presidential cycles and policy, meaning confrontations will occur not every year but once every five years, as I mentioned earlier; quite an optimistic forecast.)
Nevertheless, we have many apocalyptic scenarios: nuclear, pandemic, ecological (the latter exacerbated by the AI problem, since it will become much easier to assemble structures and goals that are antagonistic in their aims).
3. Crisis of the Rule of Law
In world politics there has been a rollback of legal institutions since 2016 (see UN analytics), which points to a crisis of common values. Even without the AI problem, this usually indicates either the construction of a new equilibrium or a fall into chaos. I am a pessimist here and believe that in the absence of normalized common values, information bubbles become antagonistic by the very nature of hysteria (simply put, the wilder, more emotional, and more irrational information flows win). Conversely, this is a moment when MIRI could inject the value that existential safety is very important. Especially now, because any contribution made while our doom clock is near its lowest point could create the impression that MIRI solved it.
4. Problems of Detecting AI Threats
4.1. AI problems are less noticeable than nuclear threats. (How can such clusters be detected? Are there any effective methods?)
4.2. Threat detection is blurrier, and identifying dangerous clusters is difficult. Consider the possibility of decentralized systems, such as blockchains, and their impact on security. (Decentralized computing is developing rapidly and there is progress in superconductors; is this a problem from the perspective of detecting AI security threats?)
5. Questions about the “Switch-off” Technology
5.1. What should a program with a “switch” look like? What is its optimal structure:
a) Proprietary software (which blocks and functions are recommended to be withheld from any distribution?)
b) Closed/open API (what functions could MIRI or other laboratories provide, but with the ability to switch them off at any moment, for example at enterprises like OpenAI?)
c) Open source with constant updates (open libraries that require daily updates, creating the possibility of remotely disabling research code)
d) Fully open code (there is an assumption that with open code there is less chance that AIs will come into conflict with other AIs, or AI users with other AI users; open code may improve the chances that an equilibrium between different actors is found and that they do not mutually annihilate each other, because they can better predict each other's behavior)
5.2. The possibility of using multi-signature schemes and other methods.
How should the button work? Should the button and its mechanism be public information? Should it use a different code structure, a different language, or analogous technology?
What are the advantages and disadvantages of shutdown buttons? Are there recommendations such as requiring at least one out of N (or k out of N) presses, and which scheme seems the most sustainable? (A minimal sketch of such a scheme appears below.)
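To make the “at least one out of N” question concrete, here is a minimal sketch of a k-of-N authorization check, with purely hypothetical role names and thresholds; a real system would put cryptographic multi-signatures underneath this logic rather than a plain set of names:

```python
# Hypothetical k-of-N shutdown authorization: the switch fires only once
# enough distinct authorized keyholders have signed (k = 1 gives "any one
# of N can shut it down"; larger k trades speed for robustness to abuse).
from dataclasses import dataclass, field

@dataclass
class ShutdownSwitch:
    authorized: set[str]                 # e.g. {"lab", "regulator", "auditor"}
    threshold: int                       # k signatures required out of N
    signed: set[str] = field(default_factory=set)

    def sign(self, keyholder: str) -> None:
        if keyholder in self.authorized:
            self.signed.add(keyholder)

    def should_shut_down(self) -> bool:
        return len(self.signed) >= self.threshold

switch = ShutdownSwitch({"lab", "regulator", "auditor"}, threshold=2)
switch.sign("regulator")
switch.sign("auditor")
assert switch.should_shut_down()         # 2-of-3 reached: trigger shutdown
```

The trade-off is visible even in this toy version: k = 1 minimizes reaction time but lets any single keyholder halt the system, while a larger k is harder to abuse but slower in an emergency.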
5.3. Which method is the most effective?
6. Benefits and Approval
6.1. What benefits will actors gain by following the recommendations? Leaders of most countries make decisions not so much from their own perspective as from the irrational desires of their sources of power, which are built on dozens of other values that are usually not contradictory but are nonetheless different.
6.2. Possible forms of approval and assistance in generating values. Help ecology activists defend against the energy crisis? (From my point of view, AI development will not take our atoms, but it will take our energy, water, sun, and so on.)
6.3. Examples of large ‘switch-off’ projects for AI infrastructure with substantial GPU capacity and electricity, analogous to nuclear power plants but for AI. If you imagine such plants, what would serve as the control rods, how would they be pulled out, and what “explosives” over which pits should be laid so that everything can be dumped into acid, or what other method of safe destruction should be used?
7.1. Questions of approval and material assistance for such enterprises. What are the advantages of developing such institutions under MIRI control, compared to:
7.2. The hidden maintenance of gray areas on the international market? Why is maintaining the gray segment less profitable than cooperating with MIRI, from the point of view of personal goals, freedom, local goals, and the like?
8. Trust and Bluff
8.1. How can one be sure that MIRI's statements are honest, that it is not playing a double game, and that these are not just declarative goals without any real action behind them? From my own experience, neither in the poker-bot cases nor in the theft of money using AI in the blockchain field did I get any feedback from the Future of Life Institute. I did not receive so much as a single like on my reposts on Twitter, and there were not even automatic responses to emails. On this point I agree with Matthew Barnett that there is a problem with effectiveness.
What should be presented to the public? What help can be provided? Help with UI analytics? Help investigating specific cases of violations committed using AI?
For example, I have a consumer-protection problem where I need to raise half a million pounds against an AI that stole money through low-liquidity trading on Binance. How can I do this?
https://www.linkedin.com/posts/petr-andreev-841953198_crypto-and-ai-threat-summary-activity-7165511031920836608-K2nF?utm_source=share&utm_medium=member_desktop
https://www.linkedin.com/posts/petr-andreev-841953198_binances-changpeng-zhao-to-get-36-months-activity-7192633838877949952-3cmE?utm_source=share&utm_medium=member_desktop
I tried writing letters to the institute and to 80,000 Hours: zero responses.
I also wrote to the SEC, Binance, and a bunch of regulators. They reply that there are no licenses; okay, fine. But why does 80,000 Hours not respond at all? I do not understand.
8.2. Research on open-source technologies shows greater convergence of trust. Open-source programs can converge on cooperation more readily, because collaboration is simpler and the prisoner's dilemma can be addressed not only through the other agent's past statistics but also through its openly inspectable, collaboration-ready structure. In any case, AGI will eventually appear; open mutual monitoring of each other's systems may allow AI users to avoid annihilating each other. (A toy sketch of this intuition follows below.)
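As a toy illustration of this “open code” intuition (a hypothetical sketch in the spirit of program-equilibrium arguments, not an established safety mechanism), two agents whose decision rules are mutually inspectable can condition cooperation on what they read in each other's source:

```python
# Toy "open source" prisoner's dilemma: each agent inspects the other's
# policy before moving. Purely illustrative; real program-equilibrium
# results rely on verifying the opponent's code, not simple identity checks.

PAYOFFS = {  # (my_move, their_move) -> my payoff
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def open_policy(opponent_policy) -> str:
    # Cooperate iff the opponent runs this same openly published policy.
    return "C" if opponent_policy is open_policy else "D"

def closed_defector(opponent_policy) -> str:
    return "D"                           # opaque agent that always defects

def play(p1, p2):
    m1, m2 = p1(p2), p2(p1)
    return PAYOFFS[(m1, m2)], PAYOFFS[(m2, m1)]

print(play(open_policy, open_policy))      # (3, 3): mutual cooperation
print(play(open_policy, closed_defector))  # (1, 1): defection gains nothing
```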
8.3. Comparison with the game theory of the Soviet-Harvard school, and the need for steps toward security. Much of current game-theoretic thinking is built on duel-like representations, where damage to the opponent counts as an automatic victory, and many systems at the local level continue to think they are playing that game.
It is therefore difficult for them to believe in mutually beneficial systems, in win-win cooperation rather than empty talk or merely a scam for redistributing influence and manipulating the media.
9. AI Dangers
9.1. What poses a greater danger: multiple AIs, two powerful AIs, or one actor with a powerful AI?
9.2. Open-source developments in the blockchain field can be both safe and dangerous; are there any reviews of this?
Here is a nice Ethereum Foundation list of articles:
https://docs.google.com/spreadsheets/d/1POtuj3DtF3A-uwm4MtKvwNYtnl_PW6DPUYj6x7yJUIs/edit#gid=1299175463
What do you think about “Open Problems in Cooperative AI”, “Cooperative AI: machines must learn to find common ground”, and similar articles?
9.3. Have you considered including the AI problem in the list of crimes subject to universal jurisdiction? https://en.wikipedia.org/wiki/Universal_jurisdiction
Currently that list contains no AI-related offenses and, more generally, no crimes against humanity's continued existence. Perhaps it is worth joining forces with opponents of eugenics, eco-activists, and nuclear alarmists to jointly draft and add crimes against existential safety (to prevent the irresponsible launch of projects that, with a probability of 0.01% or more, can cause severe catastrophes; humanity avoided the Oppenheimer risk with the hydrogen bomb, but not with Chernobyl, and we do not want giga-projects to keep accepting probabilities of human extinction while treating them with neglect for the sake of local goals).
In any case, giving such crimes universal-jurisdiction status could help in finding the “off” button for a project that has already been launched, by reaching the creators of the particular dangerous object. This category allows states or international organizations to claim criminal jurisdiction over an accused person regardless of where the alleged crime was committed, and regardless of the accused's nationality, country of residence, or any other relation to the prosecuting entity.
9.4. Further, there is the idea of licensing: forcing actors to go through a verification system on the one hand, and on the other ensuring that any technology is eventually refined and becomes publicly available.
https://uitspraken.rechtspraak.nl/details?id=ECLI:NL:RBOVE:2024:2078
https://uitspraken.rechtspraak.nl/details?id=ECLI:NL:RBOVE:2024:2079
A license is very important for defending a business, its CEO, and its colleagues from liability. Near-worldwide monopolist operators should work more closely to defend the rights of their average consumer in order to forestall heavier regulation. Industries should establish direct B2B contracts with professional actors in their fields to avoid compliance risks with consumers.
An organisation such as MIRI could provide strong experts to audit AI companies for safety, especially those large enough to create existential risk, or, conversely, to impose penalties and return the sums that people lose to frameworks that are too weak against common AI attacks. People need to see a simple demonstration of being defended against AI, and of help from MIRI, 80,000 Hours, and other effective altruists, especially against malicious AI users who are already misaligned and have obtained $100M or more. That is enough to build something decentralized, if not now then within the next ten years.
10. Examples and Suggestions
10.1. Analogy with the criminal case of Tornado Cash. In the Netherlands there was a trial of a software developer who created a system that enables decentralized, effectively unstoppable crime. The judgment specifically establishes this person's responsibility for his violation of the world's financial laws. Please consider whether this can somehow be adapted to AI safety risks, and where the lines and red flags would be.
10.2. Proposals for games/novels. What are the current simple learning paths? In my time it was HPMOR → lesswrong.ru → lesswrong.com.
At present, Harry Potter is already outdated for the new generation. What are the modern games/stories about AI safety, and how should people be directed further? How about an analogue of Khan Academy for schoolchildren, or MIT courses on this topic?
Thank you for your attention. I would appreciate it if you could point out any mistakes I have made and provide answers to any questions. While I am not sure if I can offer a prize for the best answer, I am willing to donate $100 to an effective fund of your choice for the best engagement response.
I respect and admire all of you for the great work you do for the sustainability of humanity!
I think my model of AI causing increasing amounts of trouble in the world, eventually even existential risk for humanity, doesn’t look like a problem which is well addressed by an ‘off switch’. To me, the idea of an ‘off switch’ suggests that there will be a particular group (e.g. an AI company) which is running a particular set of models on a particular datacenter. Some alarm is triggered and either the company or their government decides to shut down the company’s datacenter.
I anticipate that, despite the large companies being ahead in AI technology, they will also be ahead in AI control, and thus the problems they first exhibit will likely be subtle ones like gradual manipulation of users. At what point would such behavior, if detected, lead to a sufficiently alarmed government response that they would trigger the ‘off switch’ for that company? I worry that even if such subversive manipulation were detected, the slow nature of such threats would give the company time to issue an apology and say that they were deploying a fixed version of their model. This seems much more like a difficult-to-regulate grey area than would be, for instance, the model being caught illicitly and independently controlling robots to construct weapons of war. So I do have concerns that in the longer term, if the large companies continue to be unsafe, they will eventually build AI so smart and capable and determined to escape that it will succeed. I just expect that to not be the first dangerous effect we observe.
In contrast, I expect that the less powerful open weights models will be more likely to be the initial cause of catastrophic harms which lead clearly to significant crimes (e.g. financial crimes) or many deaths (e.g. aiding terrorists in acquiring weapons). The models aren’t behind an API which can filter for harmful use, and the users can remove any ‘safety inclinations’ which have been trained into the model. The users can fine-tune the model to be an expert in their illegal use-case. For such open weights models, there is no way for the governments of the world to monitor them or have an off-switch. They can be run on the computers of individuals. Having monitors and off-switches for every sufficiently powerful individual computer in the world seems implausible.
Thus, I think the off-switch only addresses a subset of potential harms. I don’t think it’s a bad idea to have, but I also don’t think it should be the main focus of discussion around preventing AI harms.
My expectation is that the greatest dangers we are likely to first encounter (and thus likely to constitute our ‘warning shots’ if we get any) are probably going to be one of two types:
A criminal or terrorist actor using a customized open-weights model to allow them to undertake a much more ambitious crime or attack than they could have achieved without the model.
Eager hobbyists pushing models into being self-modifying agents with the goal of launching a recursive self-improvement cycle, or the goal of launching a fully independent AI agent into the internet for some dumb reason. People do dumb things sometimes. People are already trying to do both these things. The only thing stopping this from being harmful at present is that the open source models are not yet powerful enough to effectively become independent rogue agents or to self-improve.
Certainly the big AI labs will get to the point of being able to do these things first, but I think they will be very careful not to let their expensive models escape onto the internet as rogue agents.
I do expect the large labs to try to internally work on recursive self-improvement, but I have some hope that they will do so cautiously enough that a sudden larger-than-expected success won’t take them unawares and escape before they can stop it.
So the fact that the open source hobbyist community is actively trying to do these dangerous activities, and no one is even discussing regulations to shut this sort of activity down, means that we have a time bomb with an unknown fuse ticking away. How long will it be until the open source technology improves to the point that these independently run AIs cross their ‘criticality point’ and successfully start to make themselves increasingly wealthy / smart / powerful / dangerous?
Another complicating factor is that trying to plan for ‘defense from AI’ is a lot like trying to plan for ‘defense from humans’. Sufficiently advanced general AIs are intelligent agents like humans are. I would indeed expect an AI which has gained independence and wealth to hire and/or persuade humans to work for it (perhaps without their even realizing that they are working for an AI rather than a remote human boss). Such an AI might very well set up shell companies with humans playing the role of CEOs but secretly following orders from the AI. Similarly, an AI which gets very good at persuasion might be able to manipulate, radicalize, and fund terrorist groups into taking specific violent actions which secretly happen to be arranged to contribute to the AI’s schemes (without the terrorist groups even realizing that their funding and direction is coming from an AI).
These problems, and others like them, have been forecasted by AI safety groups like MIRI. I don’t, however, think that MIRI is well-placed to directly solve these problems. I think many of the needed measures are more social / legal rather than technical. MIRI seems to agree, which is why they’ve pivoted towards mainly trying to communicate about the dangers they see to the public and to governments. I think our best hope to tackle these problems comes from action being taken by government organizations, who are pressured by concerns expressed by the public.