This is a useful video because it is about the limits and possibilities of explanation (it features an expert on ZKPs), while also (because of the format of the entire YouTube series) nicely handling the issue that different audiences need different “levels of explanation”:
I consider some version of “foom” to just be obviously true to all experts who care to study it for long enough and learn the needed background concepts.
Things like instrumental convergence and the orthogonality thesis are relevant here, but so, probably, is some security mindset.
Like part of my concern arises from the fact that I’m aware of lots of critical systems for human happiness that are held together with baling wire and full of zero-days, and the only reason they haven’t been knocked over yet is that no one smart is stupid-or-evil enough to choose to knock them over and then robustly act on that choice.
At the same time, I don’t believe that it is wise to spell out all the zero-days in the critical infrastructure of human civilization.
There is an ongoing debate within the infosec community about proper norms for “responsible disclosure” where “what does ‘responsible’ even mean?” is somewhat contested.
Like from the perspective of many profit-seeking system maintainers who want to build insecure systems and make money from deploying them, they would rather there just be no work at all by anyone to hack any systems, and maybe all such work is irresponsible, and if the work produces exploits then maybe those should just be censored, and if those people want to publish then instead they should give the exploits, for zero pay, to the people whose systems were buggy.
But then security researchers get no pay, so that’s probably unfair as a compensation model.
What happens in practice is (arguably) that some organizations create “bug bounty systems”… and then all the other organizations that don’t have bug bounty systems get hacked over and over and over and over?
But like… if you can solve all the problems for the publication of a relatively simple and seemingly-low-stakes version of “exploits that can root a Linux kernel” then by all means, please explain how that should work and let’s start doing that kind of thing all over the rest of society too (including for exploits that become possible with AGI).
A key issue here, arguably, is that if a security measure needs democratic buy-in and the median voter can’t understand “a safe version of the explanation to spread around in public” then… there’s gonna be a bad time.
Options seem logically to include:
1) Just publish lots and lots and lots of concrete vivid workable recipes that could be used by any random idiot to burn the world down from inside any country on the planet using tools available at the local computer shop and free software they can download off the internet… and keep doing this over and over… until the GLOBAL median voter sees why there should be some GLOBAL government coordination on GLOBAL fire-suppression systems, and then there is a vote, and then the vote is counted, and then a wise ruler is selected as the delegate of the median voter… and then that ruler tries to do some coherent thing that would prevent literally all the published “world burning plans” from succeeding.
2) Give up on legible democracy and/or distinct nation-states and/or many other cherished institutions.
3) Lose.
4) Something creative! <3
Personally, I’m still working off of “option 4” for now, which is something like “encourage global elite culture to courageously embrace their duties and just do the right thing whether or not they will keep office after having done the right thing”.
However, as option 3 rises higher and higher in my probability estimate I’m going to start looking at options 1 and 2 pretty closely.
Also, it is reasonably clear to me that option 2 is really an abstract supercategory that includes option 1, because option 1 also sounds like “not how many cherished institutions currently work”!
So probably what I’ll do is look at option 2, and examine lots of different constraints that might be relaxed, and lots of consequences, and try to pick the least scary “cherished institutional constraints” to give up on, and this will probably NOT include giving up on my norm against publishing lots and lots of world-burning recipes because “maybe that will convince the fools!”
Publishing lots of world-burning recipes still seems quite foolish to me, in comparison to something normal and reasonable.
Like I personally think it would be wiser to swiftly bootstrap a new “United People’s House of Representative Parliament” that treats the existing “United Nation-State Delegates” roughly as if it were a global “Senate” or global “House of Lords” or some such?
Then the “United People” could write legislation using their own internal processes and offer the “United Nations” the right to veto the laws in a single up/down vote?
And I think this would be LESS dangerous than just publishing all the world-burning plans? (But also this is a way to solve ONE of several distinct barriers to reaching a Win Condition (the part of the problem where global coordination institutions don’t actually exist at all) and so this is also probably an inadequate plan all by itself.)
Category 2 is a big category. It is full of plans that would cause lots and lots of people to complain about how the proposals are “too weird”, or sacrifice too many of their “cherished institutions”, or many other complaints.
In summary: this proposal feels like you’re personally asking to be “convinced in public using means that third parties can watch, so that third parties will grant that it isn’t your personal fault for believing something at variance with the herd’s beliefs” and not like your honest private assessment of the real situation is bleak. These are different things.
My private assessment is that talking with someone for several days face-to-face about all of this, assuming they even understand what Aumann Agreement is (such as to be able to notice, in a verbal formal way, when their public performance of “belief” is violating core standards of a minimally tolerable epistemology), seems like a high cost to pay… but one I would gladly pay three or four or five times if me paying those costs was enough to save the world. But in practice, I would have to put a halt to any pretense of having a day job for maybe 6 weeks to do that, and that hit to my family budget is not wise.
Also: I believe that no humans exist who offer that degree of leverage over the benevolent competent rulership of the world, and so it is not worth it, yet, to me, personally, to pay such costs-of-persuasion with… anyone?
Also: the plausible candidates wouldn’t listen to me, because they are currently too busy with things they (wrongly, in my opinion) think are more important to work on than AI stuff.
Also: if you could get the list of people, and plan the conversations, there are surely better people than me to be in those conversations, because the leaders will make decisions based off of lots of non-verbal factors, and I am neither male, nor tall, nor <other useful properties> and so normal-default-human leaders will not listen to me very efficiently.
In summary: this proposal feels like you’re personally asking to be “convinced in public using means that third parties can watch, so that third parties will grant that it isn’t your personal fault for believing something at variance with the herd’s beliefs” and not like your honest private assessment of the real situation is bleak. These are different things.
Well, that’s very unfortunate because that was very much not what I was hoping for.
I’m hoping to convince someone somewhere that proposing a concrete model of foom will be useful to help think about policy proposals and steer public discourse. I don’t think such a model has to be exfohazardous at all (see for example the list of technical approaches to the singularity, in the paper I linked—they are good and quite convincing, and not at all exfohazardous)!
Can you say more about the “will be useful to help think about policy proposals and steer public discourse” step?
A new hypothesis is that maybe you want a way to convince OTHER people, in public, via methods that will give THEM plausibility deniability about having to understand or know things based on their own direct assessment of what might or might not be true.
give THEM plausibility deniability about having to understand or know things based on their own direct assessment
I don’t follow what you are getting at here.
I’m just thinking about historical cases of catastrophic risk, and what was done. One thing that was done was that the government paid very clever people to put together models of what might happen.
My feeling is that the discussion around AI risk is stuck in an inadequate equilibrium, where everyone on the inside thinks it’s obvious but people on the outside don’t grok it. I’m trying to think of the minimum possible intervention to bridge that gap, something very, very different from your ‘talk … for several days face-to-face about all of this’. As you mentioned, this is not scalable.
On a simple level, all exponential explosions work on the same principle, which is that there’s some core resource, and in each unit of time, the resource is roughly linearly usable to cause more of the resource to exist and be similarly usable.
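A minimal sketch of that dynamic, using made-up numbers rather than anything drawn from the examples below: a stock that can be spent each step to create a little more of itself grows exponentially, while a same-sized contribution that never feeds back only grows linearly.

```python
# Toy illustration with hypothetical rates, not a model of any real system:
# a stock that converts a fraction of itself into new stock each step
# explodes exponentially, while a fixed drip of the same size stays linear.
compounding = 1.0
fixed_drip = 1.0
rate = 0.1  # fraction of the existing stock turned into new stock per step

for step in range(100):
    compounding += rate * compounding  # more stock -> even more new stock next step
    fixed_drip += rate * 1.0           # same-sized addition, but it never compounds

print(round(compounding))  # ~13781, i.e. (1 + 0.1) ** 100
print(round(fixed_drip))   # 11
```

The neutron, prion, and oxytocin cases below differ only in what the stock is and what sets the rate; the qualitative behavior is the same whenever the feedback rate stays above zero.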
Neutrons in fissile material above a certain density cause more neutrons, and so on until “explosion”.
Prions in living organisms catalyze more prions, which catalyze more prions, and so on until the body becomes “spongiform”.
Oxytocin causes uterine contractions, and uterine contractions are rigged to release more oxytocin, and so on until “the baby comes out”.
(Not all exponential processes are bad, just most. It is an idiom rarely used by biology, and when biology does use the idiom it tends to be for causing phase transitions where humble beginnings lead to large outcomes.)
“Agentic effectiveness” that loops on itself to cause more agentic effectiveness can work the same way. The inner loop uses optimization power to get more optimization power. Spelling out detailed ways to use optimization power to get more optimization power is the part where it feels like talking about zero-days to me?
Maybe it’s just that quite a few people literally don’t know how exponential processes work? That part does seem safe to talk about, and if it isn’t safe then the horse is out of the barn anyway. Also, if there is a gap in such knowledge it might explain why they don’t seem to understand this issue, and it would also explain why many of the same people handled COVID so poorly.
Do you have a cleaner model of the shape of the ignorance that is causing the current policy failure?