My guess is an attempt to explain where I think we actually differ in “generative intuitions” will be more useful than a direct response to your conclusions, so here it is. How to read it: roughly, this is attempting to just jump past several steps of double-crux to the area where I suspect actual cruxes lie.
Continuity
In my view, your ontology of thinking about the problem is fundamentally discrete. For example, you are imagining a sharp boundary between a class of systems that is “weak, won’t kill you, but also won’t help you with alignment” and a class that is “strong, would help you with alignment, but, unfortunately, will kill you by default”. Discontinuities everywhere: “bad systems are just one sign flip away”, sudden jumps in capabilities, etc. Thinking in symbolic terms.
In my inside view, in reality, things are instead mostly continuous. Discontinuities sometimes emerge out of continuity, sure, but this is often noticeable. If you get some interpretability and oversight things right, you can slow down before hitting the abyss. Also the jumps are often not true “jumps” under closer inspection.
I don’t think there is any practical way to reconcile this difference of intuitions. My guess is that intuitions about continuity/discreteness are quite deep-seated, and based more on how people do maths than on any specific observation about the world. In practice, for most people, the “intuition” is something like a deep net trained on a whole life of STEM reasoning: they won’t update on individual datapoints, and if they are smart, they are able to re-interpret the observations to be in line with their view. I also think trying to get you to share my continuous intuition is mostly futile. My hypothesis is that this is possibly the top deep crux of your disagreements with Paul, and reading the debates between you two gives me little hope of you switching to a “continuous” perspective.
I also believe that the “discrete” ontology is great for noticing problems and served you well in noticing many deep and hard problems. (I use it to spot problems sometimes too.) At the same time, it’s likely much less useful for solving the problems.
Also, if anything, how SOTA systems look suggests mostly continuity, stochasticity, “biology”, “emergence”. Usually no proofs, no symbolically verifiable guarantees.
Things will be weird before getting extremely weird
Assuming continuity, things will get weird before getting extremely weird. This likely includes domains such as politics, geopolitics, and the experience of individual humans. My impression is that you are mostly imagining just slightly modified politics, quite similar to today. In this context, a gradient-descending model in some datacentre hits the “core of consequentialist reasoning”, and we are all soon dead. I see that this is possible, but I bet more on scenarios where we get AGI when politics is very different compared to today.
Models of politics
Actually, we also probably disagree about politics. Correct me if I’m wrong, but your “mainline” winning scenario was and still is something like this: the leading team creates an aligned AGI, this system gets a decisive strategic advantage, and it “solves” politics by forming a singleton (and preventing all other teams from developing AGI). Decisive pivotal acts, and so on.
To me, this seems an implausible and dangerous theory of how to solve politics in the real world, in continuous takeoffs. Continuity will usually mean no one gets a decisive advantage: the most powerful AI system will still be much weaker than the “rest of the world”, and the rest of the world will fight back against takeover.
Under the “ecosystem” view, we will need to solve “ecosystem alignment”—including possible coordination of the ecosystem to prevent formation of superintelligent and unbounded agents.
(It seems likely this would benefit from decent math, similarly to how the math of MAD was instrumental in us not nuking ourselves.)
Sociology of AI safety
I think you have a strange model of which position is “quiet”. Your writing is followed passionately by many; to take just the latest example, your “dying with dignity” framing got a lot of attention.
My guess is that following you too closely, which many people do, is currently net harmful. I’m sceptical that people who get caught up too much in your way of looking at the problem will make much progress. You’re a master of your way of looking at it, you’ve spent decades thinking about AI safety in this ontology and you don’t see any promising way to solve the problem.
Conclusion
I think what you parse as “a simply bad paradigm on which to approach things” would start to make more sense if you adopted the “continuous” assumptions, and an assumption that the world would be quite weird and complex at the decisive period.
(Personally, I do understand how my conclusions would change if I adopted a much more “discrete” view, and yes, I would be much more pessimistic about both what I work on and our prospects.)
I think this comment is lumping together the following assumptions under the “continuity” label, as if there is a reason to believe that either they are all correct or all incorrect (and I don’t see why):
- There is a large distance in model space between models that behave very differently.
- Takeoff will be slow.
- It is feasible to create models that are weak enough to not pose an existential risk yet able to sufficiently help with alignment.
“I bet more on scenarios where we get AGI when politics is very different compared to today.”
I agree that just before “super transformative” ~AGI systems are first created, the world may look very different than it does today. This is one of the reasons I think Eliezer has too much credence on doom.
To briefly hop in and say something that may be useful: I had a reaction pretty similar to what Eliezer commented, and I don’t see continuity or “Things will be weird before getting extremely weird” as a crux. (I don’t know why you think he does, and don’t know what he thinks, but would guess he doesn’t think it’s a crux either.)
I’ve been part of, or read, enough debates with Eliezer to have some guesses about how the argument would go, so I made the move of skipping several steps of double-crux to the area where I suspect actual cruxes lie.
I think exploring the whole debate-tree or argument map would be quite long, so I’ll just try to gesture at how some of these things are connected, in my map.
- pivotal acts vs. pivotal processes: my take is that people’s stance on the feasibility of pivotal acts vs. processes partially depends on continuity assumptions. What do you believe about pivotal acts?
- assuming continuity, do you expect existing non-human agents to move important parts of their cognition to AI substrates?
-- if yes, do you expect large-scale regulations around that?
--- if yes, will it also be partially automated?
- different route: assuming continuity, do you expect a lot of alignment work to be done partially by AI systems, inside places like OpenAI?
-- if at the same time this is a huge topic for the whole of society, academia and politics, would you expect the rest of the world not to try to influence this?
- different route: assuming continuity, do you expect a lot of “how different entities in the world coordinate” to be done partially by AI systems?
-- if yes, do you assume technical features of the system matter? Like, e.g., multi-agent deliberation dynamics?
- assuming the world notices AI safety as a problem (it has done so much more since this post was written)
-- do you expect a large amount of the attention and resources of academia and industry to be spent on AI alignment?
--- would you expect this to be somehow related to the technical problems and how we understand them?
--- e.g., do you think it makes no difference to the technical problem whether 300 or 30k people work on it?
---- if it makes a difference, does it make a difference how the attention is allocated?
Not sure if the double-crux between us would rest on the same cruxes, but I’m happy to try :)