No, the way I used the term was to point to robust abstractions of ontological concepts. Here’s an example: say 1+1=A. A here obviously means 2 in our language, but that doesn’t change what A represents, ontologically. If A+1=4, then you have broken math (A+1 should equal 3), and that results in you being less capable in your reasoning and being “Dutch booked”. Your world model is then incorrect, and it is very unlikely that any ontological shift will result in such a break in world-model capabilities.
Math is a robust abstraction. “Natural abstractions”, as I use the term, points to abstractions for objects in the real world that share the same level of robustness to ontological shifts, such that as an AI gets better and better at modelling the world, its ontology tends more towards representing the objects in question with these abstractions.
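To make the “Dutch booked” point concrete, here is a minimal sketch (the outcomes, credences, and stakes are made up purely for illustration): an agent whose credences on two mutually exclusive, exhaustive outcomes sum to more than 1 will accept a pair of bets it considers individually fair, yet it loses money however the world turns out.

```python
# Minimal Dutch-book illustration (toy numbers, purely illustrative):
# an agent with incoherent credences on two mutually exclusive,
# exhaustive outcomes accepts bets that guarantee it a loss.

credence_rain = 0.7      # agent's probability that it rains
credence_no_rain = 0.5   # agent's probability that it does not rain
# Incoherent: 0.7 + 0.5 = 1.2 > 1

stake = 1.0  # each bet pays out `stake` if its outcome occurs

# The agent regards a price of credence * stake as fair for each bet,
# so it willingly buys both.
price_rain = credence_rain * stake
price_no_rain = credence_no_rain * stake

for it_rains in (True, False):
    payout = stake  # exactly one of the two bets pays out
    total_paid = price_rain + price_no_rain
    net = payout - total_paid
    print(f"it_rains={it_rains}: agent's net result = {net:+.2f}")

# Both branches print -0.20: a guaranteed loss, i.e. a Dutch book.
```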
> Meaning that even *if* AGI could internally define a goal robustly with respect to natural abstractions, AGI cannot conceptually contain within their modelling of natural abstractions all but a tiny portion of the (side-)effects propagating through the environment – as a result of all the interactions of the machinery’s functional components with connected physical surroundings.
That seems like a claim about the capabilities of arbitrarily powerful AI systems, one that relies on chaos theory or complex systems theory. I share your sentiment but doubt that things such as successor AI alignment will be difficult for ASIs.
Thanks for the clear elaboration.

I agree that natural abstractions would tend to get selected for in the agents that continue to exist and gain/uphold power to make changes in the world, including because of Dutch-booking of incoherent preferences, because of instrumental convergence, and because relatively poorly functioning agents get selected out of the population.
However, those natural abstractions are still leaky, in a sense similar to how Platonic concepts are leaky abstractions. The natural abstraction of a circle does not map precisely onto the actual physical shape of e.g. a wheel identified to exist in the outside world.
In this sense, whatever natural abstractions AGI would use to compress its observations of actual physical instantiations of matter or energetic interactions into its modelling of the outside world, those natural abstractions would still fail to capture all the long-term-relevant features of the outside world.
This point I’m sure is obvious to you. But it bears repeating.
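To put a toy number on the wheel example (purely illustrative, not from the post): fit the ideal-circle abstraction to a slightly out-of-round wheel and a residual always remains, however well the abstraction compresses the measurements.

```python
# Toy illustration of the "circle" abstraction leaking: radii measured on
# a slightly out-of-round wheel never exactly match the ideal circle.
# (All numbers are made up for illustration.)
import math

nominal_radius = 0.30  # metres (the "ideal circle" abstraction)
samples = 12

def measured_radius(theta: float) -> float:
    """A hypothetical physical wheel: small wobble plus a flat spot."""
    wobble = 0.002 * math.sin(3 * theta)
    flat_spot = -0.004 if 1.0 < theta < 1.3 else 0.0
    return nominal_radius + wobble + flat_spot

deviations = [
    abs(measured_radius(2 * math.pi * i / samples) - nominal_radius)
    for i in range(samples)
]
print(f"max deviation from the ideal circle: {max(deviations) * 1000:.1f} mm")

# The abstraction compresses the measurements well, but the residual never
# goes to zero, and such residuals are exactly what can end up mattering
# over long timescales.
```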
> That seems like a claim about the capabilities of arbitrarily powerful AI systems,
Yes, or more specifically: about fundamental limits of any AI system to control how its (side-)effects propagate and feed back over time.
> one that relies on chaos theory or complex systems theory.
Pretty much, where “complex” refers to both internal algorithmic complexity (NP-computation branches, etc.) and physical functional complexity (distributed non-linear amplifying feedback, etc.).
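As a minimal sketch of why distributed non-linear amplifying feedback defeats long-term prediction (the standard logistic-map toy example, nothing specific to AGI): a residual the model drops at one part in a billion grows to an order-one error within a few dozen iterations.

```python
# Sensitive dependence in a textbook non-linear feedback loop:
# the logistic map x -> r * x * (1 - x) in its chaotic regime.

r = 3.9
x_modelled = 0.500000000   # the model's state
x_actual   = 0.500000001   # the world, with an unmodelled 1e-9 residual

for step in range(1, 51):
    x_modelled = r * x_modelled * (1 - x_modelled)
    x_actual = r * x_actual * (1 - x_actual)
    if step % 10 == 0:
        print(f"step {step:2d}: |gap| = {abs(x_actual - x_modelled):.3e}")

# Typical run: the gap grows from ~1e-9 to roughly order 1 by step 40-50,
# at which point the model retains no long-term predictive power here.
```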
> I share your sentiment but doubt that things such as successor AI alignment will be difficult for ASIs.
This is not an argument. Given that people here are assessing what to do about x-risks, they should not rely on you stating your “doubt that...alignment will be difficult”.
I doubt that you thought this through comprehensively enough, and that your reasoning addresses the fundamental limits to controllability I summarised in this post.
The burden of proof is on you to comprehensively clarify your reasoning, given that you are in effect claiming that extinction risks can be engineered away.
You’d need to clarify specifically why functional components iteratively learned/assembled within AGI could have long-term predictable effects in their physical interactions with the shifting connected surroundings of a more physically complex outside world.
I don’t mind whether that’s framed as “AGI redesigns a successor version of their physically instantiated components” or “AGI keeps persisting in some modified form”.