Thanks for taking the time to explain this. This clears a lot of things up.
Let me see if I understand. So one reason that an agent might develop an abstraction is that it has a utility function that deals with that abstraction (if my utility function is ‘maximize the number of trees’, it’s helpful to have an abstraction for ‘trees’). But the NAH goes further than this and says that, even if an agent had a very ‘unnatural’ utility function which didn’t deal with abstractions (e.g. it was something very fine-grained like ‘I value this atom being in this exact position and this atom being in a different position etc…’), it would still, for instrumental reasons, end up using the ‘natural’ set of abstractions, because the natural abstractions are in some sense the only ‘proper’ set of abstractions for interacting with the world. Similarly, while there might be perceptual systems/brains/etc. which favour using certain unnatural abstractions, once agents become capable enough to start pursuing complex goals (or rather goals requiring a high level of generality), the universe will force them to use the natural abstractions (or else fail to achieve their goals). Does this sound right?
Presumably it’s possible to define some ‘unnatural’ abstractions. Would the argument be that unnatural abstractions are just in practice not useful, or is it that the universe is such that it’s ~impossible to model the world using unnatural abstractions?
All dead-on up until this:
… the universe will force them to use the natural abstractions (or else fail to achieve their goals). [...] Would the argument be that unnatural abstractions are just in practice not useful, or is it that the universe is such that it’s ~impossible to model the world using unnatural abstractions?
It’s not quite that it’s impossible to model the world without the use of natural abstractions. Rather, it’s instrumentally far “cheaper” to use the natural abstractions (in some sense). Rather than routing through natural abstractions, a system with a highly capable world model could instead e.g. use exponentially large amounts of compute (e.g. doing full quantum-level simulation), or might need enormous amounts of data (e.g. exponentially many training cycles), or both. So we expect to see basically all highly capable systems use natural abstractions in practice.
I’m assuming “natural abstraction” is also a scalar property. Reading this paragraph, I refactored the concept in my mind to “some abstractions tend to be cheaper to abstract than others. Agents will converge to using cheaper abstractions. Many cheapness properties generalize reasonably well across agents/observation-systems/environments, but all of those could in theory come apart.”
And the Strong NAH would be “cheap-to-abstract-ness will be very punctuated, or something” (i.e. you might expect less of a smooth gradient of cheapnesses across abstractions).
The way I think of it, it’s not quite that some abstractions are cheaper to use than others, but rather:
One can in principle reason at the “low(er) level”, i.e. just not use any given abstraction. That reasoning is correct but costly.
One can also just be wrong, e.g. use an abstraction which doesn’t actually match the world and/or one’s own lower-level model. Then predictions will be wrong, actions will be suboptimal, etc.
Reasoning which is both cheap and correct routes through natural abstractions. There are some degrees of freedom insofar as a given system could use some natural abstractions but not others, or be wrong about some things but not others.
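As a rough illustration of those three cases, here is a toy sketch in Python. Everything in it is an assumption made up for this example rather than anything from the NAH literature: the “world” is a list of independent biased bits, the query is the probability that at least k of them are 1, and the running count plays the role of a summary that happens to be sufficient for that query. The function names and the `steps` counters are hypothetical and only there to make the cost visible.

```python
from itertools import product


def exact_by_enumeration(probs, k):
    """Low-level reasoning: sum over all 2**n microstates. Correct but exponential."""
    total = 0.0
    steps = 0
    for bits in product((0, 1), repeat=len(probs)):
        steps += 1
        if sum(bits) >= k:
            weight = 1.0
            for b, p in zip(bits, probs):
                weight *= p if b else 1.0 - p
            total += weight
    return total, steps


def via_summary(probs, k):
    """Track only the distribution of the running count. Correct and polynomial."""
    dist = [1.0]  # dist[c] = P(count so far == c)
    steps = 0
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for c, mass in enumerate(dist):
            new[c] += mass * (1.0 - p)
            new[c + 1] += mass * p
            steps += 1
        dist = new
    return sum(dist[k:]), steps


def via_mismatched_summary(probs, k):
    """Pretend every bit copies the first one. Cheap, but the model does not match the world."""
    if k <= 0:
        return 1.0, 1
    if k > len(probs):
        return 0.0, 1
    # Under the (false) assumption, the count is n with probability probs[0], else 0.
    return probs[0], 1


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.8] * 4  # 12 independent bits (assumed toy world)
    k = 3
    for fn in (exact_by_enumeration, via_summary, via_mismatched_summary):
        answer, steps = fn(probs, k)
        print(f"{fn.__name__}: answer={answer:.6f} steps={steps}")
```

The first two functions agree on the answer but differ sharply in step count as the number of bits grows, while the third is as cheap as it gets and simply wrong. This is only an analogy: it says nothing about which summaries count as natural in a real environment.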
Got it, that makes sense. I think I was trying to get at something like this when I was talking about constraints/selection pressure (a system has less need to use abstractions if its compute is unconstrained or there is no selection pressure in the ‘produce short/quick programs’ direction) but your explanation makes this clearer. Thanks again for clearing this up!