I appreciate the pixel game as a concrete thought experiment. Its clarity makes it easier for me to see where I disagree with your understanding of the Natural Abstraction Hypothesis.
The Natural Abstraction Hypothesis is about the abstractions available in Nature, that is to say, the environment. So we have to decide where to draw the boundary around Nature. Options:
- Nature is just the pixel game itself (Cartesian)
- Nature is the pixel game and the agent(s) flipping pixels (Embedded)
- Nature is the pixel game and the utility function(s), but not the decision algorithms (Hybrid)
In the Cartesian frame, all of “top half”, “bottom half”, “outer rim”, and “middle square” are Unnatural Abstractions, because they’re not in Nature; they’re in the utility functions.
In the Hybrid and Embedded frames, when System A is playing the game, then “top half” and “bottom half” are Natural Abstractions, but “outer rim” and “middle square” are not. The opposite is true when System B is playing the game.
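To make this concrete, here is a minimal sketch of the pixel game in Python. The board size and the specific utility functions are my assumptions, not from your post; only the four candidate abstractions are. The point is that each abstraction is just a mask over the board: in the Cartesian frame the masks live in the utility functions, outside Nature, while in the Hybrid and Embedded frames whichever masks the active player’s utility function uses get pulled inside the boundary.

```python
import numpy as np

N = 8  # board size is assumed; the argument doesn't depend on it

# Each candidate abstraction is just a Boolean mask over the board.
rows, cols = np.indices((N, N))
top_half    = rows < N // 2
bottom_half = ~top_half
outer_rim   = (rows == 0) | (rows == N - 1) | (cols == 0) | (cols == N - 1)
middle_sq   = ~outer_rim

def utility_A(board):
    """Hypothetical utility for System A, defined via the top/bottom masks."""
    return board[top_half].sum() - board[bottom_half].sum()

def utility_B(board):
    """Hypothetical utility for System B, defined via the rim/middle masks."""
    return board[outer_rim].sum() - board[middle_sq].sum()

board = np.zeros((N, N), dtype=int)
board[0, :] = 1  # light the top row
print(utility_A(board), utility_B(board))  # the same board scores differently
```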
Let’s make this a multi-player game, and have both systems playing on the same board. In that case all of “top half”, “bottom half”, “outer rim”, and “middle square” are Natural Abstractions. We expect System A to learn “outer rim” and “middle square”, since it needs to predict the actions of System B, at least given sufficient learning capability. I think this is a clean counter-example (sketched below) to your claim:
> Two systems require similar utility functions in order to converge on similar abstractions.
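Here is a toy version of that multi-player setup, under the same assumptions as above plus an invented policy for System B; it illustrates the argument rather than proving it. System B’s behaviour is literally a function of the outer-rim mask, so any learner that predicts System B’s moves well has recovered that mask, or something extensionally equivalent to it.

```python
import numpy as np

N = 8
rows, cols = np.indices((N, N))
outer_rim = (rows == 0) | (rows == N - 1) | (cols == 0) | (cols == N - 1)

def system_B_move(board):
    """Invented policy: System B lights the first dark pixel on the rim."""
    dark_rim = np.argwhere(outer_rim & (board == 0))
    return tuple(dark_rim[0]) if len(dark_rim) else None

# System A can only predict these moves by tracking the rim, because the
# policy is computed *through* the outer_rim mask and nothing else.
board = np.zeros((N, N), dtype=int)
for _ in range(3):
    move = system_B_move(board)
    print("System B flips", move)
    board[move] = 1
```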
Let’s expand on this line of argument and look at your example of bee waggle-dances. You question whether the abstractions represented by the various dances are natural. I agree! Using a Cartesian frame that treats bees and humans as separate agents, not part of Nature, they are not Natural Abstractions. With an Embedded frame they are a Natural Abstraction for anyone seeking to understand bees, but in a trivial way. As you say, “one of the systems explicitly values and works towards understanding the abstractions the other system is using”.
Also, the meter is not a natural abstraction, which we can see by observing other cultures using yards, cubits, and stadia. If we re-ran cultural evolution, we’d expect to see different measurements of distance chosen. The Natural Abstraction isn’t the meter, it’s Distance. Related concepts like relative distance are also Natural Abstractions. If we re-ran cultural evolution, we would still think that trees are taller than grass.
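A one-line way to see the distinction (the heights and the conversion factor here are made up; an ancient cubit was roughly 0.45 m): the number attached to a height is a cultural convention, but ordering and ratios survive any change of units.

```python
# Heights of a tree and of grass, in meters and in (approximate) cubits.
M_PER_CUBIT = 0.45  # rough value; the exact figure doesn't matter here
tree_m, grass_m = 12.0, 0.5
tree_c, grass_c = tree_m / M_PER_CUBIT, grass_m / M_PER_CUBIT

print(tree_m, tree_c)  # the raw numbers differ: 12.0 vs ~26.7

# ...but relative distance is unit-free, so every culture agrees on it:
assert (tree_m > grass_m) == (tree_c > grass_c)
assert abs(tree_m / grass_m - tree_c / grass_c) < 1e-9
```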
I’m not a bee expert, but Wikipedia says:
> In the case of Apis mellifera ligustica, the round dance is performed until the resource is about 10 meters away from the hive, transitional dances are performed when the resource is at a distance of 20 to 30 meters away from the hive, and finally, when it is located at distances greater than 40 meters from the hive, the waggle dance is performed.
The dance doesn’t actually mean “greater than 40 meters”, because bees don’t use the metric system. There is some distance, the Waggle Distance, where bees switch from a transitional dance to a waggle dance. Claude says, with low confidence, that the Waggle Distance varies based on energy expenditure. In strong winds, the Waggle Distance goes down.
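Here is the shape of that claim in code. The distance thresholds come from the Wikipedia passage above (which leaves the 10-20 m and 30-40 m bands fuzzy, so I’ve collapsed them into a single cutoff); the wind adjustment encodes the low-confidence claim from Claude and is purely hypothetical.

```python
def dance_type(distance_m, waggle_distance_m=40.0):
    """Classify the dance; thresholds per the quoted Wikipedia passage."""
    if distance_m <= 10:
        return "round"
    elif distance_m < waggle_distance_m:
        return "transitional"
    else:
        return "waggle"

def waggle_distance(base_m=40.0, strong_wind=False):
    """Hypothetical: if the switch tracks energy expenditure rather than
    distance per se, a strong headwind lowers the effective threshold."""
    return base_m * 0.75 if strong_wind else base_m  # 0.75 is invented

print(dance_type(35))                                     # transitional
print(dance_type(35, waggle_distance(strong_wind=True)))  # waggle
```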
Humans also have ways of communicating energy expenditure or effort. I don’t know enough about bees or humans to know whether there is a shared abstraction of Effort here. It may be that the Waggle Distance is bee-specific. And that’s an important limitation on the NAH: it says, as you quote, “there exist abstractions which are natural”, but I think we should also believe the Artificial Abstraction Hypothesis, which says that there exist abstractions which are not natural.
This confusion is on display in the discussion around My AI Model Delta Compared To Yudkowsky, where Yudkowsky is quoted as apparently rejecting the NAH:

> The AI does not think like you do, the AI doesn’t have thoughts built up from the same concepts you use, it is utterly alien on a staggering scale. Nobody knows what the hell GPT-3 is thinking, not only because the matrices are opaque, but because the stuff within that opaque container is, very likely, incredibly alien—nothing that would translate well into comprehensible human thinking, even if we could see past the giant wall of floating-point numbers to what lay behind.
But then in a comment on that post he appears to partially endorse the NAH:
> I think that the AI’s internal ontology is liable to have some noticeable alignments to human ontology w/r/t the purely predictive aspects of the natural world; it wouldn’t surprise me to find distinct thoughts in there about electrons.
But also endorses the AAH:
> As the internal ontology takes on any reflective aspects, parts of the representation that mix with facts about the AI’s internals, I expect to find much larger differences—not just that the AI has a different concept boundary around “easy to understand”, say, but that it maybe doesn’t have any such internal notion as “easy to understand” at all, because easiness isn’t in the environment and the AI doesn’t have any such thing as “effort”.