Here’s my guess as to how the universality hypothesis a.k.a. natural abstractions will turn out. (This is not written to be particularly understandable.)
At the very “bottom”, or perceptual level of the conceptual hierarchy, there will be a pretty straight-forward objective set of concept. Think the first layer of CNNs in image processing, the neurons in the retina/V1, letter frequencies, how to break text strings into words. There’s some parameterization here, but the functional form will be clear (like having a basis of n vectors in R^n, but it (almost) doesn’t matter which vectors).
For a few levels above that, it’s much less clear to me that the concepts will be objective. Curve detectors may be universal, but the way they get combined is less obviously objective to me.
This continues until we get to a middle level that I’d call “objects”. I think it’s clear that things like cats and trees are objective concepts. Sufficiently good language models will all share concepts that correspond to a bunch of words. This level is very much due to the part where we live in this universe, which tends to create objects, and on earth, which has a biosphere with a bunch of mid-level complexity going on.
Then there will be another series of layers that are less obvious. Partly these levels are filled with whatever content is relevant to the system. If you study cats a lot then there is a bunch of objectively discernible cat behavior. But it’s not necessary to know that to operate in the world competently. Rivers and waterfalls will be a level 3 concept, but the details of fluid dynamics are in this level.
Somewhere around the top level of the conceptual hierarchy, I think there will be kind of a weird split. Some of the concepts up here will be profoundly objective; things like “and”, mathematics, and the abstract concept of “object”. Absolutely every competent system will have these. But then there will also be this other set of concepts that I would map onto “philosophy” or “worldview”. Humans demonstrate that you can have vastly different versions of these very high-level concepts, given very similar data, each of which is in some sense a functional local optimum. If this also holds for AIs, then that seems very tricky.
Actually my guess is that there is also a basically objective top-level of the conceptual hierarchy. Humans are capable of figuring it out but most of them get it wrong. So sufficiently advanced AIs will converge on this, but it may be hard to interact with humans about it. Also, some humans’ values may be defined in terms of their incorrect worldviews, leading to ontological crises with what the AIs are trying to do.
Here’s my guess as to how the universality hypothesis a.k.a. natural abstractions will turn out. (This is not written to be particularly understandable.)
At the very “bottom”, or perceptual level of the conceptual hierarchy, there will be a pretty straight-forward objective set of concept. Think the first layer of CNNs in image processing, the neurons in the retina/V1, letter frequencies, how to break text strings into words. There’s some parameterization here, but the functional form will be clear (like having a basis of n vectors in R^n, but it (almost) doesn’t matter which vectors).
For a few levels above that, it’s much less clear to me that the concepts will be objective. Curve detectors may be universal, but the way they get combined is less obviously objective to me.
This continues until we get to a middle level that I’d call “objects”. I think it’s clear that things like cats and trees are objective concepts. Sufficiently good language models will all share concepts that correspond to a bunch of words. This level is very much due to the part where we live in this universe, which tends to create objects, and on earth, which has a biosphere with a bunch of mid-level complexity going on.
Then there will be another series of layers that are less obvious. Partly these levels are filled with whatever content is relevant to the system. If you study cats a lot then there is a bunch of objectively discernible cat behavior. But it’s not necessary to know that to operate in the world competently. Rivers and waterfalls will be a level 3 concept, but the details of fluid dynamics are in this level.
Somewhere around the top level of the conceptual hierarchy, I think there will be kind of a weird split. Some of the concepts up here will be profoundly objective; things like “and”, mathematics, and the abstract concept of “object”. Absolutely every competent system will have these. But then there will also be this other set of concepts that I would map onto “philosophy” or “worldview”. Humans demonstrate that you can have vastly different versions of these very high-level concepts, given very similar data, each of which is in some sense a functional local optimum. If this also holds for AIs, then that seems very tricky.
Actually my guess is that there is also a basically objective top-level of the conceptual hierarchy. Humans are capable of figuring it out but most of them get it wrong. So sufficiently advanced AIs will converge on this, but it may be hard to interact with humans about it. Also, some humans’ values may be defined in terms of their incorrect worldviews, leading to ontological crises with what the AIs are trying to do.