I don’t see how the cluster argument resolves the circularity problem.
The circularity problem, as I see it, is that your definition of an abstraction shouldn’t be dependent on already having the abstraction. I.e., if the only way to define the abstraction “dog” involves you already knowing the abstraction “dog” well enough to create the set of all dogs, then probably you’re missing some of the explanation for abstraction. But the clusters in thingspace argument also depends on having an abstraction—knowing to look for genomes, or fur, or bark, is dependent on us already understanding what dogs are like. After all, there are nearly infinite “axes” one could look at, but we already know to only consider some of them. In other words, it seems like this has just passed the buck from choice of object to choice of properties, but you’re still making that choice based on the abstraction.
The fact that choice of axis—from among the axes we already know to be relevant—is stable (i.e., creates the same clusterings) feels like a central and interesting point about abstractions. But it doesn’t seem like it resolves the circularity problem.
(In retrospect the rest of this comment is thinking-out-loud for myself, mostly :p but you might find it interesting nonetheless).
I think it’s hard to completely escape this problem—we need to use some of our own concepts when understanding the territory, as we can’t see it directly—but I do think it’s possible to get a bit more objective than this. E.g., I consider thermodynamics/stat mech to be pretty centrally about abstractions, but it gets at them in a way that feels more “territory first,” if that makes any sense. Like, it doesn’t start with the conclusion. It started with the observation that “heat moves stuff” and “what’s up with that,” and then eventually landed on an analysis of entropy involving macrostates. Somehow that progression feels more natural to me than starting with “dogs are things” and working backwards. E.g., I think I’m wanting something more like “if we understand these basic facts about the world, we can talk about dogs” rather than “if we start with dogs, we can talk sensibly about dogs.”
To be clear, I consider some of your work to be addressing this. E.g., I think the telephone theorem is a pretty important step in this direction. Much of the stuff about redundancy and modularity feels pretty tip-of-the-tongue onto something important, to me. But, at the very least, my goal with understanding abstractions is something like “how do we understand the world such that abstractions are natural kinds”? How do we find the joints such that, conditioning on those, there isn’t much room to vary? What are those joints like? The reason I like the telephone theorem is that it gives me one such handle: all else equal, information will dissipate quickly—anytime you see information persisting, it’s evidence of abstraction.
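(To gesture at that dissipation intuition with a toy model of my own, not the theorem itself: in a binary “telephone game” where each hop flips the message with some assumed probability, the mutual information between the original bit and the t-th copy decays geometrically, so information that survives many hops is the exception worth noticing.)

```python
# Toy sketch (my own illustration, not the telephone theorem itself): a binary
# message passed down a chain, flipped with probability p at each hop. The
# mutual information between the original bit and the t-th copy decays
# geometrically; information that persists would be the exception to notice.
import numpy as np

def binary_entropy(q):
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -(q * np.log2(q) + (1 - q) * np.log2(1 - q))

p = 0.1  # assumed per-hop flip probability (arbitrary for illustration)
for t in [1, 5, 10, 20, 40]:
    q_t = 0.5 * (1 - (1 - 2 * p) ** t)  # effective flip probability after t hops
    mi = 1 - binary_entropy(q_t)        # I(X_0; X_t) for a uniform input bit
    print(f"t={t:3d}  I(X_0; X_t) = {mi:.4f} bits")
```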
My own sense is that answering this question will have a lot more to do with how useful abstractions are, rather than how predictive/descriptive they are, which are related questions, but not quite the same. E.g., with the gears example you use to illustrate redundancy, I think the fact that we can predict almost everything about the gear from understanding a part of it is the same reason why the gear is useful. You don’t have to manipulate every atom in the gear to get it to move, you only have to press down on one of the… spokes(?), and the entire thing will turn. These are related properties. But they are not the same. E.g., you can think about the word “stop” as an abstraction in the sense that many sound waves map to the same “concept,” but that’s not very related to why the sound wave is so useful. It’s useful because it fits into the structure of the world: other minds will do things in response to it.
I want better ways to talk about how agents get work out of their environments by leveraging abstractions. I think this is the reason we ultimately care about them ourselves, and why AI will too. I also think it’s a big part of how we should be defining them—that the natural joint is less “what are the aggregate statistics of this set” and more “what does having this information allow us to do”?
Sounds like I’ve maybe not communicated the thing about circularity. I’ll try again; it would be useful if you let me know whether or not this new explanation matches what you were already picturing from the previous one.
Let’s think about circular definitions in terms of equations for a moment. We’ll have two equations: one which “defines” x in terms of y, and one which “defines” y in terms of x:
x=f(y)
y=g(x)
Now, if g = f⁻¹, then (I claim) that’s what we normally think of as a “circular definition”. It’s “pretending” to fully specify x and y, but in fact it doesn’t, because one of the two equations is just a copy of the other equation but written differently. The practical problem, in this case, is that x and y are very underspecified by the supposed joint “definition”.
But now suppose g is not f⁻¹, and more generally the equations are not degenerate. Then our two equations are typically totally fine and useful, and indeed we use equations like this all the time in the sciences and they work great. Even though they’re written in a “circular” way, they’re substantively non-circular. (They might still allow for multiple solutions, but the solutions will typically at least be locally unique, so there’s a discrete and typically relatively small set of solutions.)
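(Concrete toy example, just to illustrate: take f(y) = y/2 and g(x) = x + 1. The pair x = y/2, y = x + 1 pins down exactly one solution, x = 1 and y = 2, even though each equation is written “in terms of” the other. If instead g were f⁻¹, i.e. y = 2x, then every pair (x, 2x) would satisfy both equations, and the joint “definition” wouldn’t actually pin anything down.)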
That’s the sort of thing which clustering algorithms do: they have some equations “defining” cluster-membership in terms of the data points and cluster parameters, and equations “defining” the cluster parameters in terms of the data points and the cluster-membership:
cluster_membership = f(data, cluster_params)
cluster_params = g(data, cluster_membership)
… where f and g are different (i.e. non-degenerate; g is not just f⁻¹ with data held constant). Together, these “definitions” specify a discrete and typically relatively small set of candidate (cluster_membership, cluster_params) values given some data.
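As a minimal sketch of that shape (a toy k-means-style alternation; the particular data, number of clusters, and initialization below are just illustrative assumptions, not anything load-bearing):

```python
# Minimal k-means-style sketch of the two "definitions" above (toy illustration;
# the data, number of clusters, and initialization are arbitrary assumptions).
import numpy as np

def f(data, cluster_params):
    # cluster_membership "defined" from data + cluster parameters:
    # each point is assigned to its nearest cluster center.
    dists = np.linalg.norm(data[:, None, :] - cluster_params[None, :, :], axis=-1)
    return dists.argmin(axis=1)

def g(data, cluster_membership, cluster_params):
    # cluster_params "defined" from data + cluster membership:
    # each center is the mean of its assigned points (unchanged if a cluster is empty).
    return np.array([
        data[cluster_membership == i].mean(axis=0) if np.any(cluster_membership == i)
        else cluster_params[i]
        for i in range(len(cluster_params))
    ])

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
params = data[rng.choice(len(data), size=2, replace=False)]  # arbitrary init, k = 2
for _ in range(20):
    membership = f(data, params)
    params = g(data, membership, params)
# Despite the "circular" form, iterating f and g given the data settles on one of a
# small, discrete set of (membership, params) fixed points.
```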
That, I claim, is also part of what’s going on with abstractions like “dog”.
(Now, choice of axes is still a separate degree of freedom which has to be handled somehow. And that’s where I expect the robustness to choice of axes does load-bearing work. As you say, that’s separate from the circularity issue.)