I think you’re confusing behavior with implementation.
I’m definitely not treating these as interchangeable—my argument is about how, in a certain set of cases, they are importantly not interchangeable.
Specifically, I’m arguing that certain characterizations of ideal behavior cannot help us explain why any given implementation approximates that behavior well or poorly.
I don’t understand how the rest of your points engage with my argument. Yes, there is a good reason Solomonoff does a weighted average and not an argmax; I don’t see how this affects my argument one way or the other. Yes, fully general theories can be valuable even when they’re not practical to apply directly to real problems; I was arguing that a specific type of fully general theory lacks a specific type of practical value, one which people sometimes expect that type of theory to have.
I was arguing that a specific type of fully general theory lacks a specific type of practical value
In that case, your argument lacks value in its own right because it is vague and confusing. I don’t know of any theories that fall into the “specific type” of general theory you tried to describe. You used Solomonoff as an example, but it doesn’t match your description.
one which people sometimes expect that type of theory to have.
When someone develops a formalization, they have to explicitly state its context and any assumptions. If someone expects to use Kolmogorov complexity theory to write the next hit game, they’re going to have a bad time. That’s not Kolmogorov’s fault.
I’m arguing that certain characterizations of ideal behavior cannot help us explain why any given implementation approximates that behavior well or poorly.
Of course it can. It provides a different way of constructing a solution: you can start with an ideal, then add assumptions that let you arrive at a more practicable implementation.
For instance, in computer vision, determining how a depth camera is moving in a scene is very difficult if you use an ideal formalization directly. But if you assume that the differences between two point clouds are due primarily to affine transformations, you can use the computationally cheap iterative-closest-point method, based on Procrustes analysis, to approximate the formal solution. Then, when you observe anomalous behavior, your usual suspects will be the list of assumptions you made to render the problem tractable. Are non-affine transformations dominating the deltas between point clouds? Maybe that’s causing my computer vision system to glitch. Maybe I need some way to detect such situations and/or some sort of fallback.
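To make that concrete, here is a minimal sketch of the idea in Python, under assumptions of my own: 2D points instead of real depth data, a brute-force nearest-neighbour match, and the closed-form 2D Procrustes solution, which recovers only a rotation plus translation (a rigid special case of the affine assumption). The function names are hypothetical, not from any library. The returned residual is the part that matters for debugging: it is the number you would watch to see whether the simplifying assumption still holds.

```python
import math

def best_rigid_transform(src, dst):
    """Closed-form 2D Procrustes fit for paired points.

    Returns (theta, tx, ty) minimising sum ||R(theta) * src_i + t - dst_i||^2.
    """
    n = len(src)
    cx_s = sum(x for x, _ in src) / n
    cy_s = sum(y for _, y in src) / n
    cx_d = sum(x for x, _ in dst) / n
    cy_d = sum(y for _, y in dst) / n
    s_cos = s_sin = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        ax, ay = xs - cx_s, ys - cy_s      # centred source point
        bx, by = xd - cx_d, yd - cy_d      # centred target point
        s_cos += ax * bx + ay * by         # dot-product term
        s_sin += ax * by - ay * bx         # cross-product term
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    tx = cx_d - (c * cx_s - s * cy_s)
    ty = cy_d - (s * cx_s + c * cy_s)
    return theta, tx, ty

def apply_transform(points, theta, tx, ty):
    """Rotate each point by theta, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def nearest(point, cloud):
    """Brute-force closest point in `cloud` (fine for a toy example)."""
    return min(cloud, key=lambda q: (q[0] - point[0]) ** 2 + (q[1] - point[1]) ** 2)

def icp(src, dst, iterations=20):
    """Toy iterative closest point: match, fit, transform, repeat.

    Returns the aligned copy of `src` and the RMS distance to the matched points.
    """
    current = list(src)
    for _ in range(iterations):
        matched = [nearest(p, dst) for p in current]
        theta, tx, ty = best_rigid_transform(current, matched)
        current = apply_transform(current, theta, tx, ty)
    matched = [nearest(p, dst) for p in current]
    sq = sum((px - qx) ** 2 + (py - qy) ** 2
             for (px, py), (qx, qy) in zip(current, matched))
    return current, math.sqrt(sq / len(current))
```

When the two clouds really do differ by a small rotation and translation, the residual should collapse toward zero. A residual that stays well above the sensor noise floor is a hint that something outside the assumption (a non-rigid change, bad correspondences) is dominating, which is exactly the “usual suspects” check described above.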
Not only that, but there are many reasons to formalize ideas like intelligence besides guiding the practical implementation of intelligent systems. You can explore the concept of intelligence and its bounds.
Again, if you understand a tool for what it is, there’s no problem. Of course trying to use a purely formalized theory directly to solve real-world problems is going to yield confusing results. Trying to engineer a bridge using the standard model of particle physics is going to be just as difficult. It’s not a fault of the theory, nor does it mean studying the theory is pointless. The problem is that you want it to be something it’s not.
I don’t understand how the rest of your points engage with my argument.
It’s hard to engage much with your argument because it’s made up of vague straw men:
Most “theories of rational belief” I have encountered
That gives me no solid context to engage with. If you’re talking about AIXI, then you’ve misunderstood AIXI, because it isn’t about choosing strategies out of a set of all strategies. In fact, you’ve got Solomonoff inductive inference completely wrong too:
For example, in Solomonoff, S is defined by computability while R is allowed to be uncomputable.
Solomonoff inductive inference is defined in the context of an agent observing an environment. That’s all. It doesn’t take actions. It just observes and predicts. There is no set of strategies. There is no rule for selecting a strategy, and given your definition of S and R:
We have some class of “practically achievable” strategies S, which can actually be implemented by agents. We note that an agent’s observations provide some information about the quality of different strategies s∈S. So if it were possible to follow a rule like R≡ “find the best s∈S given your observations, and then follow that s,” this rule would spit out very good agent behavior.
It doesn’t even make sense that R would be uncomputable given that S is computable.
When you say:
Concretely, this talk of approximations is like saying that a very successful chess player “approximates” the rule “consult all possible chess players, then weight their moves by past performance.” Yes, the skilled player will play similarly to this rule, but they are not following it, not even approximately! They are only themselves, not any other player.
On what grounds do you justify the claim that the chess player’s behavior is “not even approximately” following the rule “consult all possible chess players, then weight their moves by past performance”?
Actually, what vanilla AIXI would prescribe is a full tree traversal similar to the minimax algorithm, which is, of course, impractical. However, there are things you can do to approximate a full tree traversal more practically. You can build approximate models based on experience, like “given the state of the board, what moves should I consider,” which prunes the width of the tree, and “given the state of the board, how likely am I to win,” which limits the depth of the tree. So instead of considering every possible move at every possible step of the game to every possible conclusion, you consider only 3-4 possible moves per step and look only maybe 4-5 steps into the future, perhaps considering fewer moves per step as you go deeper.
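As a rough illustration of that width-and-depth pruning (not of AIXI itself), here is a sketch on a toy take-away game of my own choosing: players alternately remove 1 to 3 stones, and whoever takes the last stone wins. `rank_moves` and `estimate_value` are hypothetical stand-ins for the two experience-based models. In this toy game they happen to be exact; in chess they would be learned and imperfect.

```python
import math

MAX_TAKE = 3  # assumption: a player may remove 1..3 stones per turn

def legal_moves(stones):
    return [k for k in range(1, MAX_TAKE + 1) if k <= stones]

def rank_moves(stones):
    """Stand-in for "what moves should I consider": most promising first."""
    return sorted(legal_moves(stones), key=lambda k: (stones - k) % (MAX_TAKE + 1))

def estimate_value(stones):
    """Stand-in for "how likely am I to win", in [-1, 1] for the player to move."""
    return -1.0 if stones % (MAX_TAKE + 1) == 0 else 1.0

def pruned_negamax(stones, depth, width):
    """Value for the player to move, looking at most `depth` plies ahead
    and at most `width` candidate moves per ply."""
    if stones == 0:
        return -1.0                       # the opponent took the last stone
    if depth == 0:
        return estimate_value(stones)     # cut the tree's depth
    best = -math.inf
    for move in rank_moves(stones)[:width]:   # cut the tree's width
        best = max(best, -pruned_negamax(stones - move, depth - 1, width))
    return best

def choose_move(stones, depth, width):
    """Return (move, value) for the best of the top `width` candidate moves."""
    best_move, best_value = None, -math.inf
    for move in rank_moves(stones)[:width]:
        value = -pruned_negamax(stones - move, depth - 1, width)
        if value > best_value:
            best_move, best_value = move, value
    return best_move, best_value
```

Setting `width` to the full branching factor and `depth` to the game length recovers the exhaustive traversal. Shrinking either one trades accuracy for cost, and when the player then blunders, the two stand-in models are the first places to look.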
Yes, there is a good reason Solomonoff does a weighted average and not an argmax
Did you edit your original comment? I could have sworn you said more to disparage the use of “arbitrary” weights. At any rate, it’s not a “performance-weighted average,” because it isn’t about performance. It’s about uncertainty.