For my purposes, consequentialism is a consumer only of world models and does not produce them at all.
I mostly agree with that.
Not entirely, though: there is a concept called “intrinsic curiosity”, where ML researchers use consequentialism to perform reinforcement learning towards reaching edge cases where the models break down, so that they can collect more data for their models. Similarly, expected utility maximization automatically gives you a drive to reduce uncertainty in variables that you expect to matter.
But overall I expect there will be a need to explicitly think about how to learn good world models.
The new, improved model comes from the reinforcement learning, not from the consequentialism part. Even to the extent that the question “How did I act wrongly?” gets answered, what counts as wrong comes from the world-model. An actual inconsistency is not caught unless it is already conceivable as one.
The EUM drive steers you toward situations where your map is good, toward using the confident parts or the ones that are least self-contradictory. It does not lead you to a better map; it does not take a bad variable and make it more representative.
Reinforcement learning is a form of consequentialism.
Intrinsic curiosity systems try to create ways of making inconsistencies conceivable without supervision. For instance, they might train multiple models to predict the same thing, and then treat disagreements between the models as inconsistencies.
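To make that concrete, here is a minimal sketch of the ensemble-disagreement idea (the toy data and polynomial “models” are made up for illustration, not anything from this discussion): disagreement between independently fit predictors serves as the inconsistency signal, and it peaks where no data has been collected.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "world": y = sin(3x) + noise, but data has only been collected on [-1, 0].
def world(x):
    return np.sin(3 * x) + 0.05 * rng.standard_normal(np.shape(x))

x_train = rng.uniform(-1.0, 0.0, size=200)
y_train = world(x_train)

# An "ensemble": simple polynomial models fit on different bootstrap resamples.
def fit_member(x, y, degree):
    idx = rng.integers(0, len(x), size=len(x))
    return np.polyfit(x[idx], y[idx], degree)

ensemble = [fit_member(x_train, y_train, d) for d in (2, 3, 4, 5)]

def disagreement(x):
    """Standard deviation of ensemble predictions: a crude inconsistency score."""
    preds = np.stack([np.polyval(coeffs, x) for coeffs in ensemble])
    return preds.std(axis=0)

# Candidate inputs the agent could probe next; curiosity favours high disagreement.
candidates = np.linspace(-1.0, 1.0, 21)
print("most 'inconsistent' input:", candidates[np.argmax(disagreement(candidates))])
# Typically a point outside [-1, 0], where the models have never seen data.
```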
I don’t think you can say this all that absolutely. Lots of complex stuff can happen with EUM in complex environments.
How safe am I in saying such things about complex environments?
Suppose you use choosement to act, and you never vary your comparison-value-generator but just feed it new kinds of inputs as you encounter new situations. You are in a situation you have essentially been in before. You want to do better than last time (and not be mad by doing the same thing while expecting different results). This means the tie-breaker on what you will do needs to depend on details that make this situation different from the last one. Luckily the situation is only essentially, rather than totally, the same, so such details exist. If your set-in-stone comparison-value-generator picks out the right inessential detail, you take a better action than last time.
So wasn’t it important that the right inessential detail was just the way it was? But that kind of makes it important to have had control over it. At the last time step you might have had some other essential choice to make, but you might also have had control over inessential details. So for the sake of the future, if the detail that will control the future is among them, it is important to set it in the good position. But how can you manage that? The detail is ultimately (objectively) inessential, so you do not have any “proper” motive to pay any attention to it. Well, you could be similarly superstitious and let random scratchings help you pick the right one.
All these external dependencies either require every situation to be unique or have to be conditional on some other detail also being present. Maybe you get lucky and at some point can use a detail that, for unrelated reasons, is a good bet. A lot of your comparison-value-generator probably deals with essential details too. However, if on some choice it depends only on essential details, you had better make a choice you can repeat for eternity in those situations, because there is not going to be any improvement.
So an EUM whose comparison-value-generator is a function only of the immediately causable world-state learns only to the extent that it can use the environment as a map and let the environment think for it. Even if such “scratch-pads” were only-for-this-purpose mind-states, a lot of essential stuff about the policy would still not be apparent in the comparison-value-generator. And at install time you need to make a very detailed module and know, even for the parts that will only be used late, how they need to be, or at least that they are okay as they are (and okay to be in the interim).
Or you could allow the agent to change its comparison-value-generator at will. Then it only needs one superstition at each junction to jump to the next one. Correspondingly, any inessential dependence can mean more or different inessential dependencies at future times. Install time is still going to be pretty rough, but you only need to find one needle instead of one per timestep. If you manage to verify each “phase” separately, you do not need to worry about unused inessential dependencies being carried over to the next phase. The essential parts can also exist only when they are actually needed.
Choosement is not part of my definition of consequentialism.
Searching based on consequences is part of it, and you are right that in the real world you would want to update your model based on new data you learn. In the EUM framework, these updates are captured by Bayesian conditioning. There are other frameworks which capture the updates in other ways, but the basic points remain the same.
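As a toy illustration of what “captured by Bayesian conditioning” means here (a made-up two-hypothesis example, not anything from the post): observations reweight the hypotheses, and the action is then chosen by expected utility under the updated weights.

```python
# Two hypotheses about a coin, a prior over them, and an action chosen by
# expected utility under the current posterior.  Observations change the
# posterior (Bayesian conditioning); they never change the hypothesis space.
priors = {"fair": 0.5, "biased": 0.5}
p_heads = {"fair": 0.5, "biased": 0.9}

def condition(posterior, observation):
    """Bayes: P(h | obs) is proportional to P(obs | h) * P(h)."""
    likelihood = {h: (p_heads[h] if observation == "H" else 1 - p_heads[h])
                  for h in posterior}
    unnorm = {h: likelihood[h] * posterior[h] for h in posterior}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}

def expected_utility(posterior, action):
    # Utility of betting on heads/tails, per hypothesis.
    u = {"bet_heads": lambda h: p_heads[h], "bet_tails": lambda h: 1 - p_heads[h]}
    return sum(posterior[h] * u[action](h) for h in posterior)

posterior = dict(priors)
for obs in "HHHTH":                      # a stream of observations
    posterior = condition(posterior, obs)

best = max(["bet_heads", "bet_tails"], key=lambda a: expected_utility(posterior, a))
print(posterior, best)
```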
How does “searching based on consequences” fail to ever use choosement?
The possibility of alternatives to choosement is discussed here.
Linking to the totality of a very long post has downsides comparable to writing a wall-of-text reply.
I understand how “searching” can fail to be choosement when it ends up being “solving algebraically” without actually checking any values of the open variables.
Going from abstract descriptions to more and more concrete solutions is not coupled to how many elementary, ground-level-concrete solutions get disregarded, so it can be fast. I thought part of what makes “checks every option” worrying is that it doesn’t get fooled by faulty (or non-existent) abstractions.
So to me it is surprising that an agent that never considers alternative avenues falls under the umbrella of “consequentialist”. An agent that changes its policy if it is in pain and keeps its policy if it feels pleasure “is consequentialist” on the grounds that its policy was caused by life events, even if the policy is pure reflex.
There were also vibes to the effect that “this gets me what I want” is a consequentialist stance because of the appearance of “gets”, so it is consequentialist because it projects winning.
Well, you are right that a functioning consequentialist must either magically have perfect knowledge, or must have some way of observing and understanding the world to improve its knowledge. Since magic isn’t real, for advanced capable agents it must in reality be the latter.
In the EUM framework, the observations and improvements in understanding are captured by Bayesian updating. In different frameworks, they may be captured by different things.
“Improve knowledge” here can mean “its cognition is more fit to the environment”. Somebody could read it as “represent the environment more”, which it does not need to be.
With such a wide reading it starts to sound to me like “the agent isn’t broken”, which is not exactly structure-anticipation-limiting.
Yes, classical Bayesian decision theory often requires a realizability assumption, which is unrealistic.
Realizability is anticipation-limiting but unrealistic.
While EUM captures the core of consequentialism, it does so in a way that is not very computationally feasible and that leads to certain paradoxes when pushed far enough. So yes, EUM is unrealistic. The details are discussed in the embedded agency post.
So is intrinsic curiosity a reinforcement learning or an unsupervised learning approach?
When comparing the claims of different models in the same expression language, you do not need a dynamic model of inconsistency.
If you need the model to decide for itself where it is wrong, there is the possibility that the model, which can be dynamic, does a poor job of it.
What if you are wrong about what is wrong?
Suppose we are making an inconsistency detector by choosement. Consider all our previous situations. For each, generate a value representing how wrong the choice made in that situation was. Then stick with the one that has the highest wrongness and start “here we need to improve”, whatever that means.
So in a given situation, how wrong was the actual choice that was made? Generate values for how the other options would have fared and return the difference between the highest of those and the value of the chosen one. If we use the same option-evaluator as when acting then, surprise surprise, we always picked the highest-value option, so the difference is zero. If we use a different evaluator, why are we not using that one for acting as well? Every situation having wrongness zero means that where we improve is arbitrary.
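To spell out that circularity with a toy sketch (hypothetical situations and scores, invented for illustration): if the wrongness of a past choice is measured with the same evaluator that made the choice, the measured wrongness is zero in every situation.

```python
# Toy situations, options, and a fixed comparison-value-generator ("evaluator").
situations = {"s1": ["a", "b", "c"], "s2": ["d", "e"]}
values = {("s1", "a"): 3, ("s1", "b"): 7, ("s1", "c"): 5,
          ("s2", "d"): 2, ("s2", "e"): 9}

def evaluate(situation, option):
    return values[(situation, option)]

def act(situation):
    # Choosement: score every option, keep the argmax.
    return max(situations[situation], key=lambda o: evaluate(situation, o))

def wrongness(situation, chosen, evaluator):
    option_values = [evaluator(situation, o) for o in situations[situation]]
    return max(option_values) - evaluator(situation, chosen)

for s in situations:
    # Same evaluator for acting and for judging, so the "wrongness" is always 0.
    print(s, wrongness(s, act(s), evaluate))
```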
Intrinsic curiosity uses reinforcement learning to find places where the map is missing information, and then unsupervised learning to include that information in the map.
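A compressed sketch of that two-part loop (a toy 1-D “environment” and a nearest-neighbour “map”, both invented for illustration): the reward is a crude novelty measure, distance from the data the map already has, and the map is then refit, without supervision, on whatever the exploration brings back.

```python
import numpy as np

rng = np.random.default_rng(1)
truth = lambda x: np.tanh(4 * x)          # the unknown environment

# The agent's "map": predict the outcome at x from the nearest point seen so far.
data_x, data_y = [0.0], [float(truth(0.0))]

def predict(x):
    nearest = int(np.argmin([abs(x - xi) for xi in data_x]))
    return data_y[nearest]

def intrinsic_reward(x):
    """Curiosity proxy: distance to the nearest point the map was built from."""
    return min(abs(x - xi) for xi in data_x)

for step in range(20):
    # "Reinforcement" part (greedy stand-in for a learned policy): among a few
    # candidate probes, pick the one where the map is missing the most information.
    candidates = rng.uniform(-1, 1, size=8)
    x = max(candidates, key=intrinsic_reward)
    # "Unsupervised" part: fold the observed outcome back into the map.
    data_x.append(float(x))
    data_y.append(float(truth(x)))

print(f"map now built from {len(data_x)} points spread across [-1, 1]")
```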
Trying to rephrase this using more correct/accurate concepts.
In the step where we use what actually happened to tweak the agent’s world-model (is this called the “interpreter”?), what kind of mental change ends up happening is usually a straightforward calculation (a “reflex”). There is no formation of alternatives. There is no choice. This process is essentially the same even if we use the other approaches.
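For instance (a generic sketch, not tied to any particular system), a gradient-descent update on a world-model has exactly this character: given what was predicted and what actually happened, the parameter change is one fixed calculation, with no pool of alternatives and no picking.

```python
# World-model: predicts the next observation as w * (current observation).
w = 0.1
learning_rate = 0.05

def update(w, obs, next_obs):
    """The 'reflex': one fixed calculation of the mental change, no alternatives."""
    prediction = w * obs
    error = prediction - next_obs
    gradient = error * obs                 # d/dw of 0.5 * error**2
    return w - learning_rate * gradient    # nothing is generated, compared, or picked

for obs, next_obs in [(1.0, 0.8), (0.5, 0.4), (2.0, 1.6)]:
    w = update(w, obs, next_obs)

print(round(w, 3))   # w has moved toward 0.8, the relation hidden in the data
```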
I might have understood consequentialism overly narrowly (the curse of -isms). So, for disambiguation: choosement is creating a lot of items, forming a comparison item for each, and using a picker that is a function of the pool of comparison items only to pick the associated item to continue with, discarding the others.
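In code, that pattern might look like the following skeleton (the name `choosement` and the toy example are mine, not established terminology or an established API):

```python
from typing import Callable, Iterable, TypeVar

Item = TypeVar("Item")

def choosement(
    generate: Callable[[], Iterable[Item]],      # create a lot of items
    compare: Callable[[Item], float],            # form a comparison item for each
    pick: Callable[[list[float]], int],          # sees only the pool of comparison items
) -> Item:
    items = list(generate())
    comparisons = [compare(item) for item in items]
    winner = pick(comparisons)       # the picker never sees the items themselves
    return items[winner]             # everything else is discarded

# Expected-utility action selection is one instance of the pattern.
expected_utility = {"left": 0.2, "right": 0.7, "wait": 0.5}
best = choosement(
    generate=lambda: expected_utility.keys(),
    compare=lambda action: expected_utility[action],
    pick=lambda pool: max(range(len(pool)), key=pool.__getitem__),
)
print(best)   # "right"
```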
Consequentialist action choosement leaves the world-model unchanged. When incorporating feedback in a consequentialist approach, no choosement is employed and the world-model might change (plus possible influence on the non-world-model comparison-item former).
One could try an approach where choosement is used in feedback incorporation. Generate many options for randomly tweaking the world-model. Then form a comparison item for each by running the action-formation bit and noting the utility-cardinality of the action that gets picked (reverse and take the minimum if the feedback is negative). Take the world-model tweak with the extremal action-cardinality, implement it, and carry on with that world-model.
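A rough sketch of how that could look, reading “utility-cardinality” as “the expected utility of the action the tweaked model ends up picking” (the toy world-model and numbers are invented; this is a reconstruction of the paragraph above, not a standard algorithm):

```python
import random
random.seed(0)

world_model = {"p_success": 0.5}                 # a one-parameter toy world-model
actions = ["try", "give_up"]

def expected_utility(model, action):
    return model["p_success"] if action == "try" else 0.3

def act(model):
    # The ordinary action-formation bit: choosement over actions.
    return max(actions, key=lambda a: expected_utility(model, a))

def incorporate_feedback(model, feedback_positive, n_tweaks=20):
    """Choosement over random world-model tweaks instead of a fixed update rule."""
    sign = 1 if feedback_positive else -1
    tweaks = [
        {"p_success": min(1.0, max(0.0, model["p_success"] + random.gauss(0, 0.1)))}
        for _ in range(n_tweaks)
    ]
    # Comparison item: the utility of the action each tweaked model would pick,
    # with the sign flipped when the feedback was negative.
    def score(tweaked):
        return sign * expected_utility(tweaked, act(tweaked))
    return max(tweaks, key=score)

world_model = incorporate_feedback(world_model, feedback_positive=True)
print(act(world_model), world_model)
```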
Choosement could also be used in supervised learning. Use different hyperparameters to get differently biased models, and then actually use only the one that is most single-minded about its result on the specific new situation.
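A sketch of that idea (plain NumPy, hypothetical toy data, no particular library’s API implied): fit differently regularized models and, for a new input, keep only the one that is most confident about its own answer.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical toy data: a 1-D feature and a binary label.
x = rng.uniform(-1, 1, size=100)
y = (x > 0.1).astype(float)

def fit_logistic(x, y, l2, steps=500, lr=0.5):
    """Logistic regression by gradient descent; l2 is the bias-inducing hyperparameter."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(w * x + b)))
        w -= lr * (np.mean((p - y) * x) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b

models = [fit_logistic(x, y, l2) for l2 in (0.0, 0.1, 1.0)]   # differently biased fits

def predict_proba(model, x_new):
    w, b = model
    return 1 / (1 + np.exp(-(w * x_new + b)))

x_new = 0.05
probs = [predict_proba(m, x_new) for m in models]
# "Most single-minded": the model whose probability is furthest from 0.5.
chosen = max(range(len(models)), key=lambda i: abs(probs[i] - 0.5))
print(f"using model {chosen}: p(label 1 | x={x_new}) = {probs[chosen]:.2f}")
```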
The world-model changing parts of reinforcement learning do not come from choosement.
I am not sure whether “choosement” here refers to a specific search algorithm, or search algorithms in general. As mentioned in the post, there are many search algorithms.
It is supposed to be a pattern such that you can say whether a particular concrete algorithm or class of algorithms has it or does not have it.
But what pattern exactly?
edit: allowed evaluation to know about context
This is not necessarily part of my definition of consequentialism, since it is a specific search pattern and there are other search patterns.
I am clarifying what I meant in
If there is consequentialism that is not based on choosement or does not use it in the sense you mean, that would probably be pretty essential for the clarification.
The possibility of alternatives to choosement is discussed here.