I’m claiming that whether we call something a “danger” should not take into account considerations like, “We shouldn’t consider this a ‘danger’, because if we did, then people would feel afraid, and their fear is suffering to be minimized according to the global utilitarian calculus.”
Is the reason you think we should not take this kind of consideration into account that, if we did decide not to consider the object under discussion a “danger”, that would have worse consequences in the long run? If so, why not argue for taking both of these considerations into account and argue that the second consideration is stronger? Kind of a “fight speech with more speech instead of censorship” approach? (That would allow for the possibility that we override considerations for people’s feelings in most cases, but avoid calling something a “danger” in extreme cases where the emotional or other harm of doing so is exceptionally great.)
It seems like the only reason you’d be against this is if you think that most people are too irrational to correctly weigh these kinds of considerations against each other on a case-by-case basis, and there’s no way to train them to be more rational about this. Is that true, and if so why do you think that?
That kind of utilitarianism might (or might not) be a good reason to not tell people about the danger, but it’s not a good reason to change the definition of “danger” itself.
I’m questioning whether there is any definition of “danger” itself (in the sense of things that are considered dangerous, not the abstract concept of danger), apart from the collection of things we decide to call “danger”.
correctly weigh these kinds of considerations against each other on a case-by-case basis
The very possibility of intervention based on weighing map-making and planning against each other destroys their design, if they are to have a design. It’s similar to patching a procedure in a way that violates its specification in order to improve overall performance of the program or to fix an externally observable bug. In theory this can be beneficial, but in practice the ability to reason about what’s going on deteriorates.
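The programming analogy above can be made concrete with a toy sketch. Everything here is a hypothetical illustration and not part of the original discussion: the `Hazard` type, the `WORLD` data, and the function names are all assumed for the example. One procedure is specified to list every harmful thing; a patched variant quietly drops the upsetting ones to fix an externally visible complaint; and a caller that reasons from the specification draws a wrong conclusion as a result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hazard:
    name: str
    is_harmful: bool    # ground truth about the territory
    is_upsetting: bool  # a consequence of telling people about it


# Hypothetical world state, assumed purely for illustration.
WORLD = [
    Hazard("thin ice", is_harmful=True, is_upsetting=True),
    Hazard("loose railing", is_harmful=True, is_upsetting=False),
    Hazard("puddle", is_harmful=False, is_upsetting=False),
]


def dangers(world):
    """Spec: return the name of every harmful hazard, and nothing else."""
    return {h.name for h in world if h.is_harmful}


def dangers_patched(world):
    """Patched to stop upsetting people; silently violates the spec above
    by omitting hazards that are harmful but upsetting."""
    return {h.name for h in world if h.is_harmful and not h.is_upsetting}


def places_assumed_safe(world, danger_fn):
    """A caller that relies on the spec: anything not listed is treated as safe."""
    listed = danger_fn(world)
    return {h.name for h in world if h.name not in listed}


honest = places_assumed_safe(WORLD, dangers)
patched = places_assumed_safe(WORLD, dangers_patched)
print(honest)
print(patched)
```

With the unpatched procedure the caller treats only the puddle as safe. With the patched one it also treats the thin ice as safe, even though nothing in the caller changed. That is the sense in which local reasoning about the program deteriorates once a part stops meeting its specification.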
In theory this can be beneficial, but in practice the ability to reason about what’s going on deteriorates.
I think (speaking from my experience) specifications are often compromises in the first place between elegance / ease of reasoning and other considerations like performance. So I don’t think it’s taboo to “patch a procedure in a way that violates its specification in order to improve overall performance of the program or to fix an externally observable bug.” (Of course you’d have to also patch the specification to reflect the change and make sure it doesn’t break the rest of the program, but that’s just part of the cost that you have to take into account when making this decision.)
Assuming you still disagree, can you explain why in these cases, we can’t trust people to use learning and decision theory (i.e., human approximations to EU maximization or cost-benefit analysis) to make decisions, and we instead have to make them follow a rule (i.e., “don’t ever do this”)? What is so special about these cases? (Aren’t there tradeoffs between ease of reasoning and other considerations everywhere?) Or is this part of a bigger philosophical disagreement between rule consequentialism and act consequentialism, or something like that?
The problem with unrestrained consequentialism is that it accepts
no principles in its designs.
An agent that only serves a purpose has no knowledge of the world
or mathematics; it makes no plans and maintains no goals.
It is what it needs to be, and no more.
All these things are only expressed as aspects of its behavior,
godshatter of the singular purpose,
but there is no part that seeks excellence in any of the aspects.
For an agent designed around multiple aspects,
its parts rely on each other in dissimilar ways,
not as subagents with different goals.
Access to knowledge is useful for planning and can represent goals.
Exploration and reflection refine knowledge and formulate goals.
Planning optimizes exploration and reflection, and leads to achievement of goals.
If the part of the design that should hold knowledge
accepts a claim for reasons other than arguments about its truth,
the rest of the agent can no longer rely on its claims
as reflecting knowledge.
Of course you’d have to also patch the specification
In my comment, I meant the situation where the specification is not patched (and by specification in the programming example I meant the informal description on the level of procedures or datatypes that establishes some principles of what it should be doing).
In the case of appeal to consequences, the specification is a general principle that a map
reflects the territory to the best of its ability,
so it’s not a small thing to patch.
Optimizing a particular belief according to the consequences of holding it
violates this general specification.
If the general specification is patched to allow this,
you no longer have access to straightforwardly expressed knowledge (there is no part of cognition that satisfies the original specification).
Alternatively, specific beliefs could be marked as motivated,
so the specification is to have two kinds of beliefs, with some of them surviving to serve the original purpose.
This might work, but then actual knowledge that corresponds
to the motivated beliefs won’t be natively available, and it’s unclear what the motivated beliefs should be doing. Will curiosity act on the motivated beliefs, should they be used for planning, can they represent goals? A more developed architecture for reliable hypocrisy might actually do something sensible, but it’s not a matter of merely patching particular beliefs.
In order to compute what actions will have the best consequences, you need to have accurate beliefs—otherwise, how do you know what the best consequences are?
There’s a sense in which the theory of “Use our methods of epistemic rationality to build predictively accurate models, then use the models to decide what actions will have the best consequences” is going to be meaningfully simpler than the theory of “Just do whatever has the best consequences, including the consequences of the thinking that you do in order to compute this.”
The original timeless decision theory manuscript distinguishes a class of “decision-determined problems”, where the payoff depends on the agent’s decision, but not the algorithm that the agent uses to arrive at that decision: Omega isn’t allowed to punish you for not making decisions according to the algorithm “Choose the option that comes first alphabetically.” This seems like a useful class of problems to be able to focus on? Having to take into account the side effects of using a particular categorization seems like a form of being punished for using a particular algorithm.
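As a rough illustration of the distinction (a toy sketch, not taken from the manuscript), the two payoff functions below differ only in what they are allowed to look at. The option names, values, penalty size, and all function names are assumptions made up for this example.

```python
# Hypothetical options and their values, assumed for illustration.
VALUE = {"apple": 1, "banana": 5}
OPTIONS = ["apple", "banana"]


def alphabetical(options):
    """Choose the option that comes first alphabetically."""
    return min(options)


def first_listed(options):
    """Choose whichever option happens to be listed first."""
    return options[0]


def best_value(options):
    """Choose the option with the highest value."""
    return max(options, key=VALUE.get)


def decision_determined_payoff(algorithm):
    # The payoff depends only on the decision the algorithm outputs.
    return VALUE[algorithm(OPTIONS)]


def algorithm_dependent_payoff(algorithm):
    # The payoff also inspects which algorithm produced the decision:
    # anything other than `alphabetical` is punished (penalty is assumed).
    penalty = 0 if algorithm is alphabetical else 10
    return VALUE[algorithm(OPTIONS)] - penalty


for algo in (alphabetical, first_listed, best_value):
    print(algo.__name__,
          decision_determined_payoff(algo),
          algorithm_dependent_payoff(algo))
```

In the decision-determined version, `alphabetical` and `first_listed` reach the same decision and so get the same payoff. In the algorithm-dependent version they reach the same decision but are paid differently, which is the kind of problem the class excludes.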
I concede that, ultimately, the simple “Cartesian” theory that disregards the consequences of thinking can’t be the true, complete theory of intelligence, because ultimately, the map is part of the territory. I think the embedded agency people are working on this?—I’m afraid I’m not up-to-date on the details. But when I object to people making appeals to consequences, the thing I’m objecting to is never people trying to do a sophisticated embedded-agency thing; I’m objecting to people trying to get away with choosing to be biased.
you think that most people are too irrational to correctly weigh these kinds of considerations against each other on a case-by-case basis, and there’s no way to train them to be more rational about this. Is that true
Actually, yes.
and if so why do you think that?
Long story. How about some game theory instead?
Consider some agents cooperating in a shared epistemic project—drawing a map, or defining a language, or programming an AI—some system that will perform better if it does a better job of corresponding with (some relevant aspects of) reality. Every agent has the opportunity to make the shared map less accurate in exchange for some selfish consequence. But if all of the agents do that, then the shared map will be full of lies. Appeals to consequences tend to diverge (because everyone has her own idiosyncratic favored consequence); “just make the map be accurate” is a natural focal point (because the truth is generally useful to everyone).
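A minimal numerical sketch of the incentive structure described above, under assumed numbers: each distortion gives its author a private gain and costs every agent (the author included) some shared accuracy. The function name, the gain of 3, and the cost of 1 are hypothetical choices for illustration. The sketch only covers the "everyone is tempted, everyone loses" half of the argument and does not model the focal-point claim.

```python
def payoffs(distorts, private_gain=3.0, shared_cost=1.0):
    """Return one payoff per agent.

    distorts: list of bools, True if that agent distorts the shared map.
    Each distortion pays its author `private_gain` and costs every agent
    `shared_cost` (both values are assumptions for this toy model).
    """
    total_cost = shared_cost * sum(distorts)
    return [(private_gain if d else 0.0) - total_cost for d in distorts]


n = 5
all_honest = payoffs([False] * n)
one_distorts = payoffs([True] + [False] * (n - 1))
all_distort = payoffs([True] * n)

print(all_honest)
print(one_distorts)
print(all_distort)
```

With these numbers a lone distorter does better than under universal honesty (2 versus 0), while everyone distorting leaves each agent worse off than universal honesty (-2 versus 0). Different values of the gain and cost would change whether the dilemma arises at all.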
I think the embedded agency people are working on this?—I’m afraid I’m not up-to-date on the details. But when I object to people making appeals to consequences, the thing I’m objecting to is never people trying to do a sophisticated embedded-agency thing; I’m objecting to people trying to get away with choosing to be biased.
In that case, maybe you can clarify (in this or future posts) that you’re not against doing sophisticated embedded-agency things? Also, can you give some examples of what you’re objecting to, so I can judge for myself whether they’re actually doing sophisticated embedded-agency things?
Appeals to consequences tend to diverge (because everyone has her own idiosyncratic favored consequence); “just make the map be accurate” is a natural focal point (because the truth is generally useful to everyone).
This just means that in most cases, appeals to consequences won’t move others much, even if they took such consequences into consideration. It doesn’t seem to be a reason for people to refuse to consider such appeals at all. If appeals to consequences only tend to diverge, it seems a good idea to actually consider such appeals, so that in the rare cases where people’s interests converge, they can be moved by such appeals.
So, I have to say that I still don’t understand why you’re taking the position that you are. If you have a longer version of the “story” that you can tell, please consider doing that.
(Thanks for the questioning!—and your patience.)
I will endeavor to make my intuitions more rigorous and write up the results in a future post.