In theory this can be beneficial, but in practice the ability to reason about what’s going on deteriorates.
I think (speaking from my experience) specifications are often compromises in the first place between elegance / ease of reasoning and other considerations like performance. So I don’t think it’s taboo to “patch a procedure in a way that violates its specification in order to improve overall performance of the program or to fix an externally observable bug.” (Of course you’d have to also patch the specification to reflect the change and make sure it doesn’t break the rest of the program, but that’s just part of the cost that you have to take into account when making this decision.)
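The kind of patch described above can be sketched concretely. This is a hypothetical example (the function, its TTL constant, and the spec wording are all invented for illustration): a lookup procedure whose original specification promised always-fresh values is deliberately weakened to allow staleness, for performance, and the docstring-level specification is updated to match.

```python
import time

_cache = {}
CACHE_TTL = 60.0  # seconds; the value is illustrative

def get_config(key, fetch):
    """Return the configuration value for `key`.

    Original spec: always returns the value currently stored in the
    backing store (`fetch` hits the store directly on every call).
    Patched spec: may return a value up to CACHE_TTL seconds stale.
    This is a deliberate spec change made for performance; every
    caller that assumed freshness must be re-checked, and that audit
    is part of the cost of making the patch.
    """
    now = time.monotonic()
    hit = _cache.get(key)
    if hit is not None and now - hit[0] < CACHE_TTL:
        return hit[1]  # serve from cache, possibly stale
    value = fetch(key)
    _cache[key] = (now, value)
    return value
```

The point is that the patch is not taboo precisely because the specification is patched along with the procedure: the weaker guarantee is written down where callers can see it.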
Assuming you still disagree, can you explain why in these cases, we can’t trust people to use learning and decision theory (i.e., human approximations to EU maximization or cost-benefit analysis) to make decisions, and we instead have to make them follow a rule (i.e., “don’t ever do this”)? What is so special about these cases? (Aren’t there tradeoffs between ease of reasoning and other considerations everywhere?) Or is this part of a bigger philosophical disagreement between rule consequentialism and act consequentialism, or something like that?
The problem with unrestrained consequentialism is that it accepts no principles in its designs. An agent that only serves a purpose has no knowledge of the world or mathematics; it makes no plans and maintains no goals. It is what it needs to be, and no more. All these things are expressed only as aspects of its behavior, godshatter of the singular purpose, but no part of it seeks excellence in any of those aspects.

For an agent designed around multiple aspects, its parts rely on each other in dissimilar ways, not as subagents with different goals. Access to knowledge is useful for planning and can represent goals. Exploration and reflection refine knowledge and formulate goals. Planning optimizes exploration and reflection, and leads to achievement of goals.

If the part of the design that should hold knowledge accepts a claim for reasons other than arguments about its truth, the rest of the agent can no longer rely on its claims as reflecting knowledge.
Of course you’d have to also patch the specification
In my comment, I meant the situation where the specification is not patched (and by "specification" in the programming example I meant the informal description, on the level of procedures or datatypes, that establishes some principles of what the code should be doing).

In the case of appeal to consequences, the specification is the general principle that a map reflects the territory to the best of its ability, so it's not a small thing to patch. Optimizing a particular belief according to the consequences of holding it violates this general specification. If the general specification is patched to allow this, you no longer have access to straightforwardly expressed knowledge (no part of cognition satisfies the original specification any more).

Alternatively, specific beliefs could be marked as motivated, so that the specification is to have two kinds of beliefs, with some of them surviving to serve the original purpose. This might work, but then actual knowledge corresponding to the motivated beliefs won't be natively available, and it's unclear what the motivated beliefs should be doing. Will curiosity act on the motivated beliefs? Should they be used for planning? Can they represent goals? A more developed architecture for reliable hypocrisy might actually do something sensible, but it is not a matter of merely patching particular beliefs.