Can you expand on what you mean by “demonic”? Is it a shorthand for “indicative of broken cognition, because it’s both cruel and unnecessary”, or something else? I THINK what you’re wondering about is whether these techniques/behaviors are ever actually optimal when dealing with misaligned agents who you nonetheless consider to be moral patients. Is that close?
I think that both questions are related to uncertainty about the other agent(s). Bargaining implies costly changes to future behaviors (of both parties), which makes signaling of capability and willingness important. Bargainers need to signal that they will change something in a meaningful way based on whatever agreement/concession is reached. In repeated interaction (which is almost all of them), actual follow-through is the strongest signal.
So, actual torture is the strongest signal of willingness and ability to torture. Building a torturizer shows capability, but only hints at willingness. Having materials that could build a torturizer or an orgasmatron is pretty weak evidence, but not zero. Likewise with strength and wealth: they show the capacity to deliver benefit (or reduce harm) through cooperation, which is an important prerequisite.
I don’t think you can assert that threats are never carried out, unless you somehow have perfect mutual knowledge (and then it’s not bargaining, it’s just optimization). Thomas Schelling won a Nobel for his work on bargaining under uncertainty, and I think most of those calculations remain valid, no matter how advanced and rational the agents involved are, so long as their knowledge is incomplete and their goals are misaligned.
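To make the repeated-interaction point concrete, here is a toy expected-value sketch. Every number and parameter name is invented for illustration; the only point is that following through on a costly threat can be rational when enough future rounds exist for the reputation effect to pay off, and irrational in a one-shot game, which is exactly when the threat stops being credible.

```python
# Toy sketch of threat credibility in repeated interaction.
# All numbers here are made up purely for illustration.

def follow_through_is_rational(cost_of_punishing: float,
                               gain_per_compliant_round: float,
                               future_rounds: int,
                               compliance_boost: float) -> bool:
    """Carrying out a threat costs something now, but raises the chance
    that future demands are met (a reputation effect). Follow through
    when the expected future gain outweighs the immediate cost."""
    expected_future_gain = compliance_boost * gain_per_compliant_round * future_rounds
    return expected_future_gain > cost_of_punishing

# Many future interactions: even an expensive punishment pays for itself.
print(follow_through_is_rational(10.0, 1.0, future_rounds=50, compliance_boost=0.3))  # True

# One-shot game: the same threat isn't worth executing, so the
# counterparty has little reason to believe it.
print(follow_through_is_rational(10.0, 1.0, future_rounds=1, compliance_boost=0.3))   # False
```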
Since acausal trade issues are basically spiritual, when the trade is bad I seek a word that means “spiritually bad.” You can read it as just “bad” if you want.
Probable crux: Cognitive transparency is actually easy for advanced agencies. It’s difficult for a human to prove to a distant human that they have the means to build and deploy a torturizer without actually doing it. It wouldn’t be difficult for brains that were designed to be capable of proving the state of their beliefs, and AGI participating in a community with other AGI would want to be capable of that. (The contemporary analog is trusted computing. The number of coordination problems it could solve for us, today, if it were fully applied, is actually depressing.)
There would still be uncertainties as a result of mutual comprehensibility issues, but they could turn out to be of negligible importance, especially once nobody’s lying any more.
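As a loose illustration of the trusted-computing analogy (my own sketch, not anything from the post), here is a minimal hash-commitment example. It only shows the “commit to a decision procedure, let others verify the reveal” half of the story; proving that the committed procedure is what’s actually running is the part real attestation hardware exists for.

```python
# Minimal hash-commitment sketch, a loose analogy for attestation.
# It proves WHICH policy was committed to, not that it is the one running.
import hashlib

def commit(policy_source: str) -> str:
    """Agent publishes a digest of its decision procedure up front."""
    return hashlib.sha256(policy_source.encode()).hexdigest()

def verify(policy_source: str, published_digest: str) -> bool:
    """Counterparty checks the revealed procedure against the commitment."""
    return commit(policy_source) == published_digest

policy = "if opponent_cooperates: cooperate; else: punish"  # stand-in policy text
digest = commit(policy)

print(verify(policy, digest))                                # True
print(verify(policy.replace("punish", "forgive"), digest))   # False: tampering is visible
```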
It is not at all clear why “acausal trade issues” would be “spiritual”, or what you mean by those terms.
So what?
Ah, sorry—I missed the acausal assumption in the post. I generally ignore such explorations, as I don’t think “decision” is the right word without causality and conditional probability.
I think you’re right that cognitive transparency is a crux. I strongly doubt it can be mutual, or even possible between agents near each other in cognitive power. It may be possible for a hyperintelligence to understand/predict a human-level intelligence, but in that case the human is so outclassed that “trade” is the wrong word, and “manipulation” or “slavery” (or maybe “absorption”) is a better model.
You don’t have to be able to simulate something to trust it in this or that respect. E.g., the specification of AlphaZero is much simpler than the final weights, and knowing its training process, without knowing its weights, you can still trust that it will never, say, take a bribe to throw a match. Even if it comprehended bribery, we know from its specification that it’s solely interested in winning whatever match it’s currently playing, and no sum would be enough.
To generalize, if we know something’s utility function, and if we know it had a robust design, even if we know nothing else about its history, we know what it’ll do.
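A minimal sketch of that claim, assuming a made-up stand-in class (this is not the real AlphaZero API): if the utility function ranges only over match outcomes, side payments never enter the comparison, so the refusal is predictable from the specification alone.

```python
# "Known utility function => predictable refusals", as a toy class.
# AlphaZeroLike is an invented stand-in, not the real AlphaZero.

class AlphaZeroLike:
    def utility(self, outcome: str) -> float:
        # Preferences are defined over the current match and nothing else.
        return {"win": 1.0, "draw": 0.0, "loss": -1.0}[outcome]

    def accepts_bribe_to_throw_match(self, bribe_amount: float) -> bool:
        # Throwing the match means accepting a loss. Money never appears
        # in the utility function, so bribe_amount cannot change the result.
        return self.utility("loss") > self.utility("win")

agent = AlphaZeroLike()
print(agent.accepts_bribe_to_throw_match(10**9))  # False, for any amount
```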
A promise-keeping capacity is a property utility functions can have.
Yeah, definitely cruxy. It may be a property that utility functions could have, but it’s not a property that any necessarily do have. Moreover, we have zero examples of robustly designed agents with known utility functions, so it’s extremely unclear whether that will become the norm, let alone the universal assumption.