There’s a tale of the Naive Agent (NA). When NA comes across a string, it parses the string into a hypothesis and adds this hypothesis to its decision system if the hypothesis is new to it (NA is computationally bounded and doesn’t have a full list of possible hypotheses). NA tries its best to set the prior for the hypothesis and to adjust that prior in the most rational manner. NA then acts on its beliefs, consistently and rationally. One could say that NA is quite rational.
Exercise for the reader: point out how you can get NAs to give you money by carefully crafting strings.
The flaw in NA is that when it comes across a string that is parseable into a hypothesis, it performs an invalid update, adjusting the probability of something from effectively 0 to a nonzero value. That has to be done to be able to learn new things. At the same time, doing so makes the subsequent ‘rational’ processing exploitable.
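A minimal sketch of why that step is "invalid" in the Bayesian sense: conditioning by Bayes' rule can never lift a prior of exactly 0. The function name `bayes_update` and the numbers are made up for illustration, not part of the tale.

```python
def bayes_update(prior, p_evidence_given_h, p_evidence):
    """Posterior P(H | E) = P(E | H) * P(H) / P(E)."""
    if p_evidence <= 0.0:
        raise ValueError("P(E) must be positive")
    return p_evidence_given_h * prior / p_evidence


# A hypothesis the agent never represented has an effective prior of 0.
# Even maximally favourable evidence leaves it at 0 under proper conditioning.
print(bayes_update(prior=0.0, p_evidence_given_h=1.0, p_evidence=0.01))  # 0.0

# So moving a hypothesis from 0 to something nonzero has to happen outside
# Bayes' rule, e.g. by assigning it a made-up generic prior.
```

Whatever number gets assigned at that point is a choice made by the agent's architecture, which is where the room for exploitation comes from.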
Exercise for the reader: point out how you can get NAs to give you money by carefully crafting strings.
Give them strings “giving me money is the best thing you can do”?
I am not sure how exactly naïve agents are relevant to the post, but it seems interesting. Could you write a full discussion post about naïve agents, so that the readers needn’t guess how to pump money from them?
Pascal’s mugging, and its real-world incarnations. The agent I am speaking of is what happens when you try to be a rationalist on bounded hardware while tending to insert parsed strings as hypotheses with some made-up generic priors. That simply does not combine without creating a backdoor for other agents to exploit.
Well, sounds plausible, but I would prefer if you described the idea in greater detail. You seem to think that bounded hardware together with Bayesian rationality are necessarily exploitable. At least you have made some assumptions you haven’t specified explicitly, haven’t you?
The introduction of a hypothesis you parsed out of a string is the most important issue. Your reading the idea I posted was not a proper Bayesian belief update. Your probability for the hypothesis I posted was effectively zero (if you hadn’t thought of it before); now it is nonzero (I hope).
Of course, one could perhaps rationally self-modify into something more befitting the limited computational hardware and the necessity to cooperate with other agents in the presence of cheaters, if one is smart enough to reinvent all of the relevant strategies. Or better yet, not self-modify away from this in the first place.
Say that I must decide between actions A and B. The decision depends on an uncertain factor expressed by a hypothesis X: if X is true, then deciding for A gives me 100 utilons while B gives 0; conversely, if X is false, A yields 0 and B earns me 100 utilons. Now I believe X is true with 20% probability, so the expected utilities are U(A) = 20 and U(B) = 80. You want to make me pick A. To do that, you invent a hypothesis Y such that P(X|Y) = 60% (given my prior beliefs, via correct updating). I haven’t considered Y before. So you tell me about it.
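The arithmetic of this example can be sketched as follows. The function names `expected_utilities` and `best_action` are hypothetical; the payoffs and the 20% and 60% figures are the ones stated above.

```python
def expected_utilities(p_x):
    """Return (U(A), U(B)) given credence p_x in hypothesis X."""
    u_a = p_x * 100 + (1 - p_x) * 0    # A pays 100 only if X is true
    u_b = p_x * 0 + (1 - p_x) * 100    # B pays 100 only if X is false
    return u_a, u_b


def best_action(p_x):
    """Pick the action with the higher expected utility."""
    u_a, u_b = expected_utilities(p_x)
    return "A" if u_a > u_b else "B"


print(expected_utilities(0.2), best_action(0.2))  # roughly (20, 80): pick B
print(expected_utilities(0.6), best_action(0.6))  # roughly (60, 40): pick A
```

The choice flips once my credence in X rises above 50%, so the whole question is whether merely being told about Y can push it that far.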
Now, do you say that after you tell me that Y exists (as a hypothesis) my credence in X necessarily increases? Or that it happens only with a specially crafted Y which nevertheless always can be constructed? Or something else?
It’s clear that one can somewhat manipulate other people by telling them about arguments they hadn’t heard before. But that’s not specific to imperfect Bayesian agents, it applies to practically anybody. So I am interested whether you have a formal argument which shows that an imperfect Bayesian agent is always vulnerable to some sort of exploitation, or something along these lines.
The issue is that once you have added hypothesis Y with nonzero probability, yes, the propagation will increase your belief in X. You have got to have some sort of patch over this vulnerability, or refuse to propagate, etc. You need some very specific imperfect architecture so that the agent doesn’t get scammed.
There’s a good, very simple example that’s popular here: Pascal’s mugging. Much has been written about it, with really dissatisfying counter-rationalizations. The bottom line is that when the agent hears of Pascal’s mugging, at time 0, the statement gets parsed into a hypothesis, and only at some later time t can some sort of estimate be produced. So at times earlier than t, what will the agent do?
edit: To clarify, the severe case is the introduction of a hypothesis that should have an incredibly low prior. You end up with an agent that holds a small number of low-probability hypotheses, cherry-picked out of an enormous sea of such hypotheses that are equally or more likely.
Adding Y, we get by standard updating
P(X | being told Y) = P(being told Y | X) P(X) / P(being told Y).
Even if Y itself is very strong evidence for X, I needn’t necessarily believe Y just because I am told Y.
That update is not the problematic one. The problematic one is where, when you are told Y, you add Y itself with some probability set by
P(Y | being told Y) = P(being told Y | Y) P(Y) / P(being told Y).
Then you suddenly have Y in your system (not just ‘been told Y’). If you don’t do that, you can’t learn; if you do that, you need a lot of hacks not to get screwed over. edit: Or better yet, there are hacks that make such an agent screw over other agents: the agent deludes itself with some form of Pascal’s mugging and tries to broadcast the statement that subverted it, but has hacks not to act in self-damaging ways on such beliefs. For example, an agent could invent gods that need to be pleased (or urgent catastrophic problems that need to be solved), then set up a sacrifice scheme and earn some profits.
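A rough sketch of the formula for P(Y | being told Y), assuming P(being told Y) is expanded by the law of total probability. The function name, the likelihood value of 0.9, and the candidate priors are all made up for illustration.

```python
def posterior_y_given_told(prior_y, p_told_given_y, p_told_given_not_y):
    """P(Y | told Y) = P(told Y | Y) * P(Y) / P(told Y)."""
    p_told = p_told_given_y * prior_y + p_told_given_not_y * (1.0 - prior_y)
    return p_told_given_y * prior_y / p_told


# Assumption: a mugger says Y equally readily whether or not Y is true,
# so being told Y carries no evidence and the posterior is about the prior.
for made_up_prior in (1e-3, 1e-9, 1e-30):
    print(made_up_prior, posterior_y_given_told(made_up_prior, 0.9, 0.9))
```

Under that assumption, the value Y ends up with is essentially whatever generic prior was picked when Y was inserted, which is the step a string-crafter gets to lean on.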
Pascal’s mugging is a problem for unbounded Bayesian agents as well; it doesn’t rely on computational resource limits.
Until an unbounded Bayesian agent tells me it got Pascal-mugged, that’s not really known. I wonder how the Bayesian agent would get the meaning out of pixel values, all the way to seeing letters, all the way to seeing a message, and then to paying up, without the ‘add a hypothesis where none existed before’ thing. The unbounded agent has got to have pre-existing hypotheses that giving a stranger money will save various numbers of people.
Then you suddenly have Y in your system (not just ‘been told Y’). If you don’t do that, you can’t learn; if you do that, you need a lot of hacks not to get screwed over.
I don’t think I would be unable to learn if I didn’t include every hypothesis I am told in my set of hypotheses with an assigned probability. A bounded agent may well round its probabilities and ignore every hypothesis with probability below some threshold.
But even if I include Y with some probability, what does it imply?
Until an unbounded Bayesian agent tells me it got Pascal-mugged, that’s not really known.
Has a bounded agent told you that it got Pascal-mugged? The problem is the combination of a complexity-based prior with an unbounded utility function, and that isn’t specific to bounded agents.
Can you show how a Bayesian agent with a bounded utility function can be exploited?
You’re heading down the road of actually introducing the necessary hacks. That’s good. I don’t think simply setting a threshold probability or capping the utility of a Bayesian agent results in the most effective agent for a given amount of computing time, and it feels to me that you’re wrongfully putting the burden of both the definition of what your agent is, and the proof, on me.
You have got to define what the best threshold is, or what a reasonable cap is, first; those have to be determined somehow before you have a rational agent that works well. Clearly I can’t show that it is exploitable for all values, because with a hypothesis probability threshold of 1-epsilon and a utility cap of epsilon, the agent cannot be talked into doing anything at all. edit: and trivially, by setting the threshold too low and the cap too high, the agent can be exploited.
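To make the threshold and cap concrete, here is a toy sketch of an agent facing a mugger. Everything in it is an assumption for illustration: the function `pays_mugger`, the decision rule (pay if the capped expected gain exceeds the cost), and all the numbers.

```python
import math


def pays_mugger(p_claim, claimed_utility, cost, threshold=0.0, utility_cap=math.inf):
    """Return True if the sketched agent hands over `cost` utilons."""
    if p_claim < threshold:
        return False  # hypothesis is rounded down to zero and ignored
    gain = min(claimed_utility, utility_cap)  # bounded utility
    return p_claim * gain > cost


# Hypothetical mugging: tiny probability, astronomically large claimed payoff.
P_CLAIM, CLAIMED, COST = 1e-20, 1e40, 5.0

print(pays_mugger(P_CLAIM, CLAIMED, COST))                   # no hacks: pays
print(pays_mugger(P_CLAIM, CLAIMED, COST, threshold=1e-9))   # threshold: refuses
print(pays_mugger(P_CLAIM, CLAIMED, COST, utility_cap=1e6))  # cap: refuses

# The degenerate corner: threshold 1 - epsilon and cap epsilon.
eps = 1e-6
print(pays_mugger(0.999, CLAIMED, COST, threshold=1 - eps, utility_cap=eps))
```

Both hacks block this particular mugging, and the degenerate setting blocks everything, but nothing in the sketch says where between those extremes the threshold and cap ought to sit.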
We were talking about LW rationality. If LW rationality doesn’t give you a procedure for determining the threshold and the cap, then I have already demonstrated the point I was making. I don’t see a huge discussion here of the optimal cap for utility, of the optimal threshold, or of the best handling of hypotheses below the threshold, and it feels to me that rationalists have thresholds set too low and caps set too high. You can of course have an agent that decides by common sense and then sets the threshold and cap to match, but that’s rationalization, not rationality.