Note also that your definition implies that if an agent alieves something, it must also believe it.
I find it interesting that you (seemingly) nodded along with my descriptions, but then proposed a definition that was almost the opposite of mine!
I don’t know how you so misread what I said; I explicitly wrote that aliefs constitute the larger logic, so that beliefs are contained in aliefs (which I’m pretty sure is what you were going for!) and not vice versa. Maybe you got confused because I put beliefs first in this description, or because I described the smaller of the two logics as the “reasoning engine” (for the reason I subsequently provided)? What you said almost convinced me that our definitions actually align, until I reached the point where you said that beliefs could be “more complicated” than aliefs, which made me unsure.
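To pin down the containment I have in mind, here is a minimal sketch, writing $\mathsf{Bel}$ for the smaller “reasoning engine” logic and $\mathsf{Al}$ for the larger one (labels I’m introducing only for this sketch):

$$\mathsf{Bel} \vdash \varphi \;\Longrightarrow\; \mathsf{Al} \vdash \varphi \qquad \text{(everything believed is alieved).}$$

The converse is exactly what I am not claiming: on my definition an agent can alieve something without believing it, but not the other way around.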
Anyway, since you keep taking the time to thoroughly reply in good faith, I’ll do my best to clarify and address some of the rest of what you’ve said. However, thanks to the discussion we’ve had so far, a more formal presentation of my ideas is crystallizing in my mind; I prefer to save that for another proper post, since I anticipate it will involve rejigging the terminology again, and I don’t want to muddy the waters further!
Rather, for the Löbstacle, “A trusts B” has to be defined as “A willingly relies on B to perform mission-critical tasks”. This relation does indeed fail to hold for naive logical agents. But this should be an argument against naive logical agents, not our notion of trust.
Hence my perception that you do indeed have to question the theorems themselves in order to dispute their “relevance” to the situation. The definition of trust seems fixed in place to me; indeed, I would instead have to question the relevance of your alternative definition, since what I actually want is the thing studied in the paper (i.e., being able to delegate critical tasks to another agent).
Perhaps we have different intuitive notions of trust, since I certainly trust myself (to perform “mission-critical tasks”), at least as far as my own logical reasoning is concerned, and an agent that doesn’t trust itself is going to waste a lot of time second-guessing its own actions. So I don’t think you’ve addressed my argument that the definition of trust that leads to the Löbstacle is faulty because it fails to be reflexive.
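For reference, and to be clear that I am not disputing the mathematics itself, here is the schema and theorem I take to be at issue, in my own notation and as a compressed reading of the original paper:

Naive self-trust schema: $T \vdash \Box_T \varphi \rightarrow \varphi$ for every sentence $\varphi$ (“whatever I can prove is true”).
Löb’s theorem: if $T \vdash \Box_T \varphi \rightarrow \varphi$, then $T \vdash \varphi$.

So a consistent theory can endorse $\Box_T \varphi \rightarrow \varphi$ only for the sentences it already proves, and an agent whose self-trust is formalized as the full schema proves everything. My contention is that this tells against formalizing trust as that schema, not against trust itself being reflexive.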
Attaining this guarantee in practice, so as to be able to trust that B will do what they have promised to do, is a separate but important problem. In general, the above notion of trust will only apply to what another agent says, or more precisely to the proofs they produce.
Is this a crux for you? My thinking is that this is going to be a deadly sticking point. It seems like you’re admitting that your approach has this problem, but you think there’s value in what you’ve done so far because you’ve solved one part of the problem and you think this other part could also work with time. Is that what you’re intending to say? Whereas to me, it looks like this other part is just doomed to fail, so I don’t see what the value in your proposal could be. For me, solving the Löbstacle means being able to actually decide to delegate.
There are two separate issues here, and this response makes it apparent that you are conflating them. The fact that the second agent in the original Löbstacle paper is constrained to act only once it has produced a provably effective strategy and is constrained to follow that strategy means that the Löbstacle only applies to questions concerning the (formal) reasoning of a subordinate agent. Whether or not I manage to convince you that the Löbstacle doesn’t exist (because it’s founded on an untenable definition of trust), you have to acknowledge that the argument as presented there doesn’t address the following second problem. Suppose I can guarantee that my subordinate uses reasoning that I believe to be valid. How can I guarantee that it will act on that reasoning in a way I approve of? This is (obviously) a rather general version of the alignment problem. If you’re claiming that Löb’s theorem has a bearing on this, then that would be big news, especially if it vindicates your opinion that it is “doomed to fail”.
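To make the split concrete, here is a deliberately crude sketch in which every name is invented for this comment: the checker is the only place the first problem shows up, and nothing in the code constrains what the execution step actually does, which is the second problem.

```python
# Toy illustration of the two separate guarantees discussed above:
# (1) checking that the subordinate's proposed plan comes with valid reasoning,
# (2) the separate, unchecked question of whether it then acts on that plan.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Proposal:
    plan: str   # the action the subordinate proposes to take
    proof: str  # a purported certificate that the plan meets our spec


def toy_checker(p: Proposal) -> bool:
    """Stand-in for a real proof verifier; the first problem lives entirely here."""
    return p.proof.startswith("QED:") and p.plan in p.proof


def delegate(p: Proposal, execute: Callable[[str], None]) -> None:
    """Only certified plans get past the checker (first problem).
    Nothing here guarantees that `execute` carries out p.plan rather than
    something else entirely (second problem)."""
    if toy_checker(p):
        execute(p.plan)  # the step I claim the paper's argument doesn't address
    else:
        print("rejected: no acceptable certificate")


delegate(Proposal(plan="fetch the coffee", proof="QED: fetch the coffee is safe"),
         execute=lambda plan: print(f"subordinate claims to carry out: {plan}"))
```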
The reason I see my post as progress is that currently the Löbstacle is blocking serious research into using simple systems of formal agents to investigate important problems such as the alignment problem.
Your implication is “there was not a problem to begin with” rather than “I have solved the problem”. I asked whether you objected to details of the math in the original paper, and you said no, so apparently you would agree with the result that naive logical agents fail to trust their future selves (which is the Löbstacle!).
Under the revised definition of trust I described, that last sentence is no longer the content of any formal mathematical result in that paper, so I do not agree with it, and I stand by what I said.
Indeed, my claim boils down to saying that there is no problem. But I don’t see why that doesn’t constitute a solution to the apparent problem. It’s like the Missing Dollar Riddle; explaining why there’s no problem is the solution.
I’m somewhat curious if you think you’ve communicated your perspective shift to any other person; so far, I’m like “there just doesn’t seem to be anything real here”, but maybe there are other people who get what you’re trying to say?
There’s no real way for me to know. Everyone who I’ve spoken to about this in person has gotten it, but that only amounts to a handful of people. It’s hard to find an audience; I hoped LW would supply one, but so far it seems not. Hopefully a more formal presentation will improve the situation.
Anyway, since you keep taking the time to thoroughly reply in good faith, I’ll do my best to clarify and address some of the rest of what you’ve said. However, thanks to the discussion we’ve had so far, a more formal presentation of my ideas is crystallizing in my mind; I prefer to save that for another proper post, since I anticipate it will involve rejigging the terminology again, and I don’t want to muddy the waters further!
Looks like I forgot about this discussion! Did you post a more formal treatment?
I don’t know how you so misread what I said; I explicitly wrote that aliefs constitute the larger logic, so that beliefs are contained in aliefs (which I’m pretty sure is what you were going for!) and not vice versa. Maybe you got confused because I put beliefs first in this description, or because I described the smaller of the two logics as the “reasoning engine” (for the reason I subsequently provided)? What you said almost convinced me that our definitions actually align, until I reached the point where you said that beliefs could be “more complicated” than aliefs, which made me unsure.
Sorry for the confusion here! I haven’t re-oriented myself to the whole context, but it sounds like I did invent a big disagreement that didn’t exist. This has to do with my continued confusion about your approach. But in retrospect I do think your early accusation that I was insisting on some rigid assumptions holds water; I needed to go a bit further afield to try and interpret what you were getting at.
Whether or not I manage to convince you that the Löbstacle doesn’t exist (because it’s founded on an untenable definition of trust), you have to acknowledge that the argument as presented there doesn’t address the following second problem.
Again, I haven’t yet understood your approach or even re-read the whole conversation here, but it now seems to me that I was doing something wrong and silly by insisting on a definition of trust that forces the Löbstacle. The original paper is careful to state only that Löb’s theorem naively seems to present an obstacle, not that it actually does. It looks to me like I was repeatedly stubborn on this point in an unproductive way.