If you just said a bunch of trivial statements 1 billion times, and then demanded that I give you money, it would seem extremely suspicious. It would not fit your pattern of behavior.
If, on the other hand, you gave useful and non-obvious advice, I would do it, because the demand to give you money wouldn’t seem any different from all the other things you told me to do that worked out.
I mean, that’s the essence of the human concept of earning trust, and betrayal.
Yes, but expecting any reasoner to develop well-grounded abstract concepts without any grounding in features, and then to care about them, is… well, it’s not complete bullshit, but expecting it to actually happen relies on solving some problems I haven’t seen solved.
You could, hypothetically, just program your AI to infer “goodness” as a causal-role concept from the vast sums of data it gains about the real world and our human opinions of it, and then “maximize goodness”, formulated as another causal role. But this requires sophisticated machinery for dealing with causal-role concepts, which I haven’t seen developed to that extent in any literature yet.
Usually, reasoners develop causal-role concepts in order to explain what their feature-level concepts are doing, so causal-role concepts abstracted over concepts that don’t eventually root themselves in features tend to get dismissed as useless metaphysical speculation, or at least as abstract wankery one doesn’t care about.
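To make the shape of that hypothetical design concrete, here is a deliberately cartoonish sketch (every name in it is made up, and the stub hides exactly the causal-role machinery that hasn’t been developed):

```python
# Cartoon sketch only: the "infer goodness as a causal role, then maximize it"
# design described above. All names are hypothetical, and the stub hides exactly
# the machinery for causal-role concepts that hasn't actually been developed.
from typing import Any, Callable, Iterable

def infer_goodness(world_data: Iterable[Any],
                   human_judgments: Iterable[Any]) -> Callable[[Any], float]:
    """Learn a scoring function that fills the causal role 'goodness' plays in
    human evaluations of the world. This is the unsolved part."""
    raise NotImplementedError("needs machinery for causal-role concepts")

def choose_action(candidates: Iterable[Any],
                  goodness: Callable[[Any], float]) -> Any:
    """'Maximize goodness': pick the candidate the learned concept scores highest."""
    return max(candidates, key=goodness)
```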
If those 100 billion true statements were all (or even mostly) useful and better calibrated than my own priors, then I’d be likely to believe you, so yes. On the other hand, if you replace $100,000 with $100,000,000,000, I don’t think that would still hold.
I think you found an important caveat: the fact that an agent benefits from you believing a statement weakens the evidence that the statement is true, to the point that it’s literally zero for an agent you don’t trust at all. And I think that would still hold whether or not the AI has a human-like architecture.
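To spell that caveat out with toy numbers (my framing, and the probabilities below are assumptions rather than anything from the thread): what matters is how much more likely the agent is to assert the statement when it is true than when it is false. An agent that asserts whatever benefits it, regardless of truth, has a likelihood ratio of 1, so its assertion moves my posterior not at all.

```python
# Toy Bayesian sketch; the probabilities are illustrative assumptions.

def posterior(prior: float, p_assert_if_true: float, p_assert_if_false: float) -> float:
    """P(statement true | agent asserted it), via Bayes' rule."""
    joint_true = prior * p_assert_if_true
    joint_false = (1 - prior) * p_assert_if_false
    return joint_true / (joint_true + joint_false)

prior = 0.10  # my prior that "paying you $100,000 is good for me" is true

# A trusted adviser mostly asserts the claim only when it is actually true:
print(posterior(prior, p_assert_if_true=0.90, p_assert_if_false=0.05))  # ~0.67

# A self-interested agent I don't trust asserts it whenever it profits,
# true or not: the likelihood ratio is 1, so the assertion is zero evidence.
print(posterior(prior, p_assert_if_true=1.00, p_assert_if_false=1.00))  # 0.10, the prior
```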
Yes, I would, assuming you don’t mean statements like “1+1 = 2”, but rather true statements spread over a variety of contexts, such that I would reasonably believe you to be trustworthy to that degree across random situations (and thus including situations such as whether I should give you money).
(Also, the 100 billion true statements themselves would probably be much more valuable than $100,000).
According to game theory, this opens you to exploitation by an agent that wants your money for its own gain and can generate 100 billion true statements at little cost.
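A back-of-the-envelope version of that exploitation argument (all figures below are made-up assumptions, not anything established in the thread): the strategy pays off for the exploiter whenever the expected payout exceeds the cost of producing the true statements.

```python
# Back-of-the-envelope sketch; every figure here is an assumption for illustration.

n_statements = 100_000_000_000   # the 100 billion true statements
cost_per_statement = 1e-8        # assumed cost of a cheap, machine-generated verifiable truth
payout = 100_000                 # the $100,000 ask
p_target_pays = 0.10             # assumed chance the target actually pays up

cost = n_statements * cost_per_statement   # about $1,000
expected_gain = p_target_pays * payout     # $10,000

# Profitable whenever expected_gain > cost, which is why a raw track record of
# true statements is an exploitable trust signal.
print(f"cost ${cost:,.0f}, expected gain ${expected_gain:,.0f}, "
      f"profitable: {expected_gain > cost}")
```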
So if I spouted 100 billion true statements at you, then said, “It would be good for you to give me $100,000,” you’d pay up?
I don’t think you are responding to the correct comment. Or at least I have no idea what you are talking about.
You may already be doing this: giving money to people whose claims you believe yourself.