“Well, since it’s too late there,” said the Scientist, “would you maybe agree with me that ‘eternal returns’ is a prediction derived by looking at observations in a simple way, and then doing some pretty simple reasoning on it; and that’s, like, cool? Even if that coolness is not the single overwhelming decisive factor in what to believe?”
“Depends exactly what you mean by ‘cool’,” said the Epistemologist.
“Okay, let me give it a shot,” said the Scientist. “Suppose you model me as having a bunch of subagents who make trades on some kind of internal prediction market. The whole time I’ve been watching Ponzi Pyramid Incorporated, I’ve had a very simple and dumb internal trader who has been making a bunch of money betting that they will keep going up by 20%. Of course, my mind contains a whole range of other traders too, so this one isn’t able to swing the market by itself, but what I mean by ‘cool’ is that this trader does have a bunch of money now! (More than others do, because in my internal prediction markets, simpler traders start off with more money.)”
“The problem,” said the Epistemologist, “is that you’re in an adversarial context, where the observations you’re seeing have been designed to make that particular simple trader rich. In that context, you shouldn’t be giving those simple traders so much money to start off with; they’ll just continue being exploited until you learn better.”
“But is that the right place to intervene? After all, my internal prediction market is itself an adversarial process. And so the simple internal trader who just predicts that things will continue going up the same amount every year will be exploited by other internal traders as soon as it dares to venture a bet on, say, the returns of the previous company that our friend the Spokesperson worked at. Indeed, those savvier traders might even push me to go look up that data (using, perhaps, some kind of internal action auction), in order to more effectively take the simple trader’s money.”
“You claim,” said the Epistemologist, “to have these more sophisticated internal traders. Yet you started this conversation by defending the coolness, aka wealth, of the trader corresponding to the Spokesperson’s predictions. So it seems like these sophisticated internal traders are not doing their work so well after all.”
“They haven’t taken its money yet,” said the Scientist, “but they will before it gets a chance to invest any of my money. Nevertheless, your point is a good one; it’s not very cool to only have money temporarily. Hmmm, let me muse on this.”
The Scientist thinks for a few minutes, then speaks again.
“I’ll try another attempt to describe what I mean by ‘cool’. Oftentimes, clever arguers suggest new traders to me, and point out that those traders would have made a lot of money if they’d been trading earlier. Now, if I were an ideal Garrabrant inductor I would ignore these arguments, and only pay attention to these new traders’ future trades. But I have not world enough or time for this; so I’ve decided to subsidize new traders based on how they would have done if they’d been trading earlier. Of course, though, this leaves me vulnerable to clever arguers inventing overfitted traders. So the subsidy has to be proportional to how likely it is that the clever arguer could have picked out this specific trader in advance. And for all the Spokesperson’s flaws, I do think that 5 years ago he was probably saying something that sounded reasonably similar to ‘20% returns indefinitely!’ That is the sense in which his claim is cool.”
“Hmm,” said the Epistemologist. “An interesting suggestion, but I note that you’ve departed from the language of traders in doing so. I feel suspicious that you’re smuggling something in, in a way which I can’t immediately notice.”
“Right, which would be not very cool. Alas, I feel uncertain about how to put my observation into the language of traders. But… well, I’ve already said that simple traders start off with more money. So perhaps it’s just the same thing as before, except that when evaluating new traders on old data I put extra weight on simplicity when deciding how much money they start with—because now it also helps prevent clever arguers from fooling me (and potentially themselves) with overfitted post-hoc hypotheses.”
(“Parenthetically,” added the Scientist, “there are plenty of other signals of overfitting I take into account when deciding how much to subsidize new traders—like where I heard about them, and whether they match my biases and society’s biases, and so on. Indeed, there are enough such signals that perhaps it’s best to think of this as a process of many traders bidding on the question of how easy/hard it would have been for the clever arguer to have picked out this specific trader in advance. But this is getting into the weeds—the key point is that simplicity needs to be extra-strongly-prioritized when evaluating new traders on past data.”)
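To make the Scientist’s proposal a bit more concrete, here is a minimal toy sketch in Python. Everything in it is an illustrative assumption rather than anything from the dialogue: the 2^(-bits) simplicity prior on wealth, the ±1-percentage-point win condition on a bet, and the specific complexity numbers are all made up. The point is only to show the shape of the rule “subsidy = counterfactual past profit, discounted by how plausibly this exact trader could have been written down in advance.”

```python
# Toy sketch of the Scientist's two ideas (all numbers and names are illustrative):
# (1) simpler traders start with more wealth, and (2) a newly proposed trader is
# subsidized by its backtested profit, discounted by a simplicity prior standing
# in for "how likely is it that this exact trader could have been picked in advance?"

def initial_wealth(complexity_bits: float, budget: float = 1.0) -> float:
    """Simplicity prior on starting wealth: about 2^-(description length) of the trader."""
    return budget * 2.0 ** (-complexity_bits)

def backtest_profit(predict, history, stake: float = 1.0) -> float:
    """Profit the trader *would* have made betting `stake` per step on past returns.
    It wins its stake whenever its prediction is within 1 percentage point of reality."""
    return sum(stake if abs(predict(t) - actual) < 0.01 else -stake
               for t, actual in enumerate(history))

def subsidy_for_new_trader(predict, complexity_bits: float, history) -> float:
    """Subsidy = counterfactual past profit, discounted by the simplicity prior,
    so overfitted (high-complexity) post-hoc traders get almost nothing."""
    return max(0.0, backtest_profit(predict, history)) * 2.0 ** (-complexity_bits)

# Five observed years of Ponzi Pyramid Incorporated returns, each exactly +20%.
history = [0.20] * 5

def always_20(t):
    """A very simple trader: 'returns are +20% every year, forever.'"""
    return 0.20

def overfit(t):
    """A post-hoc trader that matches the past perfectly, then predicts whatever it
    likes; we charge it many bits because it was picked after seeing the data."""
    return 0.20 if t < 5 else 0.99

print(initial_wealth(complexity_bits=3))    # the simple trader starts richer: 0.125
print(initial_wealth(complexity_bits=40))   # the complex trader starts nearly broke

print(subsidy_for_new_trader(always_20, complexity_bits=3, history=history))  # 5 * 2^-3 = 0.625
print(subsidy_for_new_trader(overfit, complexity_bits=40, history=history))   # ~5e-12, essentially zero
```

The same backtest profit buys a meaningful subsidy for the trader that could plausibly have been stated five years ago, and essentially nothing for the one a clever arguer reverse-engineered from the data.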
Cool connections! Resonates with how I’ve been thinking about intelligence and learning lately. Some more connections:
Indeed, those savvier traders might even push me to go look up that data (using, perhaps, some kind of internal action auction), in order to more effectively take the simple trader’s money
That’s reward/exploration hacking. Although I do think that most of the time when we “look up some data” in real life, it’s not because an internal heuristic/subagent is strategic enough to purposefully try to exploit others, but rather because some earnest, simple heuristics that recommend looking up information have scored well in the past.
“They haven’t taken its money yet,” said the Scientist, “but they will before it gets a chance to invest any of my money”
I think this doesn’t always happen. As good as the internal traders might be, the agent sometimes needs to explore, and that means giving up some of the agent’s money.
Now, if I were an ideal Garrabrant inductor I would ignore these arguments, and only pay attention to these new traders’ future trades. But I have not world enough or time for this; so I’ve decided to subsidize new traders based on how they would have done if they’d been trading earlier.
Here (starting at “Put in terms of Logical Inductors”) I mention other “computational shortcuts” for inductors. Mainly, if two “categories of bets” seem pretty unrelated (they are two different specialized magisteria), then not having thick trade between them won’t cost you much performance (and will save a lot of computation). You can have “meta-traders” betting on which categories of bets are unrelated (and testing them only sparsely, etc.), and use them to make your inductor more computationally efficient. Of course, object-level traders already do this (deciding where to look, etc.), and in the limit this scheme will converge like a Logical Inductor, but my intuition is that it will converge faster (at least in structured enough domains). This is of course very related to my ideas and formalism on meta-heuristics.
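As a rough sketch (a toy illustration only, not the linked formalism; the class name, the audit probability, the threshold, and the update rule are all made up): keep a cheap relatedness score per pair of bet categories, spend the expensive cross-category trading only on pairs that look related, and sparsely audit the pairs being skipped so a wrong “unrelated” verdict can eventually be corrected.

```python
import itertools
import random

class MetaTrader:
    """Toy 'meta-trader': bets on whether two categories of bets are related
    enough to be worth running expensive cross-category traders on."""

    def __init__(self, categories, audit_prob=0.05, threshold=0.1):
        # Start agnostic: every pair of categories gets a middling relatedness score.
        self.relatedness = {pair: 0.5 for pair in itertools.combinations(categories, 2)}
        self.audit_prob = audit_prob  # how sparsely we re-test pairs judged unrelated
        self.threshold = threshold    # below this score, skip cross-category trading

    def pairs_to_trade(self):
        """Pairs worth spending cross-category compute on this round."""
        return [pair for pair, score in self.relatedness.items()
                if score >= self.threshold or random.random() < self.audit_prob]

    def update(self, pair, observed_correlation):
        """Nudge the relatedness score toward what cross-category trading actually found."""
        self.relatedness[pair] = 0.9 * self.relatedness[pair] + 0.1 * abs(observed_correlation)

meta = MetaTrader(["weather", "stocks", "chess"])
print(meta.pairs_to_trade())  # initially every pair looks worth checking

for _ in range(20):           # repeated cross-checks find nothing to exploit...
    meta.update(("weather", "chess"), 0.0)
print(meta.pairs_to_trade())  # ...so that pair is now usually skipped, apart from sparse audits
```

The sparse audits keep this from permanently ignoring a pair that later becomes related, which is the sense in which it should still converge like the underlying inductor, just with less wasted computation along the way.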
helps prevent clever arguers from fooling me (and potentially themselves) with overfitted post-hoc hypotheses
This adversarial selection is also a problem for heuristic arguments: your heuristic estimator might be very good at assessing likelihoods given a list of heuristic arguments, but what if that list has been selected against your estimator, to drive it in a wrong direction? Last time I discussed this with them (very long ago), they were content to pick an apparently random process to generate the heuristic arguments, one they’re confident enough hasn’t been tampered with. Something more ambitious would be to have the heuristic estimator also know about the process that generated the list of heuristic arguments, and use those same heuristic arguments to assess whether something fishy is going on. This will never work perfectly, but it probably helps a lot in practice. (And I think this is for similar reasons to why deception might be hard: when not only the output but also the “thoughts” of the generating process are scrutinized, it seems hard for it to scheme without being caught.)
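A tiny numerical illustration of that selection problem (my own construction; the “estimator” here is just an average, far cruder than an actual heuristic estimator, and all the numbers are made up):

```python
import random

random.seed(0)

# The quantity we care about, and a pool of noisy "arguments" (signals) about it.
true_value = 0.0
arguments = [true_value + random.gauss(0, 1) for _ in range(1000)]

def estimate(shown_arguments):
    """A crude stand-in for a heuristic estimator: just average what it is shown."""
    return sum(shown_arguments) / len(shown_arguments)

# If the arguments come from a process that hasn't been tampered with,
# the estimate lands near the truth.
random_subset = random.sample(arguments, 20)
print(round(estimate(random_subset), 2))        # close to 0.0

# If an adversary selects which arguments to present, the same estimator is
# driven far from the truth without any single argument being fabricated.
adversarial_subset = sorted(arguments, reverse=True)[:20]
print(round(estimate(adversarial_subset), 2))   # well above 0.0
```

Knowing something about the generating process (e.g. that the arguments were supposed to be independent random draws) is exactly what would let the estimator notice that the second list is suspiciously one-sided.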