Cool connections! Resonates with how I’ve been thinking about intelligence and learning lately. Some more connections:
Indeed, those savvier traders might even push me to go look up that data (using, perhaps, some kind of internal action auction), in order to more effectively take the simple trader’s money.
That’s reward/exploration hacking. Although I do think that most times we “look up some data” in real life, it’s not because an internal heuristic / subagent is strategic enough to purposefully try and exploit others, but just because some earnest simple heuristics recommending to look up information have scored well in the past.
“They haven’t taken its money yet,” said the Scientist, “but they will before it gets a chance to invest any of my money.”
I think this doesn’t always happen. As good as the internal traders might be, the agent sometimes needs to explore, and that means giving up some of its money.
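A toy sketch of that exploration cost (the bandit setup and all numbers are my own illustrative assumptions, not anything from the post): even when the internal traders already know which action is best, forced epsilon-exploration gives up money in expectation.

```python
import random

# Toy sketch (assumptions and numbers mine): an agent that already knows
# the best arm is worth 1.0 still pays an "exploration tax" of roughly
# epsilon/2 per round when it is forced to try a random arm sometimes.

def average_payoff(epsilon, rounds=10_000, seed=0):
    """Average per-round payoff when exploring a random arm w.p. epsilon."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(rounds):
        if rng.random() < epsilon:
            total += rng.choice([1.0, 0.0])  # exploration round: random arm
        else:
            total += 1.0                     # exploitation round: known-best arm
    return total / rounds

print(average_payoff(0.0))  # no exploration: the full 1.0 per round
print(average_payoff(0.2))  # exploration tax of roughly epsilon/2 = 0.1 per round
```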
Now, if I were an ideal Garrabrant inductor I would ignore these arguments, and only pay attention to these new traders’ future trades. But I have not world enough or time for this; so I’ve decided to subsidize new traders based on how they would have done if they’d been trading earlier.
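The quoted retroactive-subsidy idea could be sketched like this (a toy model; the market, the traders, and the subsidy rule are all my own illustrative assumptions, not the post’s actual construction): a new trader is granted starting wealth proportional to the profit it would have made replaying past prices.

```python
# Toy sketch (all details mine): subsidize a new trader by replaying history.

def hypothetical_profit(trader, price_history, outcome_history):
    """Profit the trader WOULD have made over past rounds.

    trader(price) -> position in [-1, 1] (+1 fully long, -1 fully short);
    each round pays (outcome - price) per unit long.
    """
    profit = 0.0
    for price, outcome in zip(price_history, outcome_history):
        profit += trader(price) * (outcome - price)
    return profit

def subsidy(trader, price_history, outcome_history, base=1.0, scale=0.1):
    """Starting budget: a base grant plus a bonus for good hypothetical past trades."""
    return base + scale * max(0.0, hypothetical_profit(trader, price_history, outcome_history))

# Past rounds: a recurring claim was priced at these levels...
prices = [0.5, 0.6, 0.7, 0.8]
# ...and came out true (1) every time.
outcomes = [1, 1, 1, 1]

# A trader that buys below 0.9 would have profited, so it enters the market
# with a larger subsidized budget than a do-nothing newcomer.
bullish = lambda p: 1.0 if p < 0.9 else 0.0
neutral = lambda p: 0.0

print(subsidy(bullish, prices, outcomes))  # base grant plus performance bonus
print(subsidy(neutral, prices, outcomes))  # just the base grant
```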
Here (starting at “Put in terms of Logical Inductors”) I mention other “computational shortcuts” for inductors. Mainly, if two “categories of bets” seem pretty unrelated (they are two different specialized magisteria), then not having thick trade between them won’t cost you much performance (and will save much computation). You can have “meta-traders” betting on which categories of bets are unrelated (and testing them, but only sparsely, etc.), and use them to make your inductor more computationally efficient. Of course, object-level traders already do this (decide where to look, etc.), and in the limit this will converge like a Logical Inductor, but my intuition is that it will converge faster (at least in structured enough domains). This is of course very related to my ideas and formalism on meta-heuristics.
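A minimal sketch of that meta-trader idea (every name and mechanism here is a hypothetical of mine, not the post’s formalism): markets are partitioned into “magisteria,” trade is only computed within a block, and a meta-trader sparsely samples cross-block pairs to check whether two blocks have turned out to be correlated and should be merged.

```python
import random

# Toy sketch (all structure hypothetical): only pay for trade within a
# group of markets; a meta-trader sparsely tests cross-group pairs and
# merges groups when their outcomes look correlated.

def correlated(series_a, series_b, threshold=0.5):
    """Crude test: do two 0/1 outcome series agree far more (or less) than chance?"""
    n = len(series_a)
    agree = sum(a == b for a, b in zip(series_a, series_b))
    return abs(agree / n - 0.5) > threshold / 2

def refine_partition(groups, outcomes, samples=3, rng=random):
    """Meta-trader step: sparsely sample cross-group market pairs; merge on evidence.

    groups: list of lists of market names; outcomes: name -> 0/1 outcome series.
    """
    for _ in range(samples):
        if len(groups) < 2:
            break
        g1, g2 = rng.sample(range(len(groups)), 2)
        m1, m2 = rng.choice(groups[g1]), rng.choice(groups[g2])
        if correlated(outcomes[m1], outcomes[m2]):
            merged = groups[g1] + groups[g2]
            groups = [g for i, g in enumerate(groups) if i not in (g1, g2)]
            groups.append(merged)
    return groups

def pairs_traded(groups):
    """Number of market pairs we actually let trade (within groups only)."""
    return sum(len(g) * (len(g) - 1) // 2 for g in groups)

outcomes = {
    "rain_tomorrow":  [1, 1, 0, 0, 1, 1, 0, 0],
    "umbrella_sales": [1, 1, 0, 0, 1, 1, 0, 0],  # moves with rain
    "chess_win":      [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated magisterium
}
groups = [["rain_tomorrow"], ["umbrella_sales"], ["chess_win"]]

refined = refine_partition(groups, outcomes, samples=5, rng=random.Random(0))
print(refined)
print(pairs_traded(refined), "pairs traded, vs", pairs_traded([list(outcomes)]), "under full trade")
```

The point of the sketch: the chess market can never merge with the weather markets (their outcomes agree only at chance level), so most cross-magisteria trade is never computed, while correlated markets still find each other through the sparse tests.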
helps prevent clever arguers from fooling me (and potentially themselves) with overfitted post-hoc hypotheses
This adversarial selection is also a problem for heuristic arguments: your heuristic estimator might be very good at assessing likelihoods given a list of heuristic arguments, but what if the list has been selected against your estimator, to drive it in a wrong direction? Last time I discussed this with them (very long ago), they were happy to just pick an apparently random process to generate the heuristic arguments, one they’re confident enough hasn’t been tampered with. Something more ambitious would be to have the heuristic estimator also know about the process that generated the list of heuristic arguments, and use these same heuristic arguments to assess whether something fishy is going on. This will never work perfectly, but it probably helps a lot in practice. (And I think this is for similar reasons to why deception might be hard: when not only the output, but also the “thoughts”, of the generating process are scrutinized, it seems hard for it to scheme without being caught.)
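A toy illustration of why the selection process matters (everything here is a made-up model of mine, not the actual heuristic estimator): an estimator that just averages the evidence in the arguments it is shown is roughly unbiased when the list is sampled at random, but badly skewed when an adversary selects which arguments to show it.

```python
import random

# Toy model (all assumptions mine): each "argument" carries a noisy signal
# about a quantity whose true value is 0.0; the estimator simply averages
# the signals it is shown.

rng = random.Random(42)
pool = [rng.gauss(0.0, 1.0) for _ in range(10_000)]  # honestly generated arguments

def estimate(arguments):
    return sum(arguments) / len(arguments)

# Untampered process: show the estimator a random subset of the pool.
honest = rng.sample(pool, 100)

# Adversarially selected process: show only the arguments that push the
# estimate upward. Same pool, same estimator, very different answer.
adversarial = sorted(pool, reverse=True)[:100]

print(f"honest estimate:      {estimate(honest):+.3f}")      # close to the true 0.0
print(f"adversarial estimate: {estimate(adversarial):+.3f}")  # far above it
```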