Absolutely, wireheading is a real phenomenon, so the question is how real agents can exist that mostly don’t fall prey to it. And I was asking for a story about how your model can be altered/expanded to make sense of that.
Ah, I see. In that case I think I disagree that it happens “by default” in this model. A few dynamics which prevent it:
If the wealthy trader makes reward easier to get, then the price of actions will go up accordingly (because other traders will notice that they can get a lot of reward by winning actions). So in order for the wealthy trader to keep making money, they need to reward outcomes which only they can achieve, which seems a lot harder.
I don’t yet know how traders would best aggregate votes into a reward function, but it should be something which has diminishing marginal return to spending, i.e. you can’t just spend 100x as much to get 100x higher reward on your preferred outcome. (Maybe quadratic voting?)
Other traders will still make money by predicting sensory observations. Now, perhaps the wealthy trader could avoid this by making observations as predictable as possible (e.g. going into a dark room where nothing happens—kinda like depression, maybe?) But this outcome would be assigned very low reward by most other traders, so it only works once a single trader already has a large proportion of the wealth.
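To make the diminishing-returns point concrete, here is a minimal sketch that assumes quadratic voting (only floated as a "maybe" above) as the aggregation rule. The function names `quadratic_votes` and `aggregate_reward`, the trader and outcome names, and all the numbers are hypothetical; this is not a claim about how the model would actually aggregate votes.

```python
import math


def quadratic_votes(spend: float) -> float:
    """Votes bought by `spend` when n votes cost n**2 (assumed rule)."""
    if spend < 0:
        raise ValueError("spend must be non-negative")
    return math.sqrt(spend)


def aggregate_reward(bids: dict[str, dict[str, float]]) -> dict[str, float]:
    """Sum every trader's quadratic votes for each outcome.

    `bids` maps trader name -> {outcome: wealth spent on that outcome}.
    """
    totals: dict[str, float] = {}
    for spending in bids.values():
        for outcome, spend in spending.items():
            totals[outcome] = totals.get(outcome, 0.0) + quadratic_votes(spend)
    return totals


# Hypothetical market: one wealthy trader versus twenty small ones.
bids = {"wealthy": {"dark_room": 10_000.0}}
bids.update({f"small_{i}": {"explore": 100.0} for i in range(20)})

totals = aggregate_reward(bids)
print(totals)
```

Under this assumed rule, spending 100x as much buys only 10x the votes, so the twenty small traders (2,000 spent in total) outvote the single trader who spends 10,000 on the dark room.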
Yep, that’s why I believe “in the limit your traders will already do this”. I just think it will be a dominant dynamic of efficient agents in the real world, so it’s better to represent it explicitly.
IMO the best way to explicitly represent this is via a bias towards simpler traders, who will in general pay attention to fewer things.
But actually I don’t think that this is a “dominant dynamic” because in fact we have a strong tendency to try to pull different ideas and beliefs together into a small set of worldviews. And so even if you start off with simple traders who pay attention to fewer things, you’ll end up with these big worldviews that have opinions on everything. (These are what I call frames here.)
they need to reward outcomes which only they can achieve,
Yep! But it didn’t seem so hard to me for this to happen, especially in the form of “I pick some easy task (that I can do perfectly), and of course others will also be able to do it perfectly, but since I already have most of the money, if I just keep investing my money in doing it I will reign forever”. You prevent this from happening through epsilon-exploration, or something equivalent like giving money randomly to other traders. These solutions feel bad, but I think they’re the only real solutions. That said, I also think meta-learning (traders explicitly learning how they should learn, etc.) probably helps in practice to make these failures less likely.
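As a rough illustration of the "giving money randomly to other traders" option, here is a minimal sketch under my own assumptions: every round each trader is taxed a fraction `epsilon` of their wealth, and the pooled tax goes to one uniformly random trader. The function `redistribute`, the tax-and-lottery scheme and the numbers are hypothetical, and it stands in for epsilon-exploration rather than implementing it exactly.

```python
import random


def redistribute(
    wealth: dict[str, float], epsilon: float, rng: random.Random
) -> dict[str, float]:
    """Tax each trader by `epsilon`, then give the pool to one random trader.

    Returns a new dict; total wealth is conserved up to float rounding.
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    pool = sum(w * epsilon for w in wealth.values())
    new_wealth = {name: w * (1.0 - epsilon) for name, w in wealth.items()}
    lucky = rng.choice(sorted(new_wealth))  # sorted for reproducibility
    new_wealth[lucky] += pool
    return new_wealth


# Hypothetical starting point: an incumbent holding almost all the wealth.
wealth = {"incumbent": 97.0, "a": 1.0, "b": 1.0, "c": 1.0}
rng = random.Random(0)
for _ in range(50):
    wealth = redistribute(wealth, epsilon=0.05, rng=rng)
print(wealth)
```

The point of the sketch is only that no trader's share is permanently locked in: whoever holds most of the wealth keeps leaking a fraction of it to randomly chosen others, which is also why it feels wasteful.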
it should be something which has diminishing marginal return to spending
Yep, that should help (at the cost of making new good ideas slower to implement, but I’m happy to make that trade-off).
But actually I don’t think that this is a “dominant dynamic” because in fact we have a strong tendency to try to pull different ideas and beliefs together into a small set of worldviews
Yeah. To be clear, the dynamic I think is “dominant” is “learning to learn better”, which I think is not equivalent to simplicity-weighting traders. It is instead equivalent to having some more hierarchical structure on traders.