I feel like most AI safety work today doesn’t engage sufficiently with the idea that social media recommenders are the central example of a misaligned AI: a reinforcement learner with a bad objective and some form of ~online learning (most recommenders do some sort of nightly batch weight update). we can align language models all we want, but if companies don’t care and proceed to deploy language models or anything else for the purpose of maximizing engagement, with an online learning system to match, none of this will matter.

we need to be able to say to the world, “here is a type of machine we all can make that will reliably defend everyone against anyone who attempts to maximize something terrible”. anything less than a switchover to a cooperative dynamic, as a result of reliable omnidirectional mutual defense, seems like a near-guaranteed failure given the incentives of the global interaction/conflict/trade network.

you can’t just say oh, hooray, we solved some technical problem about doing what the boss wants. the boss wants to manipulate customers, and will themselves be a target of the system they’re asking to build, just like sundar pichai has to use self-discipline to avoid getting addicted to the youtube recommender, same as anyone else.
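To make the “reinforcement learner with nightly batch updates” framing concrete, here is a minimal toy sketch of the loop being described. Everything in it (the feature dimensions, the logistic update rule, the simulated users) is an illustrative assumption of mine, not a description of any real production recommender:

```python
# Toy sketch of an engagement-maximizing recommender with nightly batch
# updates. Everything here is an illustrative assumption, not a real system.
import numpy as np

rng = np.random.default_rng(0)
n_items, n_features = 100, 8
item_features = rng.normal(size=(n_items, n_features))
weights = np.zeros(n_features)  # model parameters, refit each "night"


def recommend(user_vec, epsilon=0.05):
    """Serve the item with the highest predicted engagement (epsilon-greedy)."""
    scores = item_features @ (weights + user_vec)
    if rng.random() < epsilon:
        return int(rng.integers(n_items))
    return int(np.argmax(scores))


def nightly_batch_update(logged, lr=0.1):
    """One pass of logistic-regression gradient ascent on logged
    (item, engaged) pairs. The objective is raw engagement; nothing in
    the update represents whether the user endorses their own behavior."""
    global weights
    for item, engaged in logged:
        pred = 1.0 / (1.0 + np.exp(-item_features[item] @ weights))
        weights += lr * (engaged - pred) * item_features[item]


# Day/night loop: serve recommendations all day, update weights overnight.
for day in range(30):
    logged = []
    for _ in range(1000):  # simulated user sessions for the day
        user_vec = rng.normal(scale=0.1, size=n_features)
        item = recommend(user_vec)
        # Stand-in "world": engagement is whatever gets clicked/watched,
        # including compulsive use the user wouldn't reflectively endorse.
        p_engage = 1.0 / (1.0 + np.exp(-item_features[item] @ user_vec))
        logged.append((item, rng.random() < p_engage))
    nightly_batch_update(logged)
```

The specific machinery doesn’t matter much; the structural point is that logged engagement is the only quantity the nightly update ever sees, so anything that inflates it, including exploiting users’ weaknesses, gets reinforced.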
Agreed. I wrote about this concern (or a very similar one) here. In general, the AI safety community seems too focused on intent alignment and deception to the exclusion of other risks, and I’ve complained about this a few times before. (Let me know if you think the example you raise is adequately covered by the existing items on that list or should have its own bullet point, and if so, how you would phrase it.)
David Chapman actually uses social media recommendation algorithms as a central example of AI that is already dangerous: https://betterwithout.ai/apocalypse-now
It sounds like you’re describing Moloch here. I agree entirely, but I’d go much further than you and claim “Humans aren’t aligned with each other, or even with themselves” (self-discipline is a kind of tool against internal misalignment, no?). I also think that basically all suffering and issues in the world can be said to stem from a lack of balance, which is simply optimization gone wrong (since such optimization is always for something insatiable, unlike things like hunger, where the desire goes away once the need is met).
Companies don’t optimize for providing value, but for their income. If they earn a trillion, they will just invest that trillion into their own growth so that they can earn the next trillion. And all the optimal strategies exploit human weaknesses, clickbait being an easy example. In fact, it’s technology which has made this exploitation possible, so companies end up becoming tool-assisted cancers. But it’s not just companies which are the problem here; it’s everything which lives by Darwinian/memetic principles. The only exception is “humanity”, which is when optimality is exchanged for positive valence. This requires direct human involvement. Even an interface (online comments and such) is slightly dehumanized compared to direct communication, so any amount of indirectness will reduce this humanity.
[edit: pinned to profile]
Yeah. A way I like to put this is that we need to durably solve the inter-being alignment problem for the first time ever. There are flaky attempts at it around to learn from, but none of them are leak-proof, and we’re expecting to go to metaphorical sea (the abundance of opportunity for systems to exploit vulnerabilities in each other) in this metaphorical boat of a civilization, as opposed to previously just boating in lakes. Or something. But yeah, the core point I’m making is that the minimum bar to get out of the AI mess requires a fundamental change in incentives.