Of course tons of this research is going on. Do you think people who work at Facebook or YouTube are happy that their algorithms suggest outrageous or misleading content? I know a bit about the work at YouTube (I work on an unrelated applied AI team at Google) and they are altering metrics, penalizing certain kinds of content, looking for user journeys that appear to have undesirable outcomes and figuring out what to do about them, and so on.
I’m also friends with a political science professor who consults with Facebook on similar kinds of issues, basically applying mechanism design to think about how people will act given different kinds of things in their feeds.
You can also think about spam or abuse of AI systems, which follow similar patterns. If someone figures out how to trick the quality rating system for ads into thinking their ad is high quality, then they’ll get a discount (this is how Google search ads work, for example). All kinds of tricky things happen: web pages showing one thing to the ad system and a different thing to the user, for instance, or selling one thing just to harvest emails that are then spammed for something else.
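As a toy illustration of that kind of cloaking (everything here is invented: the crawler signature, the page contents), a page can simply branch on who is asking:

```python
# Toy sketch of ad "cloaking": serve clean content to the ad-review system
# and something else to real users. The crawler signature and page bodies
# are made up for illustration; real review systems are much harder to
# fingerprint than this, and this is exactly the gaming reviewers look for.

REVIEW_SIGNATURES = ("ExampleAdReviewBot",)  # hypothetical crawler name

def serve_page(user_agent: str) -> str:
    if any(sig in user_agent for sig in REVIEW_SIGNATURES):
        # What the quality rating system sees: a benign, on-topic page.
        return "<h1>Free guide to home gardening</h1>"
    # What the person who clicked the ad actually sees.
    return "<h1>Enter your email to claim your prize!</h1>"

print(serve_page("ExampleAdReviewBot/1.0"))
print(serve_page("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```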
In general the observation from working in the field is that if you have a simple metric, people will figure out how to game it. So you need to build in a lot of safeguards, and you need to evolve all the time as the spammers/abusers evolve. There’s no end point, no place where you think you’re done, just an ever-changing competition.
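As a minimal sketch of that dynamic (the content categories and numbers below are invented), a recommender that ranks purely on a proxy metric like clicks will happily surface whatever games that metric:

```python
# Minimal sketch of "simple metric gets gamed", with invented numbers.
# The proxy metric is expected clicks; what we actually care about is
# something like user satisfaction. Optimizing the proxy alone surfaces
# the content that games it, not the content users value.

contents = {
    # name: (expected clicks, user satisfaction) -- made-up values
    "careful explainer":   (0.10, 0.9),
    "outrage bait":        (0.35, 0.2),
    "misleading headline": (0.30, 0.1),
}

top_by_proxy = max(contents, key=lambda name: contents[name][0])
top_by_value = max(contents, key=lambda name: contents[name][1])

print("recommended by the click metric:", top_by_proxy)   # outrage bait
print("what users actually valued most:", top_by_value)   # careful explainer
```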
I’m not sure this provides much comfort for the AGI alignment folks....
In general the observation from working in the field is that if you have a simple metric, people will figure out how to game it. So you need to build in a lot of safeguards, and you need to evolve all the time as the spammers/abusers evolve. There’s no end point, no place where you think you’re done, just an ever-changing competition.
That’s what I was trying to point at in regard to the problem not being patchable. It doesn’t seem like there is some simple patch you can write and then be done. A solution that would work more permanently seems to have some of the “impossible” character of AGI alignment, and trying to solve it at that level seems like it could be valuable for AGI alignment researchers.