whpearson comments on Is Google Paperclipping the Web? The Perils of Optimization by Proxy in Social Systems

whpearson 22 May 2010 11:42 UTC
1 point
I’m still not quite getting how this is going to work.

Let’s say I am a spam blog bot. What the bot does is take articles that are popular (for a niche) and repost automated summaries of them. Let’s say it does this for cars. The summaries aren’t very good, but they aren’t very bad either. Perhaps it makes automatic word changes to real people’s summaries. It joins lots of other spam bots of this type and they form self-supporting networks (each upvoting the others) that also like popular things to do with cars. People come across these links and upvote them, because they go somewhere interesting. The bots gain lots of karma in these communities and then start pimping car-related products or spreading FUD about rival companies. Automated astroturf, if you want.

Does anyone regulate the creation of new users?

How long before they stop being interesting to the car people? Or how much effort would it be to track them down and remove them from the circle of people you are interested in?

Also, who keeps track of these votes? Can people stuff the ballot?

I’ve thought along these lines before and realised it is a non-trivial problem.
- avalot 22 May 2010 17:59 UTC
  0 points
  There are a few questions in there. Let’s see.
  
  Authentication and identity are an interesting issue. My concept is to allow anonymous users, with a very low initial influence level. But there would be many ways for users to strengthen their “identity score” (credit card verification, address verification via a snail-mailed verification code, etc.), which would greatly and rapidly increase their influence score. A username that is tied to a specific person, and therefore wields much more influence, could undo the efforts of 100 bots with a single downvote.
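
A minimal sketch of that weighting, with made-up numbers. The thread gives no formula, so the base influence, the bonus per verification, and the names `User` and `weighted_score` are all assumptions chosen only so that one verified downvote outweighs 100 anonymous upvotes.

```python
from dataclasses import dataclass

# Assumed values, not taken from the discussion.
ANONYMOUS_INFLUENCE = 1.0
VERIFICATION_BONUS = {"credit_card": 60.0, "postal_address": 60.0}


@dataclass(frozen=True)
class User:
    name: str
    verifications: tuple = ()

    @property
    def influence(self) -> float:
        # Every user starts low; each identity check adds a fixed bonus.
        return ANONYMOUS_INFLUENCE + sum(
            VERIFICATION_BONUS[v] for v in self.verifications
        )


def weighted_score(votes):
    """Sum votes (+1 or -1), each scaled by the voter's influence."""
    return sum(user.influence * direction for user, direction in votes)


bots = [User(f"bot{i}") for i in range(100)]
verified = User("alice", ("credit_card", "postal_address"))

# 100 anonymous upvotes against a single verified downvote.
votes = [(bot, +1) for bot in bots] + [(verified, -1)]
score = weighted_score(votes)  # negative: the one downvote wins
```

With these numbers the verified account carries an influence of 121, so the item ends up below zero despite the 100 bot upvotes.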
  
  But if you want to stay anonymous, you can. You’ll just have to patiently work on earning the same level of trust that is awarded to people who put their real-life reputation on the line.
  
  I’m also conceiving of a richly semantic system, where simply “upvoting” or Facebook-liking are the least influential actions one can take. Up from there, you can rate content on many factors, comment on it, review it, tag it, share it, reference it, relate it to other content. The more editorial and cerebral actions would probably do more to change one’s influence than a simple thumbs up. If a bot can compete with a human in writing content that gets rated highly on “useful”, “factual”, “verifiable”, “unbiased”, AND “original” (by people who have a high influence score in these categories), then I think the bot deserves a good influence score, because it’s a benevolent AI. ;)
  
  Another concept, which would reduce incentives to game the system, is vouching. You can vouch for other users’ identity, integrity, maturity, etc. If you vouched for a bot, and the bot’s influence gets downgraded by the community, your influence will take a hit as well.
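
One way the vouching penalty could be sketched, assuming each direct voucher loses a fixed share of whatever the vouched-for account loses. The share (`VOUCH_PENALTY_SHARE`), the default influence of 1.0, and the class name `VouchLedger` are hypothetical.

```python
from collections import defaultdict

VOUCH_PENALTY_SHARE = 0.5  # assumed: vouchers lose half of the vouchee's loss


class VouchLedger:
    def __init__(self):
        self.influence = defaultdict(lambda: 1.0)  # user -> influence
        self.vouchers = defaultdict(set)           # vouchee -> users who vouched

    def vouch(self, voucher, vouchee):
        self.vouchers[vouchee].add(voucher)

    def downgrade(self, user, amount):
        """Apply a community downgrade and pass part of it on to vouchers."""
        self.influence[user] = max(0.0, self.influence[user] - amount)
        for voucher in self.vouchers[user]:
            penalty = amount * VOUCH_PENALTY_SHARE
            self.influence[voucher] = max(0.0, self.influence[voucher] - penalty)


ledger = VouchLedger()
ledger.influence["alice"] = 10.0
ledger.vouch("alice", "bot")
ledger.downgrade("bot", 4.0)  # alice pays for having vouched for the bot
```

Only direct vouchers are penalized here; whether the hit should travel further up a chain of vouchers is left open.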
  
  I see this happening throughout the system: Every time you exert your influence, you take responsibility for that action, as anyone may now rate/review/downvote your action. If you stand behind your judgement of Rush Limbaugh as truthful, enough people will disagree with you that from that point on, anytime you rate something as “truthful”, that rating will count for very little.
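
A rough sketch of how that could play out per attribute, assuming a rater's credibility for an attribute is simply scaled by the influence-weighted share of people who agreed with their last rating. The update rule and the function names are illustrative assumptions, not a worked-out design.

```python
def updated_credibility(current, agree_weight, disagree_weight):
    """Scale credibility by the share of reviewer weight that agreed."""
    total = agree_weight + disagree_weight
    if total == 0:
        return current  # nobody reviewed the rating, so nothing changes
    return current * (agree_weight / total)


def attribute_score(ratings, credibility):
    """Credibility-weighted mean of (rater, value) pairs, values in [0, 1]."""
    total = sum(credibility[rater] for rater, _ in ratings)
    if total == 0:
        return None
    return sum(credibility[rater] * value for rater, value in ratings) / total


credibility = {"dana": 1.0, "eli": 1.0}

# Dana's earlier "truthful" rating was rejected by 9 of 10 units of reviewer weight.
credibility["dana"] = updated_credibility(
    credibility["dana"], agree_weight=1.0, disagree_weight=9.0
)

# From now on Dana's "truthful" ratings barely move the aggregate.
truthful = attribute_score([("dana", 1.0), ("eli", 0.0)], credibility)
```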
  - Alexandros 24 May 2010 8:34 UTC
    2 points
    Hi avalot, thank you for the detailed discussion. I suspect the system I have in mind is simpler but should satisfy the same principles. In fact it has been eerie reading your post, as on principle we are in 95% agreement, in excruciating detail, and to a large extent on technical behaviour. I guess my one explicit difference is that I cannot let go of the profit motive. If I make a substantial contribution, I would like to be properly rewarded, if only to be able to materialize other ideas and contribute to causes I find worthy. That of course does not imply going to Facebook’s lengths to squeeze the last drop of value out of its system, nor should it take precedence over openness and distribution. But to the extent that it can fit, I would like it to be there. Two questions for you:
    
    First, with everyone rating everyone, how do you avoid your system becoming a Keynesian beauty contest? (http://en.wikipedia.org/wiki/Keynesian_beauty_contest)
    
    Second, assuming the number of connections grows much faster than the number of users (quadratically, if everyone can rate everyone), the processing load will also rise much more quickly than the number of users. How will a system like this operate at web scale?
    - avalot 24 May 2010 15:49 UTC
      1 point
      Alexandros,
      
      Not surprised that we’re thinking along the same lines, if we both read this blog! ;)
      
      I love your questions. Let’s do this:
      
      Keynesian Beauty Contest: I don’t have a silver bullet for it, but a lot of mitigation tactics. First of all, I envision offering a cascading set of progressively more fine-grained rating attributes, so that, while you can still upvote or downvote, or rate something with stars, you can also rate it on truthfulness, entertainment value, fairness, rationality (and countless other attributes)… More nuanced ratings would probably carry more influence (again, subject to others’ cross-rating). Therefore, to gain the highest levels of influence, you’d need to be nuanced in your ratings of content… gaming the system with nuanced, detailed opinions might be effectively the same as providing value to the system. I don’t mind someone trying to figure out the general population’s nuanced preferences… that’s actually a valuable service!
      
      Secondly, your ratings are also cross-related to the semantic metadata (folksonomy of tags) of the content, so that your influence is limited to the topic at hand. Gaining a high influence score as a fashion celebrity doesn’t put your political or scientific opinions at the top of search results. Hopefully, this works as a sort of structural Palin-filter. ;)
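
A small sketch of topic-scoped influence, assuming influence is stored per (user, tag) pair and that a rating's weight is the rater's mean influence over the content's tags. The class `TopicInfluence`, the averaging rule, and the example tags are assumptions.

```python
class TopicInfluence:
    def __init__(self):
        self._scores = {}  # (user, tag) -> influence earned under that tag

    def award(self, user, tag, amount):
        key = (user, tag)
        self._scores[key] = self._scores.get(key, 0.0) + amount

    def weight_for(self, user, content_tags):
        """Weight of a user's rating on content carrying these tags."""
        if not content_tags:
            return 0.0
        total = sum(self._scores.get((user, tag), 0.0) for tag in content_tags)
        return total / len(content_tags)


influence = TopicInfluence()
influence.award("celebrity", "fashion", 90.0)

on_fashion = influence.weight_for("celebrity", {"fashion"})
on_politics = influence.weight_for("celebrity", {"politics"})
on_mixed = influence.weight_for("celebrity", {"fashion", "politics"})
```

Fame earned under one tag counts in full only on content carrying that tag, and is diluted or absent elsewhere.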
      
      The third mitigation has to do with your second question: How do we handle the processing of millions of real-time preference data points, when all of them should (in theory) get cross-related to all others, with (theoretically) endless recursion?
      
      The typical web-based service approach of centralized crunching doesn’t make sense. I’m envisioning a distributed system where each influence node talks with a few others (a dozen?), and does some cross-processing with them to agree on some temporary local normals, means and averages. That cluster does some more higher-level processing in concert with other close-by clusters, and they negotiate some “regional” aggregates… that gets propagated back down into the local level, and up to the next level of abstraction… up until you reach some set of a dozen superclusters that span the globe, and who trade in high-level aggregates.
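
A toy version of that roll-up, assuming each node summarizes its ratings as a (total, count) pair and that neighbours are grouped a dozen at a time until at most a dozen top-level aggregates remain. Grouping by list position stands in for network proximity, and `Aggregate`, `cluster`, and `build_levels` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aggregate:
    total: float
    count: int

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0

    def merge(self, other):
        return Aggregate(self.total + other.total, self.count + other.count)


def cluster(aggregates, size):
    """Merge each run of `size` neighbouring aggregates into one."""
    merged = []
    for start in range(0, len(aggregates), size):
        combined = Aggregate(0.0, 0)
        for aggregate in aggregates[start:start + size]:
            combined = combined.merge(aggregate)
        merged.append(combined)
    return merged


def build_levels(node_aggregates, size=12):
    """Stack levels of aggregates until the top one holds at most `size`."""
    levels = [list(node_aggregates)]
    while len(levels[-1]) > size:
        levels.append(cluster(levels[-1], size))
    return levels


# 1000 nodes, each holding a single rating between 0 and 4.
nodes = [Aggregate(float(i % 5), 1) for i in range(1000)]
levels = build_levels(nodes)
world = cluster(levels[-1], len(levels[-1]))[0]  # all superclusters combined
```

Because only totals and counts travel upward, the higher levels never see individual ratings, which fits the idea of data points being consumed by the aggregates.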
      
      All that is regulated, in terms of clock ticks, by activity: Content that is being rated/shared/commented on by many people will be accessed and cached by more local nodes, and processed by more clusters, and its cross-processing will be accelerated because it’s “hot”. Whereas one little opinion on one obscure item might not get processed by servers on the other side of the world until someone there requests it. We also decay data this way: If nobody cares, the system eventually forgets. (Your personal node will remember your preferences, but the network, after having consumed their influence effects, might forget their data points.)
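
The forgetting part could be sketched as exponential decay on idle time, assuming a data point's weight halves for every fixed period without activity and is dropped once it falls below a threshold. The half-life, the threshold, and the function names are invented for illustration.

```python
HALF_LIFE_DAYS = 30.0  # assumed: weight halves after 30 idle days
FORGET_BELOW = 0.01    # assumed: below this the network drops the data point


def decayed_weight(initial, days_idle, half_life=HALF_LIFE_DAYS):
    """Weight of a data point after `days_idle` days without activity."""
    return initial * 0.5 ** (days_idle / half_life)


def prune(data_points, today):
    """Keep only items whose decayed weight is still above the threshold.

    data_points maps item -> (weight, day of last activity).
    """
    kept = {}
    for item, (weight, last_active) in data_points.items():
        if decayed_weight(weight, today - last_active) >= FORGET_BELOW:
            kept[item] = (weight, last_active)
    return kept


network_memory = {
    "hot-article": (1.0, 360),     # rated five days ago
    "obscure-opinion": (1.0, 0),   # untouched for a year
}
remembered = prune(network_memory, today=365)
```

A personal node would keep its own copy of the preference regardless; only the shared network copy is pruned in this sketch.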
      
      A distributed propagation system, batch-processed, not real-time, not atomic but aggregated. That means you can’t go back and change old ratings, and individual data points, because they get consumed by the aggregates. That means you can’t inspect what made your score go up and down at the atomic level. That means your score isn’t the same everywhere on the planet at the same time. So gaming the system is harder because there’s no real-time feedback loop, there’s no single source of absolute truth (truth is local and propagates lazily), and there’s no auditing trail of the individual effects of your influence.
      
      All of this hopefully makes the system so fluid that it holds innumerable beauty contests, always ongoing, always local, and the results are different depending on when and where you are. Hopefully this makes the search for the Nash equilibrium a futile exercise, and people give up and just say what they actually think is valuable to others, as opposed to just expected by others.
      
      That’s my wishful thinking at this point. Am I fooling myself?
      - whpearson 25 May 2010 14:21 UTC
        0 points
        I’d create a simplified evolutionary model of the system using a GA to create the agents. If groups can find a way to game your system to create infinite interesting-ness/insightful-ness for specific topics, then you need to change it.
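
A very stripped-down sketch of such a model, assuming each agent's genome is a single number: the share of its effort spent on vote-ring collusion rather than on honest, useful posting. The payoffs stand in for the rules of the system under test, and every name and parameter here is hypothetical.

```python
import random


def fitness(collusion_share, collusion_payoff, quality_payoff):
    """Karma earned by an agent splitting effort between the two strategies."""
    return (collusion_share * collusion_payoff
            + (1 - collusion_share) * quality_payoff)


def evolve(collusion_payoff, quality_payoff, generations=40, size=30, seed=0):
    """Evolve collusion shares under the given payoffs; return the final population."""
    rng = random.Random(seed)
    population = [rng.random() for _ in range(size)]
    for _ in range(generations):
        # Truncation selection: the better-scoring half survives unchanged.
        population.sort(
            key=lambda g: fitness(g, collusion_payoff, quality_payoff),
            reverse=True,
        )
        survivors = population[: size // 2]
        # Each survivor leaves one mutated child, clamped to [0, 1].
        children = [
            min(1.0, max(0.0, g + rng.gauss(0, 0.05))) for g in survivors
        ]
        population = survivors + children
    return population


# Rules under which collusion pays twice as well as honest posting.
gamed = evolve(collusion_payoff=2.0, quality_payoff=1.0)
# Rules under which honest posting pays better.
honest = evolve(collusion_payoff=0.5, quality_payoff=1.0)
```

If the evolved population drifts toward collusion under a given set of rules, that would be a hint the rules reward gaming. A real model would need agents that interact with each other rather than a fixed payoff per strategy.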
        avalot 25 May 2010 20:39 UTC
        0 points
        You’re right: A system like that could be genetically evolved for optimization.
        
        On the other hand, I was hoping to create an open optimization algorithm, governable by the community at large… based on their influence scores in the field of “online influence governance.” So the community would have to notice abuse and gaming of the system, and modify policy (as expressed in the algorithm, in the network rules, in laws and regulations and in social mores) to respond to it. Kind of like democracy: Make a good set of rules for collaborative rule-making, give it to the people, and hope they don’t break it.
        
        But of course the Huns could take over. I’m trusting us to protect ourselves. In some way this would be poetic justice: If crowds can’t be wise, even when given a chance to select and filter among the members for wisdom, then I’ll give up on bootstrapping humanity and wait patiently for the singularity. Until then, though, I’d like to see how far we could go if given a useful tool for collaboration, and left to our own devices.
        Alexandros 25 May 2010 23:28 UTC
        1 point
        I think you are closer to a strong solution than you realize. You have mentioned the pieces but I think you haven’t put them together yet. In short, the solution I see is to depend on local (individual) decisions rather than group ones. If each node has its own ranking algorithm and its own set of trust relations, there is no reason to create complex group-cooperation mechanisms. A user that spams gets negative feedback and therefore eventually gets isolated in the graph. Even if automated users outnumber real users, the best they can do is vote each other up and therefore end up with their own cluster of the network, with real users only strongly connected to each other. Of course, if a bot provides value, it can be incorporated in that graph. “sufficiently advanced spam...”, etc. etc. This also means that the graph splinters into various clusters depending on worldview (your Rush Limbaugh example). This deals with Keynesian beauty contests as there is no ‘average’ to aim at. Your values simply cluster you with people who share them. If you value quality, you go closer to quality. If you value ‘republican-ness’ you move closer to that. The price you pay is that there is no ‘objective’ view of the system. There is no ‘top 10 articles’, only ‘top 10 articles for user X’.
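
A minimal sketch of per-user ranking over a trust graph, assuming trust spreads outward from the viewer for a couple of hops and is damped at each hop. The propagation rule, the damping factor, and the names `personal_trust` and `top_items` are assumptions made for illustration, not the design being kept private.

```python
def personal_trust(trust_edges, viewer, max_hops=2, damping=0.5):
    """Return {user: trust} as seen from `viewer`, following trust edges."""
    trust = {viewer: 1.0}
    frontier = {viewer: 1.0}
    for _ in range(max_hops):
        next_frontier = {}
        for user, weight in frontier.items():
            for friend, edge in trust_edges.get(user, {}).items():
                if friend in trust:
                    continue  # already reached by a shorter path
                gained = weight * edge * damping
                next_frontier[friend] = max(next_frontier.get(friend, 0.0), gained)
        trust.update(next_frontier)
        frontier = next_frontier
    return trust


def top_items(upvotes, trust, n=10):
    """Rank items by the summed trust of their upvoters, for one viewer."""
    scores = {
        item: sum(trust.get(voter, 0.0) for voter in voters)
        for item, voters in upvotes.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]


trust_edges = {
    "alice": {"bob": 1.0},
    "bob": {"carol": 0.8},
    "bot1": {"bot2": 1.0},  # the bots only trust each other
    "bot2": {"bot1": 1.0},
}
upvotes = {"car-review": ["bob", "carol"], "spam-ad": ["bot1", "bot2"]}

alice_view = personal_trust(trust_edges, "alice")
bot_view = personal_trust(trust_edges, "bot1")
alice_top = top_items(upvotes, alice_view)
bot_top = top_items(upvotes, bot_view)
```

No trust edge leads from Alice's cluster to the bots, so their mutual upvotes never affect her ranking, while the bots' own view puts the spam first. There is no global list, only a list per viewer.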
        
        Another thing I see with your design is that it is complex and attempts to boil at least a few oceans. (emergent ontologies/folksonomies for one, distributing identity, storage, etc.). I have some experience with defining complex architectures for distributed systems (e.g. http://arxiv.org/abs/0907.2485 ) and the problem is that they need years of work by many people to reach some theoretical purity, and even then bootstrapping will be a bitch. The system I have in mind is extremely simple by comparison, definitely more pragmatic (and therefore makes compromises) and is based on established web technologies. As a result, it should bootstrap itself quite easily. I find myself not wanting to publicly share the full details until I can start working on the thing (I am currently writing up my PhD thesis and my deadline is Oct. 1. After that, I’m focusing on this project). If you want to talk more details, we should probably take this to a private discussion.
        avalot 26 May 2010 14:36 UTC
        0 points
        You are right: This needs to be a fully decentralized system, with no center, and processing happening at the nodes. I was conceiving of “regional” aggregates mostly as a guess as to what may relieve network congestion if every node calls out to thousands of others.
        
        Thank you for setting me right: My thinking has been so influenced by over a decade of web app dev that I’m still working on integrating the full principles of decentralized systems.
        
        As for boiling oceans… I wish you were wrong, but you probably are right. Some of these architectures are likely to be enormously hard to fine-tune for effectiveness. At the same time, I am also hoping to piggyback on existing standards and systems.
        
        Anyway, let’s certainly talk offline!