AI and integrity

Link post

The blog post contains a preamble about the OpenAI non-disparagement agreements, which I assume you all know about.

We need to do better than relying on the integrity of individual engineers when millions of dollars are being paid to them to keep quiet.

Here are three suggestions:

  • Offer to match 50% of funds lost by whistleblowers.

    • This might sound like both an insult and a huge waste. But I think people really are motivated by $500k or more that they expected to earn: money that was going to fund their retirement, their houses, or their kids' schooling. We should consider the possibility that money like this matters.

    • Make clear that whistleblowers will take a hit, but not a huge one. I know that if I had a family and potentially millions coming to me if I stayed quiet, I would be tempted to stay quiet. Altman certainly knows this: his gambit during the OpenAI board crisis showed a shrewd understanding of how income motivates his staff.

    • This is good value for money. How much would AI-safety-focused foundations pay to place a person with integrity inside OpenAI? I would guess $1–10mn. Here you have someone who can give an accurate account of why they left a well-paying AI safety role. I suggest that is worth a lot of money, especially if they are willing to forgo half of what they would have made (a costly signal of integrity).

  • Create an integrity prize

    • There is much to celebrate here. A man looked at millions of dollars and decided he’d rather have the ability to speak honestly. That seems like the sort of behaviour we should want in the world. I want people who would hide Jews under their floorboards, people who would walk away from interesting scientific problems to avoid building the nuclear bomb (as Szilard did), and people who value their honesty more than millions of dollars when they are developing world-changing tech.

    • Give a medium-sized prize every year, perhaps $100k, as an AI Integrity Prize. Find a set of judges who have demonstrated intellectual and practical integrity in the past and have them vote every year on someone to award it to: someone who has borne personal cost in AI to maintain their integrity.

    • Don’t be too trusting. If Kokotajlo were to receive the inaugural award, have an investigatory team on staff to kick the tires (perhaps consider Kelsey Piper, who has a reputation for this kind of work to maintain). Aim to be 90%+ confident that the judges will still endorse the award in 10 years.

  • Talk to your elected representatives, and donate to AI safety organisations.

    • Money makes a difference. I generally think that people should pay more for the things they want to protect. So rather than relying on the honesty of a few researchers, I want to use my time and money to push for changes to the overall system of incentives.

    • Sadly, this is very complex. I don’t have a specific bill or politician to recommend, but I think that giving to someone now and then trying to improve your giving next month is better than doing nothing.

Crossposted to EA Forum