DeepMind Gemini Safety lead; Foundation board member
Dave Orr
You should ignore the EY style “no future” takes when thinking about your future. This is because if the world is about to end, nothing you do will matter much. But if the world isn’t about to end, what you do might matter quite a bit—so you should focus on the latter.
One quick question to ask yourself is: are you more likely to have an impact on technology, or on policy? Either one is useful. (If neither seems great, then consider earning to give, or just find a way to add value in society in other ways.)
Once you figure that out, the next step is almost certainly building relevant skills, knowledge, and networks. Connect with senior folks with relevant roles, ask and otherwise try to figure out what skills and such are useful, try to get some experience by working or volunteering with great people or organizations.
Do that for a while and I bet some gaps and opportunities will become pretty clear. 😀
I agree that it’s bad to raise a child in an environment of extreme anxiety. Don’t do that.
Also try to avoid being very doomy and anxious in general, it’s not a healthy state to be in. (Easier said than done, I realize.)
I think you should have a kid if you would have wanted one without recent AI progress. Timelines are still very uncertain, and strong AGI could still be decades away. Parenthood is strongly value creating and extremely rewarding (if hard at times) and that’s true in many many worlds.
In fact it’s hard to find probable worlds where having kids is a really bad idea, IMO. If we solve alignment and end up in AI utopia, having kids is great! If we don’t solve alignment and EY is right about what happens in a fast takeoff world, it doesn’t really matter if you have kids or not.
In that sense, it’s basically a freeroll, though of course there are intermediate outcomes. I don’t immediately see any strong argument in favor of not having kids if you would otherwise want them.
The thing you’re missing is called instruction tuning. You gather a series of prompt/response pairs and fine tune the model over that data. Do it right and you have a chatty model.
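To make that concrete, here is a minimal sketch of how prompt/response pairs might be turned into training examples. It assumes a toy whitespace tokenizer, a made-up `<eos>` marker, and the common (but not universal) choice of computing the loss only on response tokens; `toy_tokenize` and `build_example` are hypothetical names, not any library's API.

```python
from typing import List, Optional, Tuple

EOS = "<eos>"  # assumed end-of-response marker, purely illustrative


def toy_tokenize(text: str) -> List[str]:
    # Stand-in for a real tokenizer: split on whitespace.
    return text.split()


def build_example(prompt: str, response: str) -> Tuple[List[str], List[Optional[str]]]:
    """Turn one prompt/response pair into (inputs, labels).

    Labels are None over the prompt, so a training loop would skip those
    positions and only learn to produce the response. Real frameworks
    typically use a sentinel integer instead of None.
    """
    prompt_tokens = toy_tokenize(prompt)
    response_tokens = toy_tokenize(response) + [EOS]
    inputs = prompt_tokens + response_tokens
    labels: List[Optional[str]] = [None] * len(prompt_tokens) + response_tokens
    return inputs, labels


# A tiny hypothetical dataset of prompt/response pairs.
pairs = [
    ("What is the capital of France?", "The capital is Paris."),
    ("Say hi", "Hello there"),
]
dataset = [build_example(p, r) for p, r in pairs]
```

Fine-tuning then amounts to running ordinary next-token training over `dataset`, which is what nudges a base model toward answering rather than just continuing the text.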
Thanks, Zvi, these roundups are always interesting.
I have one small suggestion, which is that you limit yourself to one Patrick link per post. He’s an interesting guy but his area is quite niche, and if people want his fun stories about banking systems they can just follow him. I suspect that people who care about those things already follow him, and people who don’t aren’t that interested to read four items from him here.
I feel like a lot of the issues in this post are that the published RSPs are not very detailed and most of the work to flesh them out is not done. E.g. the comparison to other risk policies highlights lack of detail in various ways.
I think it takes a lot of time and work to build out something with lots of analysis and detail, years of work potentially to really do it right. And yes, much of that work hasn’t happened yet.
But I would rather see labs post the work they are doing as they do it, so people can give feedback and input. If labs do so, the frameworks will necessarily be much less detailed than they would if we waited until they were complete.
So it seems to me that we are in a messy process that’s still very early days. Feedback about what is missing and what a good final product would look like is super valuable, thank you for your work doing that. I hope the policy folks pay close attention.
But I think your view that RSPs are the wrong direction is misguided, or at least I don’t find your reasons to be persuasive—there’s much more work to be done before they’re good and useful, but that doesn’t mean they’re not valuable. Honestly I can’t think of anything much better that could have been reasonably done given the limited time and resources we all have.
I think your comments on the name are well taken. I think your ideas about disclaimers and such are basically impossible for a modern corporation, unfortunately. I think your suggestion about pushing for risk management in policy is the clear next step, and one that’s only enabled by the existence of an RSP in the first place.
Thanks for the detailed and thoughtful effortpost about RSPs!
I agree with all of this. It’s what I meant by “it’s up to all of us.”
It will be a signal of how things are going if in a year we still have only vague policies, or if there has been real progress in operationalizing the safety levels, detection, what the right reactions are, etc.
I think there are two paths, roughly, that RSPs could send us down.
1. RSPs are a good starting point. Over time we make them more concrete, build out the technical infrastructure to measure risk, and enshrine them in regulation or binding agreements between AI companies. They reduce risk substantially, and provide a mechanism whereby we can institute a global pause if necessary, which seems otherwise infeasible right now.
2. RSPs are a type of safety-washing. They provide the illusion of a plan, but as written they are so vague as to be meaningless. They let companies claim they take safety seriously but don’t meaningfully reduce risk, and in fact may increase it by letting companies skate by without doing real work, rather than forcing companies to act responsibly by just not developing a dangerous uncontrollable technology.
If you think that Anthropic and other labs that adopt these are fundamentally well meaning and trying to do the right thing, you’ll assume that we are by default heading down path #1. If you are more cynical about how companies are acting, then #2 may seem more plausible.
My feeling is that Anthropic et al are clearly trying to do the right thing, and that it’s on us to do the work to ensure that we stay on the good path here, by working to deliver the concrete pieces we need, and to keep the pressure on AI labs to take these ideas seriously. And to ask regulators to also take concrete steps to make RSPs have teeth and enforce the right outcomes.
But I also suspect that people on the more cynical side aren’t going to be persuaded by a post like this. If you think that companies are pretending to care about safety but really are just racing to make $$, there’s probably not much to say at this point other than, let’s see what happens next.
New York City Mayor Eric Adams has been using ElevenLabs AI to create recordings of him in languages he does not speak and using them for robocalls. This seems pretty not great.
Can you say more about why you think this is problematic? Recording his own voice for a robocall is totally fine, so the claim here is that AI involvement makes it bad?
Yes he should disclose somewhere that he’s doing this, but deepfakes with the happy participation of the person whose voice is being faked seems like the best possible scenario.
FWIW as an executive working on safety at Google, I basically never consider my normal working activities in light of what they would do to Google’s stock price.
The exception is around public communication. There I’m very careful because it’s asymmetrical: I could potentially cause a PR disaster that would affect the stock, but I don’t see how I could give a talk that’s so good that it helps it.
Maybe a plug pulling situation would be different, but I also think it’s basically impossible for it to be a unilateral situation, and if we’re in such a moment, I hardly think any damage would be contained to Google’s stock price, versus say the market as a whole.
How much do you think that your decisions affect Google’s stock price? Yes maybe more AI means a higher price, but on the margin how much will you be pushing that relative to a replacement AI person? And mostly the stock price fluctuates on stuff like how well the ads business is doing, macro factors, and I guess occasionally whether we gave a bad demo.
It feels to me like the incentive is just so diffuse that I wouldn’t worry about it much.
Your idea of just donating extra gains also seems fine.
That’s not correct, or at least not how my Google stock grants work. The price is locked in at grant time, not vest time. In practice what that means is that you get x shares every month, which counts as income when multiplied by the current stock price.
And then you can sell them or whatever, including having a policy that automatically sells them as soon as they vest.
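As a rough illustration of that arithmetic, here is a small sketch. All numbers are hypothetical, the even monthly vesting schedule is an assumption, and `monthly_vest_income` is a made-up helper rather than anything from a real payroll system.

```python
from typing import List


def monthly_vest_income(
    grant_value: float,
    grant_price: float,
    months: int,
    vest_prices: List[float],
) -> List[float]:
    """Income recognized at each monthly vest.

    The share count is fixed at grant time (grant_value / grant_price),
    then split evenly across the vesting months (an assumption). Each
    month's income is that fixed share count times the price at vest.
    """
    total_shares = grant_value / grant_price
    shares_per_month = total_shares / months
    return [shares_per_month * price for price in vest_prices]


# Hypothetical grant: $48,000 at $100/share, vesting over 48 months,
# so 10 shares vest each month regardless of later price moves.
incomes = monthly_vest_income(48_000, 100, 48, [100, 150])
print(incomes)
```

The share count never changes after the grant; only the income per vest moves with the stock price.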
The star ratings are an improvement; I had also felt that “breakthrough” was overselling many of the items last week.
However, stars are very generic and don’t capture the concept of a breakthrough very well. You could consider a lightbulb.
I also asked ChatGPT to create an emoji of an AI breakthrough, and after some iteration it came up with this: https://photos.app.goo.gl/sW2TnqDEM5FzBLdPA
Use it if you like it!
Thanks for putting together this roundup, I learn things from it every time.
I agree with this.
Consider a hypothetical: there are two drugs we could use to execute prisoners sentenced to death. One of them causes excruciating pain, the other does not, but costs more.
Would we feel that we would rather use the torture drug later? After all, the dude is dead, so he doesn’t care either way.
I have a pretty strong intuition that those drugs are not similar. Same thing with the anesthesia example.
HT Michael Thiessen, who expects this to result in people figuring out how to extract the (distilled) model weights. Is that inevitable?
Not speaking for Google here.
I think it’s inevitable, or at least it’s impossible to stop someone willing to put in the effort. The weights are going to be loaded into the phone’s memory, and a jailbroken phone should let you have access to the raw memory.
But it’s a lot of effort and I’m not sure what the benefit would be to anyone. My guess is that if this happens it will be by a security researcher or some enterprising grad student, not by anyone actually motivated to use the weights for anything in particular.
I could see the illustrations via RSS, but don’t see them here, on Chrome on mobile.
I assume you’ve seen these, but if not, there are some relevant papers here: https://scholar.google.com/scholar?q=deepmind+reinforcement+learning+cooling+data+center&hl=en&as_sdt=0&as_vis=1&oi=scholart
The main place we differ is that we are on opposite sides of the ‘will Tether de-peg?’ market. No matter what they did in the past, I now see a 5% safe return as creating such a good business that no one will doubt ability to pay. Sometimes they really do get away with it, ya know?
This seems sensible, but I remember thinking something very similar about Full Tilt, and then they turned out to be doing a bunch of shady shit that was very not in their best interest. I think there’s a significant chance that fraudsters gonna fraud even when they really shouldn’t, and Tether in particular has such a ridiculous background that it just seems very possible that they will take unnecessary risks, lend money when they shouldn’t, etc, just because people do what they’ve been doing all too often.
Pradyumna: You a reasonable person: the city should encourage carpooling to reduce congestion
Bengaluru’s Transport Department (a very stable genius): Taxi drivers complained and so we will ban carpooling
It’s not really that Bangalore banned carpooling, they required licenses for ridesharing apps. Maybe that’s a de facto ban of those apps, but that’s a far cry from banning carpooling in general.
We’re here to test the so-called tower of babel theory. What if, due to some bizarre happenstance, humanity had thousands of languages that change all the time instead of a single universal language like all known intelligent species?