I get the impression from talking to people who work professionally on reducing AI x-risk (broadly construed; this would include doing ops for FLI) that less than half of them could give an accurate and logically sound model of where AI x-risk comes from and why the work they do is valuable (bonus points if they can compare it to other work they might be doing instead). I'm a bit disappointed by this, because it doesn't seem that hard to get every professional knowledgeable on the basics, and it seems worthwhile: more people could autonomously assess strategic decisions and make sure the implementation of plans serves the right aims.
An example of why having a model of x-risk, and of directions for reducing it, matters: imagine you're an ops person with money to organize a half-day conference. What do you do?
Which city do you choose? Which participants do you invite?
If you have no model of anything, I guess you can copy existing formats: run a small-scale EAG-like event for AI safety people, or something like a scientific conference. That's okay, and probably not a total waste of money (though to the extent you don't know AI safety theory, it's possible to take counter-productive actions, like giving a platform to the wrong people).
Imagine you have the following simple model:
- AI is being improved by people
- If deployed with the current alignment tech and governance level, powerful autonomous AGI could have convergent instrumental goals that lead to takeover (sharp left turn models), or large-scale deployment of AGI in society could crowd out human influence (Out with a Whimper or Multipolar Failure)
- Thus we should take actions that get us the best possible alignment tech and governance level by the time anyone is able to train and deploy dangerous AI systems.
- This means we could take actions that:
  - Slow down dangerous AI development
  - Accelerate alignment tech development
  - Accelerate getting a good level of governance
Each of these needs sub-models (all interlinked); let's detail a simple one for governance.
Things that accelerate getting a good level of governance:
- Better knowledge and models of AI x-risk (scenario and risk planning), demonstrations of precursors (model organisms of misalignment, ...), and increasing scientific consensus about the paths to avoid
- Spreading the above knowledge to the general public (so that politicians have support when passing economy-limiting policy)
- Spreading the above knowledge to policy-makers
- Having technically competent people write increasingly good policy proposals (probably things leading to international regulation of dangerous AI)
Now that you have these, it's much easier to find a few key metrics to optimize for with your ops event. Doing it really well might involve talking to other event organizers to learn what has and hasn't been done, what works, what's been neglected, etc., but even without all that you can decide to act on:
Improvement in attendees' knowledge and models of AI x-risk:
- Run pre-event and post-event surveys or informal conversations to gauge attendees' understanding of AI risk scenarios and mitigation strategies (a rough scoring sketch follows this list).
- Structure the event with interactive sessions explicitly designed to clarify AI x-risk models, scenario planning, and concrete policy implications.
Potential reach for disseminating accurate AI safety knowledge:
- Choose guests who have strong networks or influence with policy-makers, academics, or public media.
- Select a location close to influential governmental bodies or major media centers (e.g., Washington D.C., Brussels, or Geneva).
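To make the first metric a bit more concrete, here is a minimal sketch, in Python and with made-up numbers, of how you might turn matched pre/post survey responses into a couple of summary figures. The 0–10 scale, the scoring scheme, and the example data are assumptions for illustration, not part of the model above.

```python
# Minimal sketch (made-up numbers): scoring matched pre/post surveys on an
# assumed 0-10 knowledge scale. One (pre, post) pair per attendee.
survey_scores = [
    (3, 6),
    (5, 7),
    (4, 4),
    (6, 9),
]

def mean_improvement(scores):
    """Average post-minus-pre score across attendees."""
    return sum(post - pre for pre, post in scores) / len(scores)

def share_improved(scores):
    """Fraction of attendees whose post score exceeds their pre score."""
    return sum(post > pre for pre, post in scores) / len(scores)

print(f"Mean improvement: {mean_improvement(survey_scores):+.2f} points")
print(f"Share of attendees who improved: {share_improved(survey_scores):.0%}")
```

Even a rough measure like this lets you compare sessions or events against each other, rather than judging the conference purely on attendance or vibes.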
(Now that I've written out a full example, I realize this might have worked as a mini post to serve as a short reference, a wireframe for thinking this through. Please reply if you think a cleaned-up version of this would be good.)
To get back to my original point, my current impression is that many who work in AI x-risk reduction, for example in ops, could not produce a "good" version of the draft above even with 2 hours of their time, because they lack the background knowledge. I hope the example sufficiently illustrates that they could be doing better work if they had it.