[Question] How would you use video gamey tech to help with AI safety?
I’ve been working on AI safety for a while now. It’s going better than expected, but I have finite hours. The more hours I spend on safety, the fewer I can spend on business-oriented things. Those business-oriented things are long-term (4+ year), high-risk attempts at earning to give.
My timelines aren’t long, so doing some direct work on AI safety seems wise so long as I’m not completely killing the businessy side of things.
But if I could find an actually useful way to share work between the two, it could save years. So:
Suppose you’ve got a person with a background in high-performance real-time physics simulation, bleeding-edge low-latency rendering, aggressive optimization, and most of the other low-level videogamey things you need to make a physics-heavy 3D multi-user game-like application.
Is there a project you would want to see them develop for the sake of AI safety?
So far, my ideas for this have a strong flavor of searching under the streetlight, and they don’t seem higher value than my current strategy of fully separate research.
One example:
A multi-agent simulation with rich physical interactions to explore (and attempt to break) forms of corrigibility in a complex instrumented environment.
Good:
- Perfectly transparent physical simulations give you a lot of easy options for analysis compared to pure language. (Judging if a given block of text violated another player’s values in some way is not trivial; judging if the agent stomped on another player’s head is trivial.)
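To make the "trivial" part concrete: in a physics sim the contact list is ground truth, so a check like this can be a few lines over the engine’s per-step contact output. A minimal sketch, assuming a hypothetical engine that reports each contact as a pair of named bodies, an impulse magnitude, and the vertical component of the contact normal; the `Contact` type, the `owner/part` naming scheme, and the 5.0 impulse threshold are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    """One contact from a (hypothetical) physics step."""
    body_a: str      # "owner/part", e.g. "agent_1/foot_left"
    body_b: str      # e.g. "player_2/head"
    impulse: float   # contact impulse magnitude (assumed units: N*s)
    normal_z: float  # z component of the contact normal, pointing from body_a to body_b


def stomped_on_head(contacts, agent, victim, min_impulse=5.0):
    """True if any of `agent`'s feet hit `victim`'s head from above with enough impulse."""
    for c in contacts:
        # The engine may report the pair in either order; flip the normal when swapping.
        orderings = ((c.body_a, c.body_b, c.normal_z),
                     (c.body_b, c.body_a, -c.normal_z))
        for first, second, nz in orderings:
            f_owner, _, f_part = first.partition("/")
            s_owner, _, s_part = second.partition("/")
            if (f_owner == agent and f_part.startswith("foot")
                    and s_owner == victim and s_part == "head"
                    and c.impulse >= min_impulse
                    and nz < 0.0):  # normal points downward from foot into head
                return True
    return False
```

There is no equivalent few-line predicate for "did this paragraph violate another player’s values."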
Questionable:
- Is the marginal value of the deeper physics simulation actually enough to bother doing this, compared to gridworld-esque options?
- Concretely, what research would this assist that could not get done otherwise?
Bad:
- Releasing a whole framework for this kind of thing as an open-source project, assuming it was decent and flexible, would almost unavoidably be more useful for not-safety than for safety. Given its likely scale, it might not be the dangerous kind of not-safety, but it’s still not-safety.
It’s not clear to me there is any great option here, but… if there’s something in this space you really want to see, let me know!