jacob_cannell comments on Progress and Prizes in AI Alignment

jacob_cannell 4 Jan 2017 1:53 UTC
13 points
I came to a similar conclusion a while ago: it is hard to make progress in a complex technical field when progress itself is unmeasurable or worse ill-defined.

Part of the problem may be cultural: most working in the AI safety field have math or philosophy backgrounds. Progress in math and philosophy is intrinsically hard to measure objectively; success is mostly about having great breakthrough proofs/ideas/papers that are widely read and well regarded by peers. If your main objective is to convince the world, then this academic system works fine—ex: Bostrom. If your main objective is to actually build something, a different approach is perhaps warranted.

The engineering oriented branches of Academia (and I include comp sci in this) have a very different reward structure. You can publish to gain social status just as in math/philosophy, but if your idea also has commercial potential there is the powerful additional motivator of huge financial rewards. So naturally there is far more human intellectual capital going into comp sci than math, more into deep learning than AI safety.

In a sane world we’d realize that AI safety is a public good of immense value that probably requires large-scale coordination to steer the tech-economy towards solving. The X-prize approach essentially is to decompose a big long term goal into subgoals which are then contracted to the private sector.

The high level abstract goal for the Ansari XPrize was “to usher in a new era of private space travel”. The specific derived prize subgoal was then “to build a reliable, reusable, privately financed, manned spaceship capable of carrying three people to 100 kilometers above the Earth’s surface twice within two weeks”.

AI safety is a huge bundle of ideas, but perhaps the essence could be distilled down to: “create powerful AI which continues to do good even after it can take over the world.”

For the Ansari XPrize, the longer term goal of “space travel” led to the more tractable short term goal of “100 kilometers above the Earth’s surface twice within two weeks”. Likewise, we can replace “the world” in the AI safety example:

AI Safety “XPrize”: create AI which can take over a sufficiently complex video game world but still tends to continue to do good according to a panel of human judges.

To be useful, the video game world should be complex in the right ways: it needs to have rich physics that agents can learn to control, it needs to permit/encourage competitive and cooperative strategic complexity similar to that in the real world, etc. So more complex than pac-man, but simpler than the Matrix. Something in the vein of a minecraft mod might have the right properties—but there are probably even more suitable open-world MMO games.

The other constraint on such a test is we want the AI to be superhuman in the video game world, but not our world (yet). Clearly this is possible—ala AlphaGo. But naturally the more complex the video game world is in the direction of our world, both the harder the goal becomes and the more dangerous.

Note also that the AI should not know that it is being tested; it shall not know it inhabits a simulation. This isn’t likely to be any sort of problem for the AI we can actually build and test in the near future, but it becomes an interesting issue later on.

DeepMind is now focusing on Starcraft, OpenAI has universe, so we already on a related path. Competent AI for open-ended 3D worlds with complex physics—like minecraft—is still not quite here, but is probably realizable in just a few years.
What links here?
- LOVE in a simbox is all you need by jacob_cannell (28 Sep 2022 18:25 UTC; 66 points)