The algorithm takes an argmax over an exponentially large space of sequences of actions, i.e. it does 2^{episode length} model evaluations. Do you think the result is smarter than a group of humans of size 2^{episode length}? I’d bet against—the humans could do this particular brute force search, in which case you’d have a tie, but they’d probably do something smarter.
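For concreteness, here is a minimal sketch of the brute-force search being described: enumerate every action sequence for the episode and evaluate each one under the world-model. The binary action space and the `model(actions) -> expected reward` interface are illustrative assumptions of mine, not anything specified in the discussion.

```python
# Brute-force "argmax over action sequences", as described above. With
# binary actions and episode length m, this evaluates the world-model on
# all 2^m sequences: the exponential blowup in question.
from itertools import product

def brute_force_plan(model, episode_length):
    """model: hypothetical callable, actions -> expected episodic reward."""
    best_value, best_plan = float("-inf"), None
    for actions in product((0, 1), repeat=episode_length):  # 2^m sequences
        value = model(actions)
        if value > best_value:
            best_value, best_plan = value, actions
    return best_plan, best_value

# Toy usage: a "world-model" that just rewards taking action 1 every step.
plan, value = brute_force_plan(lambda acts: sum(acts), episode_length=10)
```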
I obviously haven’t solved the Tractable General Intelligence problem. The question is whether this is a tractable/competitive framework. So expectimax planning would naturally get replaced with a Monte-Carlo tree search, or some better approach we haven’t thought of. And I’ll message you privately about a more tractable approach to identifying a maximum a posteriori world-model from a countable class (I don’t assign a very high probability to it being a hugely important capabilities idea, since those aren’t just lying around, but it’s more than 1%).
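For reference, the baseline being discussed looks something like the following: standard maximum a posteriori selection over an enumerated prefix of a countable model class, costing one likelihood evaluation per candidate. This is just the textbook estimator, not the more tractable private approach alluded to above; the model interface is a hypothetical stand-in.

```python
# Standard MAP world-model identification over a countable class, truncated
# to the first N enumerated models so the computation is finite. Each model
# is assumed to be a callable: model(history) -> likelihood of the history.
def map_world_model(models, priors, history):
    """Return argmax_nu w(nu) * nu(history) over the enumerated models.

    models:  list of candidate world-models (hypothetical interface above)
    priors:  matching prior weights w(nu), summing to at most 1
    history: the interaction history observed so far
    """
    # The posterior is proportional to prior times likelihood, so the
    # argmax of w(nu) * nu(history) is the MAP model among the candidates.
    return max(zip(models, priors), key=lambda mw: mw[1] * mw[0](history))[0]
```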
It will be important, when considering any of these approximations, to evaluate whether they break benignity (most plausibly, I think, by introducing a new attack surface for optimization daemons). But I feel fine about deferring that research for the time being, so I defined BoMAI as doing expectimax planning instead of MCTS.
Given that the setup is basically a straight reinforcement learner with a weird prior, I think that at that level of abstraction, the ceiling of competitiveness is quite high.
I’m sympathetic to this picture, though I’d probably be inclined to try to model it explicitly: make some assumption about what the planning algorithm can actually do, and then show how to use an algorithm with that property. I do think “just write down the algorithm, and be happier if it looks like a ‘normal’ algorithm” is an OK starting point, though.
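One way to cash out “model it explicitly”: state the assumed property as an interface, and build (and analyze) the agent against that interface rather than against expectimax itself. The names and the epsilon-style guarantee below are illustrative assumptions on my part, not a proposal from either side of the discussion.

```python
# Sketch of abstracting the planner behind an assumed guarantee. Any
# benignity or competitiveness analysis done against this interface relies
# only on the stated property, not on whether the implementation is
# expectimax, MCTS, or something else entirely.
from abc import ABC, abstractmethod

class ApproximatePlanner(ABC):
    """Assumption: plan() returns actions whose value under `model` is
    within self.epsilon of the best achievable value. The guarantee itself
    is the modeling assumption; nothing here enforces it."""

    epsilon: float = 0.0

    @abstractmethod
    def plan(self, model, episode_length):
        """Return a sequence of actions for one episode."""

def run_episode(planner, model, episode_length):
    # The agent only sees the interface, so swapping in a different planner
    # preserves whatever was proven from the epsilon guarantee.
    return planner.plan(model, episode_length)
```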
Stepping back from this particular thread, I think the main problem with competitiveness is that you are just getting “answers that look good to a human” rather than “actually good answers.” If I try to use such a system to navigate a complicated world, containing lots of other people with more liberal AI advisors helping them do crazy stuff, I’m going to quickly be left behind.
It’s certainly reasonable to try to solve safety problems without attending to this kind of competitiveness, though I think this kind of asymptotic safety is actually easier than you make it sound (under the implicit “nothing goes irreversibly wrong at any finite time” assumption).
Starting a new thread on this: here.