The AI safety community claims it is hard to specify reward functions… But for real-world deployment of AI systems, designers do know the task in advance!
Right, but you’re also going for tasks that are relatively simple and easy, in the sense that “MakeWaterfall” is something I can, based on my own experience, imagine solving without any ML at all (though of course going to that extreme would require massive work). It might be that for such tasks solutions using handcrafted rewards/heuristics would be viable, but wouldn’t scale to more complex tasks. If your task were e.g. “follow arbitrary natural language instructions”, then I wouldn’t care about the “lax” rules.
Note we do ban extraction of information from the Minecraft simulator
This is certainly good, but I wonder what the exact rules are here. Suppose the designer trains a neural network to recognize trees in Minecraft by getting the Minecraft engine to generate lots of images of trees. The resulting network is then used as a hardcoded part of the agent architecture. Is that allowed? If not, how well can you enforce it (I imagine something of the sort can be done in subtler ways)?
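To make the hypothetical concrete, here is a rough sketch of the kind of pipeline I have in mind (the simulator hook `render_labeled_frames` is invented purely for illustration; I’m not claiming the real simulator exposes anything like it):

```python
import torch
import torch.nn as nn

def render_labeled_frames(n):
    """Stand-in for privileged simulator access returning (frames, has_tree labels).
    Purely hypothetical; here it just fabricates random tensors so the sketch runs."""
    frames = torch.rand(n, 3, 64, 64)
    labels = torch.randint(0, 2, (n,)).float()
    return frames, labels

# Small convnet trained offline on the simulator-generated labels.
tree_detector = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 1),
)

opt = torch.optim.Adam(tree_detector.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()
for _ in range(10):  # toy training loop
    frames, labels = render_labeled_frames(64)
    loss = loss_fn(tree_detector(frames).squeeze(1), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()

# The trained detector is then frozen and wired into the agent as a fixed
# perception module: no simulator internals are touched at deployment time,
# but the information was still extracted from the simulator during training.
for p in tree_detector.parameters():
    p.requires_grad_(False)
```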
I’m not saying that what you’re doing isn’t useful, just pointing out a way in which the benchmark might diverge from its stated aim.
It might be that for such tasks solutions using handcrafted rewards/heuristics would be viable, but wouldn’t scale to more complex tasks.
I agree that’s possible. To be clear, we did spend some time thinking about how we might use handcrafted rewards/heuristics to solve the tasks, and eliminated a couple of tasks on that basis, so I think it probably won’t be true here.
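(To give a sense of what I mean by a handcrafted heuristic, here is a toy illustration, purely invented for this comment rather than anything we actually used: a pixel-level proxy reward for MakeWaterfall that just measures how “watery” the agent’s current frame looks.)

```python
import numpy as np

def waterfall_proxy_reward(frame: np.ndarray) -> float:
    """Toy heuristic: fraction of pixels in an RGB frame that look 'water blue'.
    frame is expected to be a uint8 array of shape (H, W, 3)."""
    r = frame[..., 0].astype(int)
    g = frame[..., 1].astype(int)
    b = frame[..., 2].astype(int)
    bluish = (b > 120) & (b > r + 30) & (b > g + 20)
    return float(bluish.mean())

# A frame dominated by blue pixels scores high; a brownish frame scores zero.
water_like = np.zeros((64, 64, 3), dtype=np.uint8)
water_like[..., 2] = 200
dirt_like = np.zeros((64, 64, 3), dtype=np.uint8)
dirt_like[...] = (120, 90, 60)
print(waterfall_proxy_reward(water_like))  # 1.0
print(waterfall_proxy_reward(dirt_like))   # 0.0
```

Something like this is cheap to write for a simple task, which is the kind of shortcut we checked the chosen tasks against.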
Suppose the designer trains a neural network to recognize trees in Minecraft by getting the Minecraft engine to generate lots of images of trees. The resulting network is then used as a hardcoded part of the agent architecture. Is that allowed?
No.
If not, how well can you enforce it (I imagine something of the sort can be done in subtler ways)?
For the competition, there’s a ban on pretrained models that weren’t publicly available prior to the start of the competition. We look at participants’ training code to ensure compliance. It is still possible to violate this rule in a way that we may not catch (e.g. you could use internal simulator details to do hyperparameter tuning and then hardcode the resulting hyperparameters in your training code), but it seems quite challenging and not worth the effort even if you were willing to cheat.
For the benchmark (which is what I’m more excited about in the longer run), we’re relying on researchers to follow the rules. Science already relies on researchers honestly reporting their results; it’s pretty hard to catch cases where someone just makes up the numbers for their experiments.
(Also in the benchmark version, people are unlikely to write a paper about how they solved the task using special-case heuristics; that would be an embarrassing paper.)