ARENA might end up teaching this person some mech-interp methods they haven’t seen before, although it sounds like they would be more than capable of self-teaching any mech-interp. The other potential value-add for your acquaintance would be if they wanted to improve their RL or Evals skills, and have a week to conduct a capstone project with advisors. If they were mostly aiming to improve their mech-interp ability by doing ARENA, there would probably be better ways to spend their time.
JamesH
The way we see this project going concretely looks something like:
First things first, we want to get a good enough theoretical background of IBP. This will ultimately result in something like a distillation of IBP that we will use as reference, and hope others will get a lot of use from.
In this process, we will be doing most of our testing in a theoretical framework. That is to say, we will be constructing model agents and seeing how InfraBayesian Physicalism actually deals with these in theory, whether it breaks down at any stage (as judged by us), and if so whether we can fix or avoid those problems somehow.
What comes after this, as we see it at the moment, is trying to implement the principles of InfraBayesian Physicalism in a real-life, honest-to-god, Inverse Reinforcment Learning proposal. We think IBP stands a good chance of being able to patch some of the largest problems in IRL, which should ultimately be demonstrable by actually making an IRL proposal that works robustly. (When this inevitably fails the first few times, we will probably return to step 1, having gained useful insights, and iterate).
I think there’s a mistake in 17: \sin(x) is not a diffeomorphism between (-\pi,\pi) and (-1,1) (since it is e.g. not bijective between these domains). Either you mean sin(x/2) or the interval bounds should be (-\pi/2, \pi/2)