Work with me on agent foundations: independent fellowship

Summary: I am an independent researcher in agent foundations, and I’ve recently received an LTFF grant to fund someone to do research with me. This is a rolling application; I’ll close it whenever I’m no longer interested in taking another person.

If you’re not familiar with agent foundations, you can read about my views in this post.

What the role might be like

This role is extremely flexible. Depending on who you are, it could end up resembling an internship, a research assistant position, a postdoc, or even a mentor/advisor role to me. Below, I’ve listed the parameters of the fellowship that I am using as a baseline for what it could be. All of these parameters are negotiable!

  • $25 per hour. This is not a lot for people who live in the SF Bay Area or who are used to industry salaries, but it looks to me like it is comparable to a typical grad student salary.

  • 20 hours per week. I’d like this fellowship to be one of your main projects, and I think it can take quite a lot of “deep work” focus before one can make progress on the research problems.[1]

  • 3 months, with a decent chance of extension. During my AI Safety Camp project, it took about 6 weeks to get people up to speed on all the parts of the agent structure problem. Ideally I could find someone for this role who is already closer to caught up (though I don’t necessarily anticipate that). I’m thinking of this fellowship as something like an extended work-trial for potentially working together longer-term. That said, I think we should at least aim to get results by the end of it. Whether I invite you to continue working with me afterwards depends on how our collaboration went (both technically and socially), how many other people I’m collaborating with at that time, and whether I think I have enough funds to support it.

  • Remote, but I’m happy to meet in person. Since I’m independent, I don’t have anything like an office for you to make use of. But if you happen to be in the SF Bay Area, I’d be more than happy to have our meetings in person. I wake up early, so US Eastern and European time zones work well for me (and other time zones can work too).

  • Meeting 2-5 times per week. Especially in the beginning, I’d like to do a pretty large amount of syncing up. It can take a long time to convey all the aspects of the research problems. I also find that real-time meetings regularly generate new ideas. That said, some people find meetings worse for their productivity, and so I’ll be responsive to your particular work style.

  • An end-of-term write-up. It seems to take longer than three months to get results on the types of questions I’m interested in, but I think it’s good practice to commit to producing a write-up of how the fellowship went. If it goes especially well, we could produce a paper.

What this role ends up looking like mostly depends on your experience level relative to mine. Though I now do research, I haven’t gone through the typical academic path. I’m in my mid-thirties and have a proportional amount of life and career experience, but in terms of mathematics, I consider myself the equivalent of a second-year grad student. So I’m comfortable leading this project and am confident in my research taste, but you might know more math than I do.

The research problems

Like all researchers in agent foundations, I find it quite difficult to concisely communicate what my research is about. Probably the best way to tell if you will be interested in my research problems is to read other things I’ve written, and then have a conversation with me about them.

All my research is purely mathematical,[2] rather than experimental or empirical. None of it involves machine learning per se, but the theorems should apply to ML systems.

The domains of math that I’ve been focusing on include: probability theory, stochastic processes, measure theory, dynamical systems, ergodic theory, information theory, algorithmic information theory. Things that I’m interested in but not competent in include: category theory, computational mechanics, abstract algebra, reinforcement learning theory.

Here are some more concrete examples of projects you could work on.

  • Write an explainer for Brudno’s theorem.[3]

  • Take theorem 10 from this information theory paper by Touchette & Lloyd and extend it to the case with multiple timesteps, or where we measure the change in utility rather than the change in entropy.

  • Take the abstract Internal Model Principle of Wonham[4] and rework it so that it applies to controllers regulating the external environment (rather than regulating their internal state).

  • Take the same abstract Internal Model Principle and modify it to show that controllers which approximately regulate their internal state must have approximate models of their environments.

  • Do a literature review on the differences between the utility function formalism and the reinforcement learning reward formalism, explain exactly when they are and are not compatible, and discuss which existing results do or don’t apply across both.

Application process

If you’re interested, fill out this application form! You’re also welcome to message me with any questions. After that, the rest of the application steps are:

  • A short, conversational interview (20 min).

  • A longer interview (1h) where we talk about both of our research interests in more detail, and come up with some potential concrete projects.

  • Then, you go off and do some thinking & reading about your project ideas, and write a more detailed proposal. I’ll pay you $200 for this part, and you should spend roughly 4-8 hours on it.

  • A second longer interview (1h), where we go through your proposal.

After this, we should have a pretty good sense of whether we would work well together, and I’ll make a decision about whether to offer you the 3-month fellowship (or whatever else we may have negotiated).

  1. ^

    Why not 40h/week? Partly because I want to use the grant money well. I also think that marginal productivity on a big abstract problem starts to drop around 20h/week. (I get around this by having multiple projects at a time, so that may be an option.) Happy to negotiate on this as well.

  2. ^

    More specifically, the desired results are mathematical. The ideas are almost all “pre-mathematical”, in that the first part will be to translate the ideas into the appropriate formalisms.

  3. ^

    A. A. Brudno, Entropy and the complexity of the trajectories of a dynamical system (1983)

  4. ^

    Canonically, W. M. Wonham, Towards an Abstract Internal Model Principle (1976), but a more pedagogic presentation appears in section 1.5 of the book Supervisory Control of Discrete-Event Systems by Cai & Wonham (2019).