I do expect reflection to be a pretty central part of the path to FOOM, but I expect it to be way easier to analyze once the non-reflective foundations of agency are sorted out. There are good reasons to expect otherwise on an outside view—i.e. all the various impossibility results in logic and computing. On the other hand, my inside view says it will make more sense once we understand e.g. how abstraction produces maps smaller than the territory while still allowing robust reasoning, how counterfactuals naturally pop out of such abstractions, how that all leads to something conceptually like a Cartesian boundary, the relationship between abstract “agent” and the physical parts which comprise the agent, etc.
If I imagine what my work would look like if I started out expecting reflection to be the taut constraint, then it does seem like I’d follow a path a lot more like MIRI’s. So yeah, this fits.
One thing I’m still not clear about in this thread is whether you (John) would feel that progress had been made on the theory of agency if all the problems on which MIRI has worked were instantaneously solved. Because there’s a difference between saying “this is the obvious first step if you believe reflection is the taut constraint” and “solving this problem would help significantly even if reflection wasn’t the taut constraint”.
I expect that progress on the general theory of agency is a necessary component of solving all the problems on which MIRI has worked. So, conditional on those problems being instantly solved, I’d expect that a lot of general theory of agency came along with it. But if a “solution” to something like e.g. the Tiling Problem didn’t come with a bunch of progress on more foundational general theory of agency, then I’d be very suspicious of that supposed solution, and I’d expect lots of problems to crop up when we try to apply the solution in practice.
(And this is not symmetric: I would not necessarily expect such problems in practice for some more foundational piece of general agency theory which did not already have a solution to the Tiling Problem built into it. Roughly speaking, I expect we can understand e-coli agency without fully understanding human agency, but not vice-versa.)
I agree with this asymmetry.

One thing I am confused about is whether to think of the e-coli as qualitatively different from the human. The e-coli is taking actions that can be well modeled by an optimization process searching for actions that would be good if this optimization process output them, which has some reflection in it.
It feels like it can behaviorally be well modeled this way, but is mechanistically not shaped like this. I feel like the mechanistic fact is more important, but also that we are much closer to having behavioral definitions of agency than mechanistic ones.
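(A toy sketch, my own and not from the discussion, of this behavioral-vs-mechanistic distinction: the same run-and-tumble behavior can be written either as an explicit search over actions, the "what would be good if this process output it" framing, or as a simple feedback rule with no search anywhere in it, which is closer to how chemotaxis actually works. All names and parameters here are made up for illustration.)

```python
import random

# Behavioral model: an explicit optimization process that searches for the action
# which would score best if this very process output it.
def behavioral_policy(gradient, actions=("run", "tumble")):
    def score(action):
        # "How well would things go if I (this optimizer) output this action?"
        return gradient if action == "run" else 0.0
    return max(actions, key=score)

# Mechanistic model: a simple feedback rule in the spirit of run-and-tumble
# chemotaxis. No search and no self-reference anywhere in the mechanism, just
# "tumble less often when the nutrient gradient is improving".
def mechanistic_policy(gradient, base_tumble_prob=0.5):
    tumble_prob = base_tumble_prob * (0.2 if gradient > 0 else 1.0)
    return "tumble" if random.random() < tumble_prob else "run"

# The two produce similar behavior on average, which is why the behavioral model
# fits even though the mechanism is not shaped like an optimizer.
gradients = [random.uniform(-1, 1) for _ in range(5)]
print([behavioral_policy(g) for g in gradients])
print([mechanistic_policy(g) for g in gradients])
```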
I would say the e-coli’s fitness function has some kind of reflection baked into it, as does a human’s fitness function. The qualitative difference between the two is that a human’s own world model also has an explicit self-model in it, which is separate from the reflection baked into a human’s fitness function.
After that, I’d say that deriving the (probable) mechanistic properties from the fitness functions is the name of the game.
… so yeah, I’m on basically the same page as you here.
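(Again a minimal sketch of my own, with made-up names, for the two kinds of reflection John distinguishes above: in the first, the reflection lives only in the fitness function, which scores a policy by that policy's own outputs while the policy contains no representation of itself; in the second, the agent's world model carries an explicit "me" entry that it consults when predicting outcomes.)

```python
# (a) Reflection only in the fitness function: the selection criterion scores the
#     consequences of this very policy's outputs, but the policy itself carries no
#     representation of "me".
def reflex_policy(gradient):
    return "run" if gradient > 0 else "tumble"

def fitness(policy, gradients):
    # The reflection lives here: the score refers back to the policy's own choices.
    return sum(g for g in gradients if policy(g) == "run")

# (b) Explicit self-model: the agent's world model contains a stand-in for the agent
#     itself, consulted when predicting what will happen next.
class SelfModelingAgent:
    def __init__(self, self_model=reflex_policy):
        # "me" lives inside the world model, separately from any external fitness function.
        self.world_model = {"me": self_model}

    def predict_payoff(self, gradient):
        predicted_action = self.world_model["me"](gradient)  # ask the self-model what I'd do
        return gradient if predicted_action == "run" else 0.0

gradients = [0.3, -0.2, 0.8]
print(fitness(reflex_policy, gradients))        # reflection via the fitness function
print(SelfModelingAgent().predict_payoff(0.3))  # reflection via an explicit self-model
```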
That does seem right.