Announcing the PIBBSS Symposium ’24!
> There is another sense in which I would not want to say that there is any particular hierarchy between natural/unnatural/rational constraints.
I think there’s a lot to unpack here. I’m going to give it a preliminary go, anticipating that it’s likely to be a bit all over the place. The main thread I want to pull is what it means to impose a particular hierarchy between the constraints, and then see how this leads to many possible hierarchies in such a way that it feels like no particular hierarchy is privileged.
From a “natural” point of view, which privileges physical time, individuation is something that must be explained—a puzzle at the heart of the mystery of the origin of life. From this point of view, “rationality” or “coherence” is also something that must be explained (which is what Richard Ngo is gesturing at in his comment / post).
From a “rational” point of view, we can posit abstract criteria which we want our model of agency to fulfil. For instance, Logical Induction (Garrabrant et al. 2016) takes a formalisation of the following desideratum (named “Approximate Inexploitability”, or “The Logical Induction Criterion”): “It should not be possible to run a Dutch book against a good reasoner in practice.” (ibid, p.8, p.14), and then constructs an agent entirely within logic from this. Something like “rationality” or “coherence” is assumed (for well-argued reasons), and the structure of agency is deduced from there. This kind of move is also what underpins selection theorems. In my view, individuation also needs to be explained here, but it’s often simply assumed (much like it is in most of theoretical biology).
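(For concreteness, here is a rough symbolic paraphrase of that criterion—my own compression of the definition in Garrabrant et al. 2016, not their exact wording, so treat the notation as a sketch. A market $\overline{\mathbb{P}}$ satisfies the criterion relative to a deductive process $\overline{D}$ just when

$$\neg\,\exists\ \text{efficiently computable trader } \overline{T} \ \text{such that } \overline{T} \text{ exploits } \overline{\mathbb{P}} \text{ relative to } \overline{D},$$

where “exploits” roughly means the value of the trader’s holdings over time is bounded below but unbounded above. The paper then shows a computable market satisfying this exists, which is the sense in which the agent is constructed “entirely within logic” from the desideratum.)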
The “unnatural” point of view is much more mysterious to us. When I use the term, I want to suggest that individuation can be assumed, but physical time becomes something that must be explained. This is a puzzle which is touched on in esoteric areas of physics (e.g. “A smooth exit from eternal inflation?”—Hawking and Hertog 2018) and consciousness science (e.g. “A measure for intrinsic information”—Barbosa et al. 2020), and discussed in “religious” or “spiritual” contexts, but is in reality very poorly understood. I think you gesture at a really interesting perspective on this by relating it to “thinghood” in active inference—but to me this misses a lot of what makes it truly weird—the reasons I decided to label it “unnatural” in the first place.
It’s deeply confusing to me at this stage how the “unnatural” point of view relates to the “rational” one; I’d be curious to hear any thoughts on this, however speculative. I do, however, think that there is a sense in which none of the three hierarchies I’m gesturing at in this comment are “the real thing”—they feel more like prisms through which we can diffract the truth in an attempt to break it down into manageable components.
> Motivation 1 (‘organisms-as-agents thesis’) “says that organisms really do exhibit some or even all of the attributes of agency”. Motivation 2 (‘organism-as-agents heuristic’) “says that it can be heuristically useful to treat organisms as if they were agents for certain intellectual purposes”.
It’s interesting that both motivations appear to be about modelling organisms as agents, as opposed to any other level of organisation. This feels like it misses some of the most interesting insights we might get from biological agency, namely those around agents at different levels of organisation interacting—e.g. ants and ant colonies, cancer cells and multicellular organisms, or individual organisms and selection pressures (which could be treated as as-if agents at evolutionary timescales).
Contingency: A Conceptual Tool from Evolutionary Biology for Alignment
“Social” is slightly too coarse-grained a tag. The thing we’re actually interested in is “whether successfully predicting the behaviour of other members of its own species is a strong selection pressure”. Social collaboration is one way this happens—another seems to be “deception” arms races (such as corvids stealing and hiding things from each other), or specific kinds of mating rituals. It also depends on the relative strength of other selection pressures—in most cases, highly intelligent creatures also seem to have developed “slack” in the resources they can devote to intelligence (e.g. humans cooking food).
This does seem to hold for cephalopods—a strong datapoint being their highly sophisticated forms of communication (e.g. video below).
Clem here—I was fellowship lead this year and have been a research affiliate and mentor for PIBBSS in the past. Thanks for posting this. As might be expected in my position, I’m much more bullish than you / most people on what is often called “blue sky” research. Breakthroughs in our fundamental understanding of agency, minds, learning etc. seem valuable in a range of scenarios, not just in a world dominated by an “intelligence explosion”. In particular, I think that this kind of work (a) benefits a huge amount from close engagement with empirical work, and (b) is itself very likely to inform near-future prosaic work. Furthermore, I feel that progress on these questions is genuinely possible, and is made significantly more likely with more people working on it from as many perspectives as possible.
This said, there are two things you say under “reservations” that I strongly agree with, and have some comments on.
> I encourage PIBBSS to “embrace the weird,” albeit while maintaining high academic standards for basic research, modelled off the best basic science institutions.
There are worlds where furthering our understanding of deeply confused basic concepts that underpin everything else we do isn’t considered “the weird”, but given that we’re not in those worlds I have to agree. One big issue I see here is that doing this well requires marrying the better parts of academic culture with the better parts of tech / rationality culture (and yes, for this purpose I place those in the same bucket). Some of the places that I think do this best—e.g. Google’s Paradigms of Intelligence team—have a culture / belief system somewhat incompatible with EA. It’s worth noting that people often pursue basic questions for very different reasons.
> I strongly encourage PIBBSS to publicly post and seek feedback on their applicant selection and research prioritization processes, so that the AI safety ecosystem can offer useful insight (and benefit from this).
I think this is actually really important, and it’s not something that I think PIBBSS does very well currently. One thing I would note is that, for reasons sketched above, I think it’s important that the AI safety ecosystem aren’t the only people interacting with this. One thing that’s holding things back here is, in my view, the lack of a venue for this kind of research whose scope is, primarily, the basic research itself. This is not to say that the relevance and impact for safety shouldn’t be a primary concern in research prioritisation—I think it very much should be—but I do think this can be done in a way that is more compatible with academic norms (at least those academic norms that are worth upholding).