First of all, these are all meant as very rough attempts at demarcating research tastes.
It seems possible to be aiming to solve P1 without thinking much about P4, if (a) you advocate a ~Butlerian pause, or (b) you are working on aligned paternalism as the target behavior (where AI(s) are responsible for keeping humans happy, and humans have no residual agency or autonomy remaining).
Also, a lot of people who approach the problem from a P4 perspective tend to focus on the human-AI interface, where most of the relevant technical problems lie, but this might reduce their attention to issues of mesa-optimizers or emergent agency, despite the massive importance of those issues to their project in the long run.
The Problem With the Word ‘Alignment’
Paradigms and Theory Choice in AI: Adaptivity, Economy and Control
Okasha’s paper addresses emerging discussions in biology about organisms-as-agents in particular, a trend sometimes called the Return of the Organism turn in philosophy of biology.
In the paper, he adds: “Various concepts have been offered as ways of fleshing out this idea of organismic autonomy, including goal-directedness, functional organization, emergence, self-maintenance, and individuality. Agency is another possible candidate for the job.”
This seems like a reasonable stance as far as I can tell, since organisms seem to have some structural integrity, which can make delineated Cartesian boundaries well-defined.
For collectives, a similar discussion may surface additional upsides and downsides of agency concepts that may not apply at the organism level.
Announcing “Key Phenomena in AI Risk” (facilitated reading group)
My understanding of the Steel Late Wittgenstein response would be that you could agree that words and concepts are distinct, and that the mapping is not always one-to-one, but hold that which concepts get used is also significantly influenced by which features of the world are useful in some contexts of language (/word) use.
Rewards and Utilities are different concepts. Rejecting the claim that reward is necessary to get/build agency is not the same as rejecting EU maximization as a basin of idealized agency.
As an addendum, it seems to me that you may not necessarily need a ‘long-term planner’ (or ‘time-unbounded agent’) in the environment. A similar outcome may also be attainable if the environment contains a tiling of time-bound agents who can all trade with each other in ways such that the overall trade network implements long-term power-seeking.
Reflections on the PIBBSS Fellowship 2022
Concept Dictionary.
Concepts that I intend to use or invoke in my later writings, or that are part of my reasoning about AI risk and related complex-systems phenomena.
I would agree that it would be good and reasonable to have a term that refers to the family of scientific and philosophical problems spanned by this space. At the same time, as the post says, the issue is when there is semantic dilution, people talking past each other, and coordination-inhibiting ambiguity.
Now take a look at something I could check with a simple search: an ICML Workshop that uses the term ‘alignment’ mostly to mean P3 (task-reliability): https://arlet-workshop.github.io/
One might want to use ‘alignment’ one way or the other, and be careful about its limited overlap with P3 in our own registers, but by the time the larger AI community has picked up on the use-semantics of ‘RLHF is an alignment technique’ and associated alignment primarily with task-reliability, you’d need some deliberate linguistic intervention and deliberation to clear the air.