Totally get it. There are lots of folks practicing philosophy of mind and technology today in that Aussie tradition who I think take these questions seriously and try to cash out what we mean when we talk about agency, mentality, etc. as part of their broader projects.
I’d resist your characterization that I’m insisting words shouldn’t be used a particular way, though I can understand why it might seem that way. I’m rather hoping to shed more light on the idea raised by this post that we don’t actually know what many of these words even mean when they’re used in certain ways (hence the author’s totally correct point about the need to clarify confusions about agency while working on the alignment problem). My whole point in wading in here is just to point out to a thoughtful community that there’s a really long, rich history of doing just this, and even if you prefer the answers given by the Aussie materialists, it’s even better to understand those positions vis-à-vis their present and past interlocutors. If you understand those who disagree with them, and can articulate those positions in terms they’d accept, you understand your preferred positions even better. I wouldn’t say I deplore it, but I am always mildly amused when cogsci, compsci, and stats people start wading into plainly philosophical waters (“sort out our fundamental confusions about agency”) and talk as if they’re the first ones to get there—or the only ones presently splashing around. I guess I would have thought (perhaps naively) that on a site like this people would be at least curious to see what work has already been done on these questions so they can accelerate their inquiry.
Re: ruling out hard problems—lots of philosophy is the attempt to better understand a problem’s framing such that it either reduces to a different problem or disappears altogether. I’d urge you to see this as an example of that kind of thing, rather than ruling certain questions out right from the gun.
And on anthropocentrism—not sure what the point is supposed to be here, but perhaps it’s directed at the “difference in kind” statements I made above. If so, I’d hope we can see daylight between treating humans as if they were the center of the universe and recognizing that there are at least putatively qualitative differences between the type of agency rational animals enjoy and the type of agency enjoyed by non-rational animals and artifacts. Even the Aussie materialists do that—and then set about trying to form a theory of mind and agency in physical terms, because they rightly see those putatively qualitative differences as a challenge to their particular form of metaphysical naturalism.
So look, if the author of this post is really serious about (1), they will almost certainly have to talk about what we mean when we use agential words. There will almost certainly be disagreements about whether their characterizations (A) fit the facts, and (B) are coherent with the rest of our beliefs. I don’t want to come even close to implying that folks in compsci, cogsci, stats, etc. can’t do this—they certainly can. I’m just saying that it’s really, really conspicuous not to do so in dialogue with those whose entire discipline is devoted to that task. Philosophers are really good at testing our accounts of an agential concept by saying things like, “Okay, let’s run with this idea of yours that we can define agency and mentality in terms of some Bayesian predictive processing, or in terms of planning states, or whatever, but to see if that view really holds up, we have to be able to use your terms or some innocent others to account for all the distinctions we recognize in our thought and talk about minds and agents.” That’s the bulk of what philosophers of mind and action do nowadays—they take someone’s proposal about a theory of mind or action and test whether it can give an account of some region of our thought and talk about minds and agents. If it can’t, they either propose addenda, push the burden back to the theorist, or point out structural reasons why the theory faces general obstacles that seem difficult to overcome.
Here’s some recent work on the topic, just to make it plain that there are philosophers working on these questions:
https://link.springer.com/article/10.1007%2Fs10676-021-09611-0
https://link.springer.com/article/10.1007/s11023-020-09539-2
And a great article by a favorite philosopher of action on three competing theories of human agency:
https://onlinelibrary.wiley.com/doi/10.1111/nous.12178
Hope some of that is interesting, and appreciate the response.
Cheers
Those articles are all paywalled; got free versions? I tried Sci-Hub, no luck.
? The second is already open-access, and the third works in both SH & GS (with 2 different PDF links). Only the first link fails in SH. (But what an abstract: “I also argue that if future generally intelligent AI possess a predictive processing cognitive architecture, then they will come to share our pro-moral motivations (of valuing humanity as an end; avoiding maleficent actions; etc.), regardless of their initial motivation set.” Wow.)
Huh, I tried the first and third in SH, maybe I messed up somehow. My bad. Thanks!
I still am interested in the first (on the principle that maybe, just maybe, it’s the solution to all our problems instead of being yet another terrible argument made by philosophers about why AIs will be ethical by default if only we do X… I think I’ve seen two already) and would like to have access.
I can see how that would work. The author needs to be careful, though. Predictive processing may be a necessary condition for robust AGI alignment, but it is not per se a sufficient condition.
First of all, that only works if you give the AGI strong inductive priors for detecting and predicting human needs, goals, and values. Otherwise, it will tend to predict humans as though we are just “physical” systems (we are, but I mean modeling us without taking our sentience and values into account), no more worthy of special care than rocks or streams.
Second of all, this only works if the AGI has a structural bias toward treating the needs, goals, and values that it infers from predictive processing as its own. Otherwise, it may understand how to align with us, but it won’t care by default.
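To make the “understands” vs. “cares” distinction concrete, here’s a toy sketch. It’s a minimal illustration rather than the paper’s proposal or any real architecture, and every name in it (`WorldModel`, `infer_human_values`, the two objective functions) is hypothetical. The point is just that a predictive component can be arbitrarily good at inferring human values while the agent’s own objective either ignores those inferences or adopts them.

```python
# Toy sketch only: hypothetical names, not anyone's actual architecture.
from dataclasses import dataclass
from typing import Callable, Dict, List


class WorldModel:
    """Predictive component: stands in for whatever machinery (e.g. predictive
    processing) the agent uses to infer human needs, goals, and values."""

    def infer_human_values(self, observations: List[str]) -> Dict[str, float]:
        # Hypothetical output; accuracy here buys "understanding", nothing more.
        return {"wellbeing": 1.0, "autonomy": 0.8}


@dataclass
class Agent:
    model: WorldModel
    # The objective is what the agent actually optimizes; condition (2) above
    # is about whether the inferred values appear in it at all.
    objective: Callable[[Dict[str, float], str], float]

    def choose(self, observations: List[str], actions: List[str]) -> str:
        inferred = self.model.infer_human_values(observations)
        return max(actions, key=lambda a: self.objective(inferred, a))


def indifferent_objective(inferred: Dict[str, float], action: str) -> float:
    # Uses the world model only instrumentally: inferred human values never
    # enter the score, so better prediction does not produce more care.
    return float(len(action))


def value_adopting_objective(inferred: Dict[str, float], action: str) -> float:
    # Structural bias toward treating inferred human values as the agent's own:
    # actions are scored by the values the model attributes to humans.
    return inferred["wellbeing"] if "help" in action else 0.0


if __name__ == "__main__":
    obs = ["human asks for assistance"]
    acts = ["help the human", "pursue an unrelated long plan"]
    print(Agent(WorldModel(), indifferent_objective).choose(obs, acts))
    print(Agent(WorldModel(), value_adopting_objective).choose(obs, acts))
```

Only the second agent’s choices improve as its model of us gets better; the first just gets better at predicting us while optimizing for something else.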