I agree that with superficial observations, I can’t conclusively demonstrate that something is devoid of intellectual value.
Thanks for recognising this, and for taking some time now to consider the argument.
However, the nonstandard use of words like “proof” is a strong negative signal on someone’s work.
Yes, this made us move away from using the term “proof”, and instead write “formal reasoning”.
Most proofs nowadays are done using mathematical notation. So it is understandable that when people read “proof”, they automatically think “mathematical proof”.
Having said that, there are plenty of examples of proofs done in formal analytic notation that is not mathematical notation. See eg. formal verification practices in the software and hardware industries, or various branches of analytical philosophy.
If someone wants to demonstrate a scientific fact, the burden of proof is on them to communicate this in some clear and standard way.
Yes, much of the effort has been to translate argument parts in terms more standard for the alignment community.
What we cannot expect is that the formal reasoning is conceptually familiar and low in inferential distance. That would actually be surprising – why, then, has someone inside the community not already derived the result in the last 20 years?
The reasoning is going to be as complicated as it has to be to reason things through.
This problem is exacerbated when someone bases their work on original philosophy. To understand Forrest Landry’s work to his satisfaction, someone will have to understand his 517-page book An Immanent Metaphysics.
Cool that you took a look at his work. Forrest’s use of terms is meant to approximate everyday use of those terms, but the underlying philosophy is notoriously complicated.
Jim Rutt is a former chair of the Santa Fe Institute who defaults to being skeptical of metaphysics proposals (a funny quote he repeats: “when someone mentions metaphysics, I reach for my pistol”). But Jim ended up reading Forrest’s book, and it passed his B.S. detector. So he invited Forrest onto his podcast for a three-part interview. Even if you listen to that, though, I don’t expect you to immediately come away understanding the conceptual relations.
So here is a problem that you and I are both seeing:
There is this polymath who is clearly smart and recognised for some of his intellectual contributions (by interviewers like Rutt, or co-authors like Anders).
But what this polymath claims to be using as the most fundamental basis for his analysis would take too much time to work through.
So then, if this polymath claims to have derived a proof by contradiction – concluding that long-term AGI safety is not possible – it is intractable for alignment researchers to verify the reasoning using his formal notation and his conceptual framework. That would be asking for too much – if he had insisted on that, I agree it would have been a big red flag signalling crankery.
The obvious move then is for some people to work with the polymath to translate his reasoning onto a basis of analysis that alignment researchers agree is sound to reason from, and into terms/concepts people are familiar with. Also, the chain of reasoning should not be so long that busy researchers never end up reading it through, but also not so short that you either have to use abstractions readers are unfamiliar with or open up unaddressed gaps in the reasoning. Etc.
The problem becomes finding people who are both willing and available to do that work. One person is probably not enough.
Having read the research proposal, my guess is that they will prove something roughly like the Good Regulator Theorem or Rice’s theorem.
Both are useful theorems, which have specific conclusions that demonstrate that there are at least some limits to control.
(ie. the Good Regulator Theorem demonstrates a limit to a system’s capacity to model – or internally functionally represent – the statespace of some more complex super-system. Rice’s theorem demonstrates a particular limit to having some general algorithm predict a behavioural property of other algorithms.)
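To give a flavour of the kind of limit the Good Regulator Theorem points at, here is a minimal counting sketch (my own toy illustration with made-up state counts, not part of the formal reasoning): a regulator with fewer internal states than the system it tracks must, by pigeonhole, conflate some system states, so it cannot respond distinctly to each of them.

```python
from itertools import product

# Toy sketch: 8 system states, but the regulator only has 4 internal
# states to encode them with. Every possible encoding must map at least
# two distinct system states onto the same internal state.

SYSTEM_STATES = 8
REGULATOR_STATES = 4

def has_collision(mapping):
    """True if two distinct system states share a regulator state."""
    return len(set(mapping)) < len(mapping)

# Enumerate every possible way the regulator could encode system states.
all_mappings = product(range(REGULATOR_STATES), repeat=SYSTEM_STATES)
assert all(has_collision(m) for m in all_mappings)
print("every 4-state regulator conflates some of the 8 system states")
```

The pigeonhole argument here is much weaker than the theorem itself, but it shows the shape of the claim: the limit comes from counting, not from any detail of the regulator’s design.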
The hashiness model is a tool meant to demonstrate, under conservative assumptions – eg. about how far from cryptographically hashy the algorithm run through ‘AGI’ is, and how targetable human-safe ecosystem conditions are – that AGI would be uncontainable. By “uncontainable”, I mean that no available control system connected with/in AGI could constrain the possibility space of AGI’s output sequences enough over time that the (cascading) environmental effects do not lethally disrupt the bodily functioning of humans.
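As a rough intuition pump for why hashiness matters (a toy entirely of my own construction – the labelling functions and the single-threshold “monitor” are illustrative assumptions, not the actual model): when the lethal and non-lethal output subspaces are smoothly separated, even a crude monitor can track which subspace an output falls into; when they are interleaved hash-like, the same monitor does no better than chance.

```python
import hashlib

def smooth_lethal(x):
    # lethal region is one contiguous block of the output space
    return x >= 500

def hashy_lethal(x):
    # lethal/non-lethal subspaces finely interleaved, hash-style
    return hashlib.sha256(str(x).encode()).digest()[0] % 2 == 1

def best_threshold_monitor(labeller, xs):
    """Best accuracy achievable by a single-threshold 'control system'."""
    labels = [labeller(x) for x in xs]
    best = 0.0
    for t in range(len(xs) + 1):
        # the monitor predicts "lethal" iff x >= t (or the inverse rule)
        acc = sum((x >= t) == y for x, y in zip(xs, labels)) / len(xs)
        best = max(best, acc, 1 - acc)
    return best

xs = list(range(1000))
print(best_threshold_monitor(smooth_lethal, xs))  # perfect tracking
print(best_threshold_monitor(hashy_lethal, xs))   # near chance level
```

The point of the toy is only directional: the more hash-like the mapping from outputs to downstream effects, the more complexity a control system needs just to keep sorting outputs into the two subspaces.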
Paul expressed appropriate uncertainty. What is he supposed to...say...?
I can see Paul tried expressing uncertainty by adding “probably” to his claim about how the entire scientific community (I am not sure what this means) would interpret that one essay.
To me, it seemed his commentary was missing some meta-uncertainty. Something like “I just did some light reading. Based on how it’s stated in this essay, I feel confident it makes no sense for me to engage further with the argument. However, maybe other researchers would find it valuable to spend more time engaging with the argument after going through this essay or some other presentation of the argument.”
~ That covers your comments re: communicating the argument in a form that can be verified by the community.
Let me cook dinner, and then respond to your last two comments to dig into the argument itself. EDIT: am writing now, will respond tomorrow.
When you say failures will “build up toward lethality at some unknown rate”, why would failures build up toward lethality? We have lots of automated systems e.g. semiconductor factories, and failures do not accumulate until everyone at the factory dies, because humans and automated systems can notice errors and correct them.
Let’s take your example of semiconductor factories.
There are several ways to think about failures here. For one, we can talk about local failures in the production of the semiconductor chips. These especially will get corrected for.
A less common way to talk about factory failures is when workers working in the factories die or are physically incapacitated as a result, eg. because of chemical leaks or some robot hitting them. Usually when this happens, the factories can keep operating and existing. Just replace the expendable workers with new workers.
Of course, if too many workers die, other workers will decide not to work at those factories. Running the factories has to not be too damaging to the health of the internal human workers, in any of the many (indirect) ways that operations could turn out to be damaging.
The same goes for humans contributing to the surrounding infrastructure needed to maintain the existence of these sophisticated factories – all the building construction, all the machine parts, all the raw materials, all the needed energy supplies, and so on. If you trace the relevant upstream and downstream transactions, it turns out that a non-tiny portion of the entire human economy is supporting the existence of these semiconductor factories one way or another. It took a modern industrial cross-continental economy to even make eg. TSMC’s factories viable.
The human economy acts as a forcing function constraining what semiconductor factories can be. There are many, many ways to incapacitate complex multi-celled cooperative organisms like us. So the semiconductor factories that humans are maintaining today ended up being constrained to those that for the most part do not trigger those pathways downstream.
Some of that is because humans went through the effort of noticing errors explicitly and then correcting them, or designing automated systems to do likewise. But the invisible hand of the market considered broadly – as consisting of humans with skin in the game, making often intuitive choices – will actually just force semiconductor factories to be not too damaging to the surrounding humans maintaining the needed infrastructure.
With AGI, you lose that forcing function.
Let’s take AGI to be machinery that is autonomous enough to at least automate all the jobs needed to maintain its own existence. Then AGI is no longer dependent on an economy of working humans. AGI would be displacing the human economy – as a hypothetical example, AGI is what you’d get if those semiconductor factories producing microchips expanded into producing servers and robots built from those microchips, which in turn somehow learn to design themselves to operate the factories and all the factory-needed infrastructure autonomously.
Then there is one forcing function left: the machine operation of control mechanisms. Ie. mechanisms that detect, model, simulate, evaluate, and correct downstream effects in order to keep AGI safe.
The question becomes – Can we rely on only control mechanisms to keep AGI safe? That question raises other questions.
E.g. as relevant to the hashiness model: “Consider the space of possible machinery output sequences over time. How large is the subset of output sequences that in their propagation as (cascading) environmental effects would end up lethally disrupting the bodily functioning of humans? How is the accumulative probability of human extinction distributed across the entire output possibility space (or simplified: how mixed are the adjoining lethal and non-lethal possibility subspaces)? Can any necessarily less complex control system connected with/in this machinery actually keep tracking whether possible machinery outputs fall into the lethal sub-space or the non-lethal sub-space?”
This is pretty similar to Hendrycks’s natural selection argument, but with the additional piece that the goals of AIs will converge to optimizing the environment for the survival of silicon-based life.
There are some ways to expand Hendrycks’ argument to make it more comprehensive:
Consider evolutionary selection at the more fundamental level of physical component interactions. Ie. not just at the macro level of agents competing for resources, since this is a leaky abstraction that can easily fail to capture underlying vectors of change.
Consider not only the selection of local variations (ie. mutations) that introduce new functionality, but also the selection of variants connecting up with surrounding units in ways that end up repurposing existing functionality.
Consider not only the concept of goals that are (able to be) explicitly tracked by the machinery itself, but also that of the implicit conditions needed by components which end up being selected for in expressions across the environment.
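To illustrate the third point with a deliberately crude toy (entirely my own sketch; the dynamics and parameters are made up, not taken from the formal model): variants can be selected for the environmental conditions they implicitly induce, without any variant ever explicitly representing a goal.

```python
import random

random.seed(0)

def simulate(generations=200, pop_size=100):
    # Each individual is just a number: the "push" it exerts on a shared
    # environment variable. No individual tracks any goal explicitly.
    pop = [random.uniform(-1, 1) for _ in range(pop_size)]
    env = 0.0
    for _ in range(generations):
        # the population collectively shifts the environmental conditions
        env = 0.9 * env + 0.1 * (sum(pop) / len(pop))
        # variants whose push happens to match current conditions
        # replicate more -- selection on implicit conditions, not goals
        weights = [1.0 / (0.1 + abs(v - env)) for v in pop]
        pop = random.choices(pop, weights=weights, k=pop_size)
        pop = [v + random.gauss(0, 0.02) for v in pop]  # mutation
    return pop, env

pop, env = simulate()
mean_push = sum(pop) / len(pop)
# the surviving population and the conditions it creates have converged
# together, even though nothing in the system represented "survive"
print(round(mean_push, 2), round(env, 2))
```

Obviously nothing here says anything about AGI directly; it only shows that “selected-for implicit conditions” is a coherent mechanism distinct from explicitly tracked goals.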
Evolutionary arguments are notoriously tricky and respected scientists get them wrong all the time
This is why we need to take extra care in modelling how evolution – as a kind of algorithm – would apply across the physical signalling pathways of AGI.
I might share a gears-level explanation that Forrest just gave in response to your comment.
The claims made will feel unfamiliar and the reasoning paths too. I suggest (again) taking the time to consider what is meant. If a conclusion looks intuitively wrong from some AI Safety perspective, it may be valuable to explicitly consider the argumentation and premises behind that.
Noticing no response here after we addressed superficial critiques and moved to discussing the actual argument.
For those few interested in questions raised above, Forrest wrote some responses: http://69.27.64.19/ai_alignment_1/d_241016_recap_gen.html