I changed the title from: "A Pivotal Act AI might not buy a lot of time" to: "Shutting down all competing AI projects might not buy a lot of time due to Internal Time Pressure".
As explained by Martin Randall, the statement: "something which does not buy ample time is not a pivotal act" is false (based on the Arbital Guarded Definition of Pivotal Act). Given your "Agreed" react to that comment, this issue seems to be settled. In the first section of the present comment, I explain why I still think that the old title was a mistake. The second section outlines a scenario that better illustrates that a Pivotal Act AI might not buy a lot of time.
Why the old title was a mistake
The old title implied that launching the LAI was a very positive event. With the new title, launching the LAI may or may not have been a positive event. This was the meaning that I intended.
Launching the LAI drastically increased the probability of a win by shutting down all competing AI projects. However, it also increased risks from scenarios where someone successfully hits a bad alignment target. This can lead to a massively worse than extinction outcome (for example along the lines of the outcome implied by PCEV). In other words: launching the LAI may or may not have been a positive event. Thus, launching the LAI may or may not have been a Pivotal Act according to the Arbital Guarded Definition (which requires the event to be very positive).
The old title does not seem to be incompatible with the actual text of the post. But it is incompatible with my intended meaning. I didn't intend to specify whether or not launching the LAI was a positive event, because the argument about the need for Alignment Target Analysis (ATA) goes through regardless of whether or not launching the LAI was a good idea. In both cases, ATA progress is needed to reduce risks. And in both cases, there is not a lot of time to do ATA later. So regardless of whether or not launching the LAI was a positive event, ATA work needs to start now. (ATA is in fact more important in scenarios where launching the LAI was a terrible mistake.)
As I show in my other reply: there is a well-established convention of using the term Pivotal Act as a shorthand for shutting down all competing AI projects. As the scenario in the post shows, this might not buy a lot of time. That is how I was using the term when I picked the old title.
A scenario that better illustrates why a Pivotal Act AI might not buy a lot of time
This section outlines a scenario where an unambiguous Pivotal Act is instantly followed by a very severe time crunch. The scenario in the post already shows that a Pivotal Act AI might not buy a lot of time, but the scenario outlined here illustrates this fact more clearly. (In other words: this section outlines a scenario for which the old title would actually be a good title.) In this new scenario, a Pivotal Act dramatically reduces the probability of extinction by shutting down all unauthorised AI projects. It also completely removes the possibility of anything worse than extinction. Right after the Pivotal Act, there is a frenzied race against the clock to make enough progress on ATA before time runs out. Failure results in a significant risk of extinction.
Consider the case where Dave launches Dave's AI (DAI). If DAI had not been launched, everyone would almost certainly have been killed by some other AI. DAI completely and permanently shuts down all competing AI projects. DAI also reliably prevents all scenarios where designers fail to hit the alignment target that they are aiming at. Due to Internal Time Pressure, a Sovereign AI must then be launched very quickly (discussions of Internal Time Pressure can be found here, and here, and here). There is very little time to decide what alignment target to aim at. (The point made in this section is not sensitive to who gave Dave permission to launch DAI, or to who DAI will defer to for the choice of alignment target. But for the sake of concreteness, let's say that the UN Security Council authorised DAI, and that DAI defers to a global electorate regarding the choice of alignment target.)
By the time Dave launches DAI, work on ATA has already progressed very far. There already exist many alignment targets that would in fact lead to an unambiguous win (somehow, describing these outcomes as a win is objectively correct). Only one of the many proposed alignment targets still has an unnoticed problem. And this problem is not nearly as severe as the problem with PCEV. People take the risks of unnoticed problems very seriously. But due to severe Internal Time Pressure, there is not much they can do with this knowledge. The only option is to use their limited time to analyse all alignment targets that are being considered. (Many very optimistic assumptions are made regarding both DAI and the level of ATA progress. This is partly to make sure that readers will agree that the act of launching DAI should count as a Pivotal Act, and partly to show that ATA might still be needed despite these very optimistic assumptions.)
The only alignment target that is not a clear win is based on maximising the sum of renormalised utility functions. The proposed AI includes a proposed way of mapping a human to a utility function. This mapping always results in a perfect representation of what the human wants (and there are no definitional issues with this mapping). These functions are then renormalised to have the same variance (as discussed here). Let's write VarAI for this AI. VarAI maximises the sum of the renormalised functions. The aggregation method described above has a problem that is obvious in retrospect. If that problem is explained, then it is clear that VarAI is an unacceptable alignment target. However, in this scenario, no one has noticed this problem. The question is now whether or not anyone will notice the problem (before an alignment target needs to be settled on).
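The renormalisation step described above can be sketched in code. This is only an illustration of variance normalisation over a finite set of outcomes: the function names, the outcome representation, and the example numbers are all hypothetical, and nothing here is a claim about how VarAI itself would be specified.

```python
# Illustrative sketch: aggregate utility functions by rescaling each one
# to unit variance over a finite outcome set, then maximising the sum.
# All names and numbers are made up for illustration.

def normalise_to_unit_variance(utilities):
    """Rescale a utility function (a list of values, one per outcome)
    so that its values have variance 1. Shifting by a constant and
    rescaling leaves the induced preference ordering unchanged."""
    n = len(utilities)
    mean = sum(utilities) / n
    variance = sum((u - mean) ** 2 for u in utilities) / n
    if variance == 0:
        return [0.0] * n  # an indifferent agent contributes nothing
    scale = variance ** 0.5
    return [(u - mean) / scale for u in utilities]

def aggregate_and_pick(utility_profiles):
    """Sum the variance-normalised utility functions and return the
    index of the outcome that maximises that sum."""
    normalised = [normalise_to_unit_variance(u) for u in utility_profiles]
    totals = [sum(column) for column in zip(*normalised)]
    return max(range(len(totals)), key=lambda i: totals[i])

# Two hypothetical agents who both rank outcome 0 first:
chosen = aggregate_and_pick([[2, 1, 0], [3, 1, 0]])
```

The design choice being illustrated is that equal-variance normalisation fixes how much "weight" each mapped utility function gets in the sum, independently of the arbitrary scale the mapping happened to produce.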
Due to the problem with this aggregation method, VarAI would create a Soft Maximisation version of a Suffering Reducing AI (SMSRAI) as a successor AI (if VarAI is successfully implemented and pointed at the population in this thought experiment). So, if the proponents of VarAI happen to win the political fight, then the result would be SMSRAI. SMSRAI would work to reduce both average suffering and population size (without doing anything drastic). The quality of most people's lives would increase very dramatically. Many people would choose to spend centuries living a very nice life. But after some conversations with SMSRAI, few would choose to live for millennia. No one would choose to live for many millennia. Humanity would very gradually go extinct in a few dozen generations. By the time it happens, no one would find this problematic in any way. Changes in attitudes are very gradual, and no one ever feels pressured or manipulated by SMSRAI (for example: if you ask SMSRAI to leave you alone, it backs off for a couple of centuries before politely asking again if you want to talk).
If anyone finds the problem with the aggregation method in time, then VarAI will be dropped as a potential alignment target. (Once the problem is explained, it is retrospectively obvious why VarAI is completely unacceptable as an alignment target). If no one finds this problem in time however, it is possible that VarAI will win the political fight.
In order to avoid accidental interference, Dave designed DAI to not discuss ATA. (As a general strategy for reducing risks from definitional issues, Dave actually made sure that DAI only does things that are absolutely necessary for the completion of a carefully chosen set of well-defined tasks.) So the only way to remove the possibility of extinction is to reach the needed level of ATA understanding before it's too late. (If the problem with this aggregation method is not explained to them, their only path to understanding is to try doing ATA.)
Despite severe Internal Time Pressure, the possibility of an unnoticed problem in an alignment target is taken very seriously. So a last judge safety add-on is implemented. The last judge is extrapolated using a different extrapolation method than the one used in VarAI (in order to give the last judge a chance to veto a bad extrapolation method; the VarAI extrapolation definitions are in fact somehow objectively correct, but there is no way of verifying that fact). The result of the last judge extrapolation method turns out to be a very anxious mind: a mind that is in general not comfortable with objecting to things (such as the extrapolation method that it is the output of, or the outcome implied by an alignment target). This mind is very reluctant to veto a scenario where no one is suffering, and where almost everyone is very happy with all aspects of how things turn out (SMSRAI very gradually, over many generations, "helps people realise" that the outcome is actually a good outcome. And people genuinely are having a very nice time, for a lot longer than most people expected). So the off switch is not triggered.
If Dave had not launched DAI, all humans would very likely have been killed very soon by some other AI. So I think a lot of people would consider launching DAI to be a Pivotal Act. (It completely upset the game board. It drastically increased the probability of a win. It was a very positive event according to a wide range of value systems.) But if someone wants humanity to go on existing (or wants to personally live a super long life), then there is not a lot of time to find the problem with VarAI (because without sufficient ATA progress, there still exists a significant probability of extinction). So, launching DAI was a Pivotal Act. And launching DAI did not result in a lot of time to work on ATA. Which demonstrates that a Pivotal Act AI might not buy a lot of time.
One can use this scenario as an argument in favour of starting ATA work now. It is one specific scenario that exemplifies a general class of scenarios: scenarios where starting ATA work now would further reduce an already small risk of a moderately bad outcome. It is a valid argument. But it is not the argument that I was trying to make in my post. I was thinking of something a lot more dangerous. I was imagining a scenario where a bad alignment target is very likely to get successfully implemented unless ATA progresses to the needed levels of insight before it is too late. And I was imagining an alignment target that implied a massively worse than extinction outcome (for example along the lines of the outcome implied by PCEV). I think this is a stronger argument in favour of starting work on ATA now. And this interpretation was ruled out by the old title (which is why I changed the title).
(A brief tangent: if someone expects everything to turn out well, but would like to work on ATA in order to further reduce a small probability of something going moderately bad, then I would be very happy to collaborate with such a person in a future ATA project. Having very different perspectives in an ATA project sounds like a great idea. An ATA project is very different from a technical design project where a team is trying to get something implemented that will actually work. There is really no reason for people to have similar worldviews or even compatible ontologies. It is a race against time to find a conceptual breakthrough of an unknown type. It is a search for an unnoticed implicit assumption of an unknown type. So genuinely different perspectives sound like a great idea.)
In summary: "A Pivotal Act AI might not buy a lot of time" is in fact a true statement. And it is possible to see this by looking at the scenario outlined in the post. But it was a mistake to use this statement as the title for this post, because it implies things about the scenario that I did not intend to imply. So I changed the title and outlined a scenario that is better suited for illustrating that a Pivotal Act AI might not buy a lot of time.
PS:
I upvoted johnswentworth’s comment. My original title was a mistake. And the comment helped me realise my mistake. I hope that others will post similar comments on my posts in the future. The comment deserves upvotes. But I feel like I should ask about these agreement votes.
The statement: "something which does not buy ample time is not a pivotal act" is clearly false. Martin Randall explained why the statement is false (helpfully pulling out the relevant quotes from the texts that johnswentworth cited). And then johnswentworth did an "Agreed" reaction on Martin Randall's explanation of why the statement is false. After this however, johnswentworth's comment (with the statement that had already been determined to be false) was agree-voted to +7. That seemed odd to me, so I wanted to ask about it. (My posts sometimes question deeply entrenched assumptions. And johnswentworth's comment sort of looks like criticism, at least if one only skims the post and the discussion. So maybe there is no great mystery here. But I still wanted to ask about this. Mostly in case someone has noticed an object level error in my post. But I am also open to terminology feedback.)