I was quite surprised to see myself cited as “liking the broader category that QACI is in”. I think this claim may be technically true for some definition of “likes” and “broader category”, but to the casual reader it implies a higher level of endorsement than is accurate.
I don’t have a very good understanding of QACI and therefore have no particularly strong opinions on QACI. It seems quite different from the kinds of alignment approaches I think about.
If Orthogonal wants to ever be taken seriously, by far the most important thing is improving the public-facing communication. I invested a more-than-fair amount of time (given the strong prior for “it won’t work” with no author credentials, proof-of-concepts, or anything that would quickly nudge that prior) trying to understand QACI, and why it’s not just gibberish (both through reading LW posts and interacting with authors/contributors on the discord server), and I’m still mostly convinced there is absolutely nothing of value in this direction.
And now there’s this 10k-word-long post, roughly the size of an actual research paper, with no early indication that there’s any value to be obtained by reading the whole thing. I know, I’m “telling on myself” by commenting without reading this post, but y’all rarely get any significant comments on LW posts about QACI (as this post points out), and this might be the reason.
The way I see it, the whole thing strikes an impressive balance: extremely hand-wavy as a whole, written up in an extremely “chill and down with the kids” manner, with bits and pieces of math sprinkled in various places, often done incorrectly.
Maybe the general academic formalism isn’t the worst thing after all: you need an elevator pitch, an abstract, something to read in a minute or two that will give the general idea of what’s going on. Then an introduction, expanding on those ideas and providing some more context. And then the rest of the damn research (which I know is in a very early stage and preparadigmatic and all that, but that’s not an excuse for bad communication).
Modern AI systems (read: LLMs) don’t look like that, so Tammy thinks efforts to align them are misguided.
Starting your plan with “ignore all SOTA ML research” doesn’t sound like a winning proposition.
The important thing is, your math should be safe regardless of what human values turn out to really be, but you still need lots of info to pin those values down.
I don’t think even hardcore believers in the orthogonality thesis believe all possible sets of values are equally easy to satisfy. Starting with a value-free model is throwing away a lot of baby with the bath-water.
Moreover, explicitly having a plan of “we can satisfy any value system” leaves you wide open for abuse (and goodharting). Much better to aim from the start for a broadly socially agreeable goal like curiosity, love or freedom.
Look at every possible computer program.
I’m sure the author is aware that Solomonoff Induction is non-computable, but this article still fails to appreciate how impossible it is to even approximate. We could build a galaxy-sized supercomputer and still wouldn’t be able to calculate BB(7).
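(For reference, and this is my own gloss rather than anything taken from the post, the relevant standard facts are below: the Solomonoff prior sums over all programs, which requires solving the halting problem, and busy-beaver-style quantities outgrow anything computable.)

```latex
% Standard background facts, stated here for reference (mine, not the post's).
% The discrete Solomonoff/universal prior weights every program p that outputs x
% on a universal prefix machine U:
\[
  m(x) \;=\; \sum_{p \,:\, U(p) = x} 2^{-|p|}
\]
% Evaluating (or tightly approximating) this requires knowing which programs halt,
% i.e. solving the halting problem. Relatedly, the busy beaver function eventually
% dominates every total computable function:
\[
  \forall f \text{ total computable},\ \exists N\ \forall n > N:\ \mathrm{BB}(n) > f(n),
\]
% so no fixed amount of physical compute evaluates these quantities in general.
```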
Overall, I would rate the probability of this plan succeeding at 0%.
Ignores the fact that there’s no such thing in the real world as a “strong coherent general agent”. Normally I scoff at people who say things like “the word intelligence isn’t defined” but this is a case where the distinction between “can do anything a human can do” and “general purpose intelligence” really matters.
Value-free alignment is simultaneously impossible and dangerous
Refuses to look “inside the box”, thereby throwing away all practical tools for alignment
Completely ignores everything we’ve actually learned about AI in the last 4 years
Requires calculation of an impossible-to-calculate function
Generally falls into the MIRI/rationalist tarpit of trying to reason about intelligent systems without actually working with, or appreciating, working systems in the real world
A lot of your arguments boil down to “This ignores ML and prosaic alignment” so I think it would be helpful if you explained why ML and prosaic alignment are important.
The obvious reply would be that ML now seems likely to produce AGI, perhaps alongside minor new discoveries, in a fairly short time. (That at least is what EY now seems to assert.) Now, the grandparent goes far beyond that, and I don’t think I agree with most of the additions. However, the importance of ML sadly seems well-supported.
QACI and PreDCA are certainly the most galaxy-brained alignment plans I’ve seen, and for reasons I can understand: if you’re a superintelligence deciding what to do in a largely unknown situation, it makes sense to start by considering all possible worlds, or even all possible models of all possible worlds, and then narrowing your focus. I haven’t delved into every detail of their algorithms or their metaphysics, but the overall feel makes sense to me. (As for the complaint that this is all wildly uncomputable, what we are seeing here is the idealized version of the calculation. The real thing would use heuristics.)
I also think the part of the plan that involves satisfying the values of the indicated person is extremely “under-theorized”, and that’s putting it mildly. AI_0 is going to propose a design for AI_1, and the criterion is that the person will approve of the proposed design when they see it. I suppose such a moment is inevitable in any alignment plan: a moment when some human being or group of human beings must decide, using their natural faculties of judgment, whether superalignment has been solved and it’s now safe to set the process in motion. But in that case, we need a much better idea of what it would mean to be ready for that moment and that responsibility.
We start by noting that human values are complex and hard to fully formalize, in a way that leads to good things (or even “just” avoids bad things) when maximized by a utility function.
This sentence is quite ambiguous; I think I only understand it because I already assume what you mean. I take it you mean “it is difficult to (formalize it in a way such that)”, and not that “(it is difficult to formalize) leads to good things”. Also, I take “when maximized by a utility function” to mean “when you try to maximize them by maximizing a utility function”.
This is getting unwieldy, so we change Loc_n and simplify. Now, Loc_n is just a distribution over “counterfactual insertion functions”, i.e. functions that produce counterfactual worldstates from counterfactual input blobs. A single such function would be denoted GAMMA, of type / in set UPPERGAMMA_n (again, n is the size of the blob being dealt with).
Loc_n(w in UPPEROMEGA, b of length n) = a distribution over halting/deterministic/non-crazy programs that eat w and give a bitstring of length n. If f(w) = b, then our distribution puts weight KSIMP(f) on that f. Otherwise, we weight it 0, since f didn’t give us b in the first place.
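To check my own reading of this, here is a minimal toy sketch in Python (entirely my own illustration, not the post’s construction: the candidate functions, their hand-assigned “description lengths”, and the normalization are stand-ins for the real thing, which ranges over all halting programs and uses an uncomputable Kolmogorov-style simplicity measure):

```python
# Toy stand-in for Loc_n (my illustration, not the post's math): a finite set of
# candidate "programs", each with a hand-assigned description length standing in
# for KSIMP. A candidate gets weight 2**(-length) if it extracts b from w, else 0.

def loc_n(w, b, candidates):
    """Return normalized weights over named candidates.

    candidates: list of (name, f, description_length); f maps a worldstate to a blob.
    """
    raw = {name: (2.0 ** -length if f(w) == b else 0.0)
           for name, f, length in candidates}
    total = sum(raw.values())
    if total == 0:
        return raw  # nothing "locates" b inside w
    return {name: weight / total for name, weight in raw.items()}

# A "worldstate" here is just a string containing the 4-bit blob b somewhere.
w = "the-blob-0110-is-in-here"
b = "0110"

candidates = [
    ("slice_9_13", lambda world: world[9:13], 5),   # simple extractor, finds b
    ("first_four", lambda world: world[:4], 5),     # simple, but misses b -> weight 0
    ("hardcode_b", lambda world: "0110", 20),       # always outputs b, but "complex"
]

print(loc_n(w, b, candidates))
# The simple extractor that actually finds b inside w dominates the hard-coded one.
```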
(The following math is sometimes on a Conway’s Game Of Life cellular-automata grid, which could debatably generalize to our actual universe, or at least can help to model it. Have I mentioned that the “simulate universes a lot” mental image is going to help here?)
HOW_GOOD is a function with input [A, an action [and a hypothesis]] and output [0;1].
Score(action) is another extended sum. For every h in HYPOTHESIS, we calculate PRIOR(h) * LOOKS_LIKE_THIS_WORLD(h) * HOW_GOOD(action, h). Sum them up, and we get Score(action).
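Again as a sanity check on my reading (my own toy numbers and names, not the post’s), the score of an action is just an expectation over hypotheses, something like this:

```python
# Toy sketch of Score(action) = sum over h of PRIOR(h) * LOOKS_LIKE_THIS_WORLD(h)
# * HOW_GOOD(action, h). All hypotheses, priors, fits and goodness values are made up.

hypotheses = ["h_simulation", "h_base_reality", "h_adversarial"]

prior = {"h_simulation": 0.2, "h_base_reality": 0.7, "h_adversarial": 0.1}
looks_like_this_world = {"h_simulation": 0.5, "h_base_reality": 0.9, "h_adversarial": 0.05}

def how_good(action, h):
    """Stand-in for HOW_GOOD: a value in [0, 1] for taking `action` if h holds."""
    table = {
        ("press_button", "h_simulation"): 0.3,
        ("press_button", "h_base_reality"): 0.8,
        ("press_button", "h_adversarial"): 0.0,
        ("do_nothing",   "h_simulation"): 0.5,
        ("do_nothing",   "h_base_reality"): 0.5,
        ("do_nothing",   "h_adversarial"): 0.5,
    }
    return table[(action, h)]

def score(action):
    return sum(prior[h] * looks_like_this_world[h] * how_good(action, h)
               for h in hypotheses)

for a in ["press_button", "do_nothing"]:
    print(a, round(score(a), 4))
```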
An idea Tammy uses frequently: Types are sets. That is, a type-signature for an entity says “the entity is within the set of all [whatever]s”. Yes of course the distinction is unclear at this very moment. Tammy just invented QACI within the past year, it’s still a work in progress, and even the background math’s probably not all done. More to the point, I haven’t described most of it yet! Again, any problems that could arise are left to part 2 of this post.
I do not understand the overall meaning of all sentences from the second onwards in this paragraph.
First, we start setting up math entities. Let UPPEROMEGA be a set of worldstates, DELTA_UPPEROMEGA is the set of distributions over UPPEROMEGA, and A is a set of actions.
Thank you for narrowing my confusion over what AI_0 does.
My top question now is: how long does AI_0 need to run, and why is it safe from other AIs during that period?
AI_0 appears to need a nontrivial fraction of our future lightcone to produce a decent approximation of the intended output. Yet keeping it boxed seems to leave the world vulnerable to other AIs.
Just a content note—I’d prefer a less click-baity, more informative title for easy searchability in the future.
TYPO THREAD
Missing math formatting henceforth
After finishing the post, I infer that your way of writing math is idiosyncratic, and not a lack of formatting.
It does not look clearer to me than more standard math notation, although I guess it may be easier for you to write?
Misformatting?
At over 15k tokens, reading the full article requires significant time and effort. While it aims to provide comprehensive detail on QACI, much of this likely exceeds what is needed to convey the core ideas. The article could be streamlined to more concisely explain the motivation, give mathematical intuition, summarize the approach, and offer brief examples. Unnecessary elaborations could be removed or included as appendices. This would improve clarity and highlight the essence of QACI for interested readers.
My acquired understanding is that the article summarizes a new AI alignment approach called QACI (Question-Answer Counterfactual Interval). QACI involves generating a factual “blob” tied to human values, along with a counterfactual “blob” that could replace it. Mathematical concepts like realityfluid and Loc() are used to identify the factual blob among counterfactuals. The goal is to simulate long reflection by iteratively asking the AI questions and improving its answers. QACI claims to avoid issues like boxing and embedded agency through formal goal specification.
While the article provides useful high-level intuition, closer review reveals limitations in QACI’s theoretical grounding. Key concepts like realityfluid need more rigor, and details are lacking on how embedded agency is avoided. There are also potential issues around approximation and vulnerabilities to adversarial attacks that require further analysis. Overall, QACI seems promising but requires more comparison with existing alignment proposals and formalization to adequately evaluate. The article itself is reasonably well-written, but the length and inconsistent math notation create unnecessary barriers.
Is it a thing now to post LLM-generated comments on LW?
This seems like a rhetorical question. Both of our top-level comments seem to reach similar conclusions, but it seems you regret the time spent engaging with the OP to write your comment. This took 10 minutes, most of it spent writing this comment. What is your point?
My point is that your comment was extremely shallow, with a bunch of irrelevant information, and in general plagued with the annoying ultra-polite ChatGPT style—in total, not contributing anything to the conversation. You’re now defensive about it and skirting around answering the question in the other comment chain (“my endorsed review”), so you clearly intuitively see that this wasn’t a good contribution. Try to look inwards and understand why.
Is this an AI summary (or your own writing)? If so, would you mind flagging it as such?
This is my endorsed review of the article.
It’s absolutely fine if you want to use AI to help summarize content, and then you check that content and endorse it.
I still ask if you could please flag it as such, so the reader can make an informed decision about how to read/respond to the content.