Consistency check: After coming up with a conclusion, check that it’s consistent with other simple facts you know. This lets you catch simple errors very quickly.
Give an example: If you’ve got an abstract object, think of the simplest possible object which instantiates it, preferably one you’ve got lots of good intuitions about. This resolves confusion like nothing else I know.
Proving too much: After you’ve come up with a clever argument, see if it can be used to prove another claim, ideally the opposite claim. This can massively weaken an argument’s apparent strength at little cost.
Prove it another way: Don’t leave things at one proof; find another. It shines a light on flaws in your understanding, as well as on deeper principles.
Are any of these satisfactory?
I usually say “assuming no AGI”, but that’s to people who think AGI is probably coming soon.
Thanks! Clicking on the triple dots didn’t display any options when I posted this comment. But they do now. IDK what went wrong.
This is great! But one question: how can I actually make a lens? What do I click on?
Great! I’ve added it to the site.
I thought it was better to exercise until failure?
Do you think this footnote conveys the point you were making?
As alignment researcher David Dalrymple points out, another “interpretation of the NFL theorems is that solving the relevant problems under worst-case assumptions is too easy, so easy it’s trivial: a brute-force search satisfies the criterion of worst-case optimality. So, that being settled, in order to make progress, we have to step up to average-case evaluation, which is harder.” The fact that solving problems for unnecessarily general environments is too easy crops up elsewhere, in particular in Solomonoff Induction. There, the problem is to assume a computable environment and predict what will happen next. The algorithm? Run through every possible computable environment and average their predictions. No algorithm can do better at this task. For less general tasks, though, designing an optimal algorithm becomes much harder. But eventually, specialization makes things easy again: solving tic-tac-toe is trivial. Between total generality and total specialization is where the most important, and most difficult, problems in AI lie.
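To make “run through every possible computable environment and average their predictions” slightly more concrete, here is one standard way the Solomonoff predictor is written (the notation below is an illustrative sketch added here, not part of the footnote): fix a universal prefix machine $U$, let $\ell(p)$ be the length of program $p$, and give each program whose output begins with the observed string $x$ a weight of $2^{-\ell(p)}$:

$$M(x) \;=\; \sum_{p \,:\, U(p) \text{ begins with } x} 2^{-\ell(p)}, \qquad M(x_{t+1} \mid x_{1:t}) \;=\; \frac{M(x_{1:t}\,x_{t+1})}{M(x_{1:t})}.$$

Every computable environment shows up somewhere in that sum, shorter programs get exponentially more weight, and, roughly speaking, no computable predictor beats this mixture by more than a constant.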
What are the “no free lunch” theorems?
I think mesa-optimizers could be a major problem, but there are good odds we live in a world where they aren’t. Why do I think they’re plausible? Because optimization is a pretty natural capability, and a mind being/becoming an optimizer at the top level doesn’t seem like a very complex claim, so I assign decent odds to it. There’s some weak evidence in favour of this too, e.g. humans not optimizing for what the local, myopic evolutionary optimizer acting on them is optimizing for, coherence theorems, etc. But that’s not super strong evidence, and there are other simple hypotheses for how things go, so I don’t assign more than about 10% credence to the hypothesis.
It’s still not obvious to me why adversaries are a big issue. If I’m acting against an adversary, it seems like I won’t make counter-plans that lead to lots of side-effects either, for the same reasons they won’t.
Could you unpack both clauses of this sentence? It’s not obvious to me why they are true.
I was thinking about this a while back, as I was reading some comments by @tailcalled where they pointed out this possibility of a “natural impact measure” when agents make plans. This relied on some sort of natural modularity in the world, and in plans, such that you can make plans by manipulating pieces of the world which don’t have side-effects leaking out to the rest of the world. But thinking through some examples didn’t convince me that was the case.
Though admittedly, all I was doing was recursively splitting my instrumental goals into instrumental sub-goals and checking if they wound up seeming like natural abstractions. If they had, perhaps that would reflect an underlying modularity in plan-making in this world that is likely to be goal-independent. They didn’t, so I got more pessimistic about this endeavour. Though writing this comment out, it doesn’t seem like those examples I worked through are much evidence. So maybe this is more likely to work than I thought.
Thanks for the recommendation! I liked ryan’s sketches of what capabilities Nx AI R&D labor AIs might possess. It makes things a bit more concrete. (Though I definitely don’t like the name.) I’m not sure we want to include this definition, as it is pretty niche, and I’m not convinced of its utility: when I tried drafting a paragraph describing it, I struggled to articulate why readers should care about it.
Here’s the draft paragraph.
“Nx AI R&D labor AIs: The level of AI capabilities that is necessary for increasing the effective amount of labor working on AI research by a factor of N. This is not the same thing as the capabilities required to increase AI progress by a factor of N, as labor is just one input to AI progress. The virtues of this definition include: ease of operationalization, [...]”
What are the differences between AGI, transformative AI, and superintelligence?
Thanks for the feedback!
I’m working on some articles about why powerful AI may come soon, and why that may kill us all. The articles are aimed at a typical smart person, and at knowledgeable people who want something to share with their family/friends. Which intro do you prefer, A or B?
A) “Companies are racing to build smarter-than-human AI. Experts think they may succeed in the next decade. But more than “building” it, they’re “growing” it — and nobody knows how the resulting systems work. Experts vehemently disagree on whether we’ll lose control and see them kill us all. And although serious people are talking about extinction risk, humanity does not have a plan. The rest of this section goes into more detail about how all this could be true.”
B) “Companies are racing to grow smarter-than-human AIs. More and more experts think they’ll succeed within the next decade. And we do grow modern AI — which means no one knows how they work, not even their creators. All this is in spite of the vehement disagreement amongst experts about how likely it is that smarter-than-human AI will kill us all. Which makes the lack of a plan on humanity’s part for preventing these risks all the more striking. These articles explain why you should expect smarter-than-human AI to come soon, and why that may lead to our extinction.”
Does this text about Colossus match what you wanted to add?
Colossus: The Forbin Project also depicts an AI takeover due to instrumental convergence. But what differentiates it is the presence of two AIs, which collude with each other to take over. In fact, their discussion of their shared situation, being in control of their creators’ nuclear defence systems, is what leads to their decision to take over from their creators. Interestingly, the back-and-forth between the AIs is extremely rapid, and involves concepts that humans would struggle to understand, which made it impossible for their creators to realize the conspiracy that was unfolding before their eyes.
That’s a good film! A friend of mine absolutely loves it.
Do you think the Forbin Project illustrates some aspect of misalignment that isn’t covered by this article?
How do fictional stories illustrate AI misalignment?
Huh, I definitely wouldn’t have ever recommended someone play 5x5. I’ve never played it. Or 7x7. I think I would’ve predicted playing a number of 7x7 games would basically give you the “go experience”. Certainly, 19x19 does feel like basically the same game as 9x9, except when I’m massively handicapping myself. I can beat newbies easily with a 9 stone handicap in 19x19, but I’d have to think a bit to beat them in 9x9 with a 9 stone handicap. But I’m not particularly skilled, so maybe at higher levels it really is different?
Rarely. I’m doubtful my experiences are representative, though. I don’t recall anyone being confused by my saying “assuming no AGI”. But even when speaking to people who thought AGI was a long way off, or hadn’t thought about it too deeply, we were still in a social context where “AGI soon” was within the Overton window.