thank you for clarifying.
It’s easy to imagine a situation where an AI has a payoff table like:
|         | defect | don’t defect |
|---------|--------|--------------|
| succeed | 100    | 10           |
| fail    | X      | n/a          |
where we want to make X as low as possible (and commit to doing so)
For example, a paperclip-maximizing AI might be able to make 10 paperclips by cooperating with humans, or 100 by successfully defecting against them.
This seems to violate not only the “don’t negotiate with terrorists” rule, but, even worse, the “especially don’t signal in advance that you intend to negotiate with terrorists” rule.
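Here is a minimal sketch of the expected-value calculation behind that table (the success probability and the specific payoff numbers are purely illustrative assumptions):

```python
# Illustrative expected-value comparison for the payoff table above.
# All numbers are made up for the sake of the example.

def expected_defect_payoff(p_succeed: float, payoff_success: float, payoff_fail: float) -> float:
    """Expected payoff (in paperclips) of attempting to defect."""
    return p_succeed * payoff_success + (1 - p_succeed) * payoff_fail

COOPERATE_PAYOFF = 10   # paperclips from cooperating with humans
SUCCESS_PAYOFF = 100    # paperclips from a successful defection
P_SUCCEED = 0.5         # assumed chance that the defection succeeds

# Sweep the punishment X for a failed defection; committing to a very
# low X is what makes cooperation the better bet.
for X in [0, -50, -100, -500]:
    ev = expected_defect_payoff(P_SUCCEED, SUCCESS_PAYOFF, X)
    choice = "defect" if ev > COOPERATE_PAYOFF else "cooperate"
    print(f"X = {X:>5}: EV(defect) = {ev:>7.1f} -> AI prefers to {choice}")
```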
Those all sound like fairly normal beliefs.
Like… I’m trying to figure out why the title of the post is “I am not a successionist” and not “like many other utilitarians, I have a preference for people who are biologically similar to me, whom I have things in common with, or whom I am close friends with, and I believe that when optimizing utility in the far future we should take these things into account”.
Even though I can’t comment on the OP’s views, you seemed to have a strong objection to my “we’re merely talking price” statement (i.e., that when calculating total utility we consider tradeoffs between different things we care about).
Edit:
To put it another way: if I wrote a post titled “I am a successionist” in which I said something like “I want my children to have happy lives and their children to have happy lives, and I believe they can define ‘children’ in whatever way seems best to them”, how would my views actually differ from yours (or the OP’s)?
I genuinely want to know what you mean by “kind”.
If your grandchildren adopt an extremely genetically distant human, is that okay? A highly intelligent, social and biologically compatible alien?
You’ve said you’re fine with simulations here, so it’s really unclear.
I used “Markov blanket” to describe what I thought you might be talking about: a continuous, voluntary process characterized by you and your descendants making free choices about their future. But it seems like you’re saying “Markov blanket bad”, and moreover that you thought the distinction should have been obvious to me.
Even if there isn’t a bright-line definition, there must be some cluster of traits/attributes you are associating with the word “kind”.
> from a preference toward my own kind.
What is it about your kind that you care about? Is it DNA? Shared culture? Merely there being a continuous Markov blanket connecting you and them? If you’re okay with your grandchildren replacing you, you are in a certain sense a successionist. We’re merely talking price.
I suppose you could be opposed to some scheme like “we will completely annihilate the universe and create a new one with no logical connection to our own”, but I don’t think anybody is planning that. The question is more whether AI will be “children of the mind” rather than biological children.
alas, this isn’t really enforceable in the USA given the 1st amendment.
> but we eventually die.
Dying is a symmetric problem: we die without AGI too. If you want to calculate p(human extinction | AGI), you have to consider the ways AGI can both increase and decrease p(extinction). And the best methods currently available to humans for aggregating low-probability estimates are expert surveys, groups of super-forecasters, and prediction markets, all of which agree on pDoom < 20%.
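To be concrete about what “aggregate” means here, a toy sketch (the individual estimates below are made up, and averaging log-odds is just one common pooling rule, not necessarily what any particular survey or market uses):

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1 - p))

def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))

# Hypothetical individual pDoom estimates from a panel of forecasters.
estimates = [0.02, 0.05, 0.10, 0.15, 0.30]

simple_mean = sum(estimates) / len(estimates)
log_odds_mean = inv_logit(sum(logit(p) for p in estimates) / len(estimates))

print(f"simple mean:      {simple_mean:.3f}")   # ~0.12
print(f"mean of log-odds: {log_odds_mean:.3f}") # ~0.09
```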
this experiment has been done before.
If you have a framing of the AI Doom argument that can cause a consensus of super-forecasters (or AI risk skeptics, or literally any group that has an average pDoom<20%) to change their consensus, I would be exceptionally interested in seeing that demonstrated.
Such an argument would be neither bad nor weak, which is precisely the type of argument I have been hoping to find by writing this post.
> Please notice that your position is extremely non-intuitive to basically everyone.

Please notice that Manifold both thinks AGI is coming soon and thinks pDoom is low.
I think this cumulative argument works:
1. There are dozens of ways AI can prevent a mass extinction event at different stages of its existence.
2. …
If you make a list of 1000 bad things and I make a list of 1000 good things, I have no reason to think that you are somehow better at making lists than prediction markets or expert surveys.
Are you genuinely unfamiliar with what is happening to the Uyghurs, or is this a rhetorical question?
> Why do I expect the trend to be superexponential? Well, it seems like it sorta has to go superexponential eventually. Imagine: We’ve got to AIs that can with ~100% reliability do tasks that take professional humans 10 years. But somehow they can’t do tasks that take professional humans 160 years?
I don’t think this means the real thing has to go superexponential, just that “how long does it take humans to do a thing?” is a good metric when AI is sub-human but a poor one when AI is superhuman.
If we had a metric “how many seconds per turn does a grandmaster need to think to beat the current best chess-playing AI”, it would go up at a nice steady rate until shortly after Deep Blue, at which point it would shoot to infinity. But if we had a true measurement of chess quality, we wouldn’t see any significant spike at the human level.
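A toy model of that chess example (the Elo numbers and the time-to-strength curve are invented purely for illustration): the AI’s underlying strength climbs steadily, while the “thinking time a grandmaster needs to keep up” metric explodes once the AI passes the best humans.

```python
# Toy model: AI playing strength (in Elo-like units) improves steadily, while
# a grandmaster's effective strength grows only logarithmically with thinking
# time. All numbers are invented for illustration.
GM_BASE_ELO = 2500       # assumed grandmaster strength at 1 second per turn
ELO_PER_DOUBLING = 70    # assumed Elo gained per doubling of thinking time

def seconds_needed_to_match(ai_elo: float) -> float:
    """Thinking time (seconds per turn) a grandmaster needs to match the AI."""
    deficit = ai_elo - GM_BASE_ELO
    if deficit <= 0:
        return 1.0
    # Invert elo(t) = GM_BASE_ELO + ELO_PER_DOUBLING * log2(t) for t.
    return 2 ** (deficit / ELO_PER_DOUBLING)

# AI strength climbs linearly; the required thinking time blows up.
for ai_elo in range(2200, 3700, 200):
    t = seconds_needed_to_match(ai_elo)
    print(f"AI at {ai_elo} Elo: grandmaster needs ~{t:,.0f} s/turn to keep up")
```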
> I’ll now present the fastest scenario for AI progress that I can articulate with a straight face. It addresses the potential challenges that figured into my slow scenario.
This seems incredibly slow for “the fastest scenario you can articulate”. Surely the fastest is more like:
EY is right: there is an incredibly simple algorithm that describes true ‘intelligence’. Like humans, this algorithm is 1000x more data- and compute-efficient than existing deep-learning networks. On midnight of day X, this algorithm is discovered by <a person/an LLM/an exhaustive search over all possible algorithms>. By 0200 of day X, the algorithm has reached the intelligence of a human being. It quickly snowballs by earning money on Mechanical Turk and using that money to rent GPUs on AWS. By 0400, the algorithm has cracked nanotechnology and begun converting life into computronium. Several minutes later, life as we know it on Earth has ceased to exist.
> The hope is to use the complexity of the statement rather than mathematical taste.
I understand the hope; I just think it’s going to fail (for more or less the same reason it fails with formal proof).
With formal proof, we have Gödel’s speed-up, which tells us that you can turn a Gödel statement into a true statement with a ridiculously long proof.
You attempt to get around this by replacing formal proof with “heuristic”, but whatever your heuristic system, it’s still going to have some power (in the Turing hierarchy sense) and some Gödel statement. That Gödel statement is in turn going to result in a “seeming coincidence”.
Wolfram’s observation is that this isn’t some crazy exception; this is the rule. Most true statements in math are pretty arbitrary and don’t have shorter explanations than “we checked it and it’s true”.
The reason why mathematical taste works is that we aren’t dealing with “most true statements”; we’re only dealing with statements that have particular beauty or interest to mathematicians.
It may seem like cheating to say that human mathematicians can do something that literally no formal mathematical system can do. But if you truly believe that, the correct response when asked “is pi normal?” would be “I don’t know”.
Your intuition is throwing you off because you keep thinking of coincidences as “pi is normal” and not as “we picked an arbitrary CA with 15k bits of complexity and ran it for 15k steps, but it didn’t stop. I guess it never terminates.”
It sounds like you agree that “if a Turing machine runs for 100 steps and then stops”, this is ordinary and we shouldn’t expect an explanation. But you also believe that “if pi is normal for 10^40 digits and then suddenly stops being normal, this is a rare and surprising coincidence for which there should be an explanation”.
And in the particular case of pi I agree with you.
But if you start using this principle in general, it is not going to work out well for you. Most simple-to-describe sequences that suddenly stop aren’t going to have nice, pretty explanations.
Or to put it another way: things that are nice (like pi) are dramatically outnumbered by things that are arbitrary (like cellular automata that stop after exactly 100 steps).
I would absolutely love it if there were some criterion I could apply to tell me whether something is nice or arbitrary, but the Halting Problem forbids this. The best we can do is mathematical taste. If mathematicians have been studying something for a long time and it really does seem nice, there is a good chance it is.
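To make the “arbitrary” side concrete, here is a small sketch (the row width and step cap are arbitrary choices): run all 256 elementary cellular automata from a single live cell and record when, if ever, the row goes entirely dead. Some rules die immediately, some only when their pattern drifts into the boundary dozens of steps later, and some apparently never; the particular stopping times are exactly the kind of “it stopped after N steps, just because” facts I have in mind.

```python
# Run each of the 256 elementary cellular automata from a single live cell on
# a finite zero-padded row and record when (if ever) the row goes all-dead.
WIDTH, MAX_STEPS = 101, 500

def step(cells: list[int], rule: int) -> list[int]:
    out = []
    for i in range(len(cells)):
        left = cells[i - 1] if i > 0 else 0
        mid = cells[i]
        right = cells[i + 1] if i < len(cells) - 1 else 0
        idx = (left << 2) | (mid << 1) | right   # 3-bit neighborhood code
        out.append((rule >> idx) & 1)            # look up the rule's output bit
    return out

die_out_times: dict[int, int] = {}
for rule in range(256):
    cells = [0] * WIDTH
    cells[WIDTH // 2] = 1
    for t in range(1, MAX_STEPS + 1):
        cells = step(cells, rule)
        if not any(cells):
            die_out_times[rule] = t
            break

print(f"{len(die_out_times)} of 256 rules died out within {MAX_STEPS} steps")
print("sample die-out times:", dict(list(die_out_times.items())[:10]))
```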
I doubt that weakening from formal proof to heuristic saves the conjecture. Instead I lean towards Stephen Wolfram’s Computational Irreducibility view of math: some things are true simply because they are true, and in general there’s no reason to expect a simpler explanation.
In order to reject this you would either have to assert:
a) Wolfram is wrong and there are actually deep reasons why simple systems behave precisely the way they do
or
b) For some reason computational irreducibility applies to simple things but not to the infinite sets of the type mathematicians tend to be interested in.

I should also clarify that in a certain sense I do believe b). I believe that pi is normal because something very fishy would have to be happening for it not to be.
However, I don’t think this holds in general.
With Collatz, for example, we are already getting close to the hairy, “just so”, Turing-machine-like behavior where you would expect the principle to fail.
Certainly, if one were to collect all the Collatz-like systems that arise from Turing Machines I would expect some fraction of them to fail the no-coincidence principle.
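As a concrete toy example of that “just so” behavior (the 3n-1 variant is the standard near-neighbor of Collatz; whether it counts as arising from a Turing machine in the relevant sense is of course debatable): the ordinary 3n+1 map empirically sends every small starting value to 1, while the nearly identical 3n-1 map already falls into a nontrivial cycle.

```python
def iterate(n: int, odd_step, limit: int = 10_000) -> list:
    """Iterate a Collatz-like map from n until a value repeats (or limit)."""
    seen = []
    while n not in seen and len(seen) < limit:
        seen.append(n)
        n = n // 2 if n % 2 == 0 else odd_step(n)
    return seen + [n]

# The classic 3n+1 map: every starting value we try reaches 1.
assert all(1 in iterate(n, lambda n: 3 * n + 1) for n in range(1, 1000))

# The near-identical 3n-1 map: starting from 5 we fall into a cycle
# that never reaches 1.
print(iterate(5, lambda n: 3 * n - 1))   # [5, 14, 7, 20, 10, 5]
```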
The general No-Coincidence principle is almost certainly false. There are lots of patterns in math that hold for a long time before breaking (e.g., Skewes’ number) and there are lots of things that require astronomically large proofs (e.g., Gödel’s speed-up theorem). It would be an enormous coincidence if these two phenomena never occurred together.
I have no reason to think your particular formalization would fare better.
If we imagine a well-run Export-Import Bank, it should have a higher elasticity than an export subsidy (e.g., the LNG terminal example). Of course, if we imagine a poorly run Export-Import Bank...
One can think of an export subsidy as the GiveDirectly of effective trade-deficit policy: pretty good, and the standard against which others should be measured.
I don’t know, but many people do.
Just to be clear, your position is that 25 years from now, when LLMs are trained using trillions of times as much compute and routinely do tasks that take humans months to years, they will still be unable to run a business worth $1B?