I agree with your caveats.
However I’m not egocentric enough to imagine myself as particularly interesting to potential simulators. And so that hypothetical doesn’t significantly change my beliefs.
This line of reasoning is a lot like the Doomsday argument.
I am well aware that my assumptions are my assumptions. They work for me, but you may want to assume something different.
I’ve personally interpreted the Doomsday argument that way since I ran across it about 30 years ago. Honestly AI x-risk is pretty low on my list of things to worry about.
The simulation argument never impressed me. Every simulation that I’ve seen ran a lot more slowly than the underlying reality. Therefore even if you do get a lengthy regress of simulations stacked on simulations, most of the experience to be had is in the underlying reality, and not in the simulations. Therefore I’ve concluded that I probably exist in reality, not a simulation.
If all beliefs in a Bayesian network are bounded away from 0 and 1, then an approximate update can be done to arbitrary accuracy in polynomial time.
The pathological behavior shows up here because there are two competing but mutually exclusive belief systems. And it is hard to determine when your world view should flip.
I hope that this makes it more interesting to you.
I like your second argument. But to be honest, there is a giant grey area between “non-replicable” and “fraudulent”. It is hard to draw the line between, “Intellectually dishonest but didn’t mean to deceive” and “fraudulent”. And even if you could define the line, we lack the data to identify what falls on either side.
It is worth reading “Time to assume that health research is fraudulent until proven otherwise?” I believe that this case is an exception to Betteridge’s law: I think that the answer is yes. Given the extraordinary efforts that the editor of Anaesthesia needed to catch some fraud, I doubt that most journals do it. And because I believe that, I’m inclined to a prior that says that non-replicability suggests at least even odds of fraud.
As a sanity check, high profile examples like the former President of Stanford demonstrate that fraudulent research is accepted in top journals, leading to prestigious positions. See also the case of Dr. Francesca Gino, formerly of Harvard.
And, finally, back to the line between intellectual dishonesty and fraud. I’m inclined to say that they amount to the same thing in practice, and we should treat them similarly. And the combined bucket is a pretty big problem.
Here is a good example. The Cargo Cult Science speech happened around 50 years ago. Psychologists have objected ever since to being called a pseudoscience by many physicists. But it took 40 years before they finally did what Feynman told them to, and tried replicating their results. They generally have not acknowledged Feynman’s point, nor have they started fixing the other problems that Feynman talked about.
Given that, how much faith should we put in psychology?
The effect isn’t large, but you’d lose that bet. See “Nonreplicable publications are cited more than replicable ones”.
I have a simple solution for Pascal muggers. I assume that, unless I have good and specific reason to believe otherwise, the probability of achieving an extreme utility u is bounded above by O(1/u). And therefore after some point, I can replace whatever extreme utility is quoted in an argument with a constant. Which might be arbitrarily close to 0.
In some contexts this is obvious. For example if someone offers you $1000 to build a fence in their yard, you might reasonably believe them. You might or might not choose to do it. If they offered you $10,000, that’s suspiciously high for the job. You reasonably would worry that it is a lie, and might or might not choose the believable $1000 offer over it as having higher expected value. If you were offered $1,000,000 to build the same fence, you’d assume it was a lie and definitely wouldn’t take the job.
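A minimal Python sketch of that bound, under assumptions that are mine and purely illustrative: the names `capped_credence` and `expected_value`, the prior of 0.9 for a believable offer, and the cap constant of 1000 are all hypothetical. It only shows that once credence falls off like 1/u, the expected value stops growing with the quoted reward.

```python
def capped_credence(u, cap=1000.0, prior=0.9):
    """Credence that a promised payoff u is real, bounded above by cap / u.

    Both cap and prior are hypothetical illustration values.
    """
    if u <= 0:
        raise ValueError("u must be positive")
    return min(prior, cap / u)


def expected_value(u, cap=1000.0, prior=0.9):
    """Expected payoff of an offer quoting u; it can never exceed cap."""
    return capped_credence(u, cap, prior) * u


if __name__ == "__main__":
    # Quoted rewards for the same fence, from believable to absurd.
    for offer in (1_000, 10_000, 1_000_000, 10**12):
        print(offer, capped_credence(offer), expected_value(offer))
```

With exactly 1/u decay the expected value levels off at the cap. A credence that falls faster than 1/u, which is how most of us would treat the $1,000,000 fence, makes the expected value shrink instead.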
By this reasoning, what should you think in the limit as the job stays finite, and the reward tends towards infinity? You hit that limit with the claim that for a finite amount of worship in your lifetime (and a modest tithe) you’ll get an infinite reward in Heaven. This is my favorite argument against Pascal’s wager.
But now let’s bring it back to something more reasonable. Longtermism makes moral arguments in terms of improving the prospect of a distant future teeming with consciousness. But if you trot out the Doomsday argument, the a priori odds of this future are proportional to 1 / the amount of future consciousness. Given the many ways that humanity could go extinct, and the possibilities of future space, I don’t have a strong opinion on how to adjust that prior given current evidence. Therefore I treat this as a case where we hit the upper bound, and the expected reward from the scenario is bounded above by a reasonably small constant.
My constant is small in this case because I also believe that large amounts of future consciousness go hand in hand with high likelihood of extreme misery from a Malthusian disaster scenario.
It depends on subject matter.
For math, it is already here. Several options exist, of which Coq is the most popular.
For philosophy, the language requirements alone need AI at the level of reasonably current LLMs. Which brings their flaws as well. Plus you need knowledge of human experience. By the time you put it together, I don’t see how a mechanistic interpreter can be anything less than a (hopefully somewhat limited) AI.
Which again raises the question of how we come to trust in it enough for it not to be a leap of faith.
Nobody ever read the 1995 proof.
Instead they wound up reading the program. This time it was written in C—which is easier to follow. And the fact that there were now two independent proofs in different languages that ran on different computers greatly reduced the worries that one of them might have a simple bug.
I do not know that any human has ever tried to properly read any proof of the 4 color theorem.
Now to the issue. The overall flow and method of argument were obviously correct. Spot checking individual points gave results that were also correct. The basic strategy was also obviously correct. It was a basic, “We prove that if it holds in every one of these special cases, then it is true. Then we check each special case.” Therefore it “made sense”. The problem was the question, “Might there be a mistake somewhere?” After all proofs do not simply have to make sense, they need to be verified. And that was what people couldn’t accept.
The same thing with the Objectivist. You can in fact come up with flaws in proposed understandings of the philosophy fairly easily. It happens all the time. But Objectivists believe that, after enough thought and evidence, it will converge on the one objective version. The AI’s proposed proof therefore can make sense in all of the same ways. It would even likely have a similar form. “Here is a categorization of all of the special cases which might be true. We just have to show that each one can’t work.” You might look at them and agree that those sound right. You can look at individual cases and accept that they don’t work. But do you abandon the belief that somewhere, somehow, there is a way to make it work? As opposed to the AI saying that there is none?
As you said, it requires a leap of faith. And your answer is mechanistic interpretability. Which is exactly what happened in the end with the 4 color proof. A mechanistically interpretable proof was produced, and mechanistically interpreted by Coq. QED.
But for something as vague as a philosophy, I think it will take a long time to get to mechanistically interpretable demonstrations. And the thing which will do so is likely itself to be an AI...
You may be missing context on my reference to the 4 color problem. The original 1976 proof, by Appel and Haken, took over 1000 hours of computer time to check. A human lifetime is too short to verify that proof. This eliminates your first option. The Objectivist cannot, even in principle, check the proof. Life is too short.
Your first option is therefore, by hypothesis, not an option. You can believe the AI or not. But you can’t actually check its reasoning.
The history of the 4 color problem proof shows this kind of debate. People argued for nearly 20 years about whether there might be a bug. Then an independent, and easier to check, computer proof came along in 1995. The debate mostly ended. More efficient computer generated proofs have since been created. The best that I’m aware of is 60,000 lines. In principle that would be verifiable by a human. But no human that I know of has actually bothered. Instead the proof was verified by the proof assistant Coq. And, today, most mathematicians trust Coq over any human.
We have literally come full circle on the 4 color problem. We started by asking whether we can trust a computer if a human can’t check it. And now we accept that a computer can be more trustworthy than a human!
However it took a long time to get the proof down to such a manageable size. And it took a long time to get a computer program that is so trustworthy that most believe it over themselves.
And so the key epistemological challenge. What would it take for you to trust an AI’s reasoning over your own beliefs when you’re unable to actually verify the AI’s reasoning?
Concrete example.
Let’s presuppose that you are an Objectivist. If you don’t know about Objectivism, I’ll just give some key facts.
Objectivists place great value in rationality and intellectual integrity.
Objectivists believe that they have a closed philosophy. Meaning that there is a circle of fundamental ideas set out by Ayn Rand that will never change, though the consequences of those ideas certainly are not obvious and still need to be worked out.
Objectivists believe that there is a single objective morality that can be achieved from Ayn Rand’s ideas if we only figure out the details well enough.
Now suppose that an Objectivist used your system. And the AIs came to the conclusion that there is no single objective morality obtainable by Ayn Rand’s ideas. But the conclusion required a long enumeration of different possible resolutions, only to find a problem in each one. With the enumeration, like the proof of the 4-color problem, being too long for any human to read.
What should the hypothetical Objectivist do upon obtaining the bad news? Abandon the idea of an absolute morality? Reject the result obtained by the intelligent AI? Ignore the contradiction?
Now I don’t know your epistemology. There might be no such possible conflict for you. I doubt there is for me. But in the abstract, this is something that really could happen to someone who thinks of themselves as truly rational.
What you want sounds like Próspera. It is too early to say how that will work out.
They took some inspiration from Singapore. When Singapore became independent in 1965, it was a poverty-stricken third world place. It now has a better GDP/capita than countries like the USA. And also did things like come up with the best way of teaching math to elementary school students.
But Singapore is only libertarian in some ways. It is also a dictatorship that does not believe in, for instance, free speech. Their point is that when you cram immigrants from many cultures together, you’ll get problems if you don’t limit how much one group is allowed to offend another. I don’t like it, but also don’t have evidence that they are wrong.
And finally, most utopian experiments don’t work out very well. See A Libertarian Walks Into a Bear for an amusing example.
As a life strategy I would recommend something I call “tit for tat with forgiveness and the option of disengaging”.
Most of the time do tit for tat.
When we seem to be in a negative feedback loop, we try to reset with forgiveness.
When we decide that a particular person is not worth having in our life, we walk them out of our life in the most efficient way possible. If this requires giving them a generous settlement in a conflict, that’s generally better than continuing with the conflict to try for a more even settlement.
The first two are things most social animals are adapted to do. The last is possible for us because we live in societies that are large enough for us to never interact with people we don’t like. Unfortunately our emotions are pretty well adapted to life in groups below Dunbar’s number. So the decision to disengage efficiently takes work.
Honest question about a hypothetical.
How would you respond if you set this up, and then your personal GPT concluded, from your epistemology and the information available to it, that your epistemology is fundamentally flawed and you should adopt a different one? Suppose further that when it tried to explain it to you, everything that it said made sense but you could not follow the full argument.
What should happen then? Should you no longer be the center of the prosthetic enabled “you” that has gone beyond your comprehension? Should the prosthetics do their thing, with the goal of supplying you with infinite entertainment instead of merely amplifying you? Should the prosthetics continue to be bound by the limitations of your mind? (Not necessarily crazy if you’re afraid of another advanced AI hacking your agents to subvert them.)
Obviously ChatGPT does not offer sufficient capabilities that this should happen. But if your future continues to the point where the agents augmenting your capabilities have AGI, this type of challenge will arise.
The reason why variance matters is that high variance increases your odds of going broke. In reality, gamblers don’t simply get to reinvest all of their money. They have to take money out for expenses. That process means that you can go broke in the short run, despite having a great long-term strategy.
Therefore instead of just looking at long-term returns you should also look at things like, “What are my returns after 100 trials if I’m unlucky enough to be at the 20th percentile?” There are a number of ways to calculate that. The simplest is to say that if p is your probability of winning, the expected number of times you’ll win is 100p. The variance in a single trial is p(1-p). And therefore the variance of 100 trials is 100p(1-p). Your standard deviation in wins is the square root of that, or 10sqrt(p(1-p)). The 20th percentile of a normal distribution is about 0.84 standard deviations below the mean, so from the central limit theorem you’ll win roughly 100p − 8.4sqrt(p(1-p)) times at the 20th percentile. Divide this by 100 to get the proportion q that you won. Your ideal strategy on this metric will be Kelly with p replaced by that q. This will always bet less than Kelly. Then you can apply that to figure out what rate of return you’d be worrying about if you were that unlucky.
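Here is a rough Python sketch of that calculation. The function names are mine, and I am assuming the standard Kelly fraction p − (1 − p)/b for a bet paying net odds b (so b = 1 is an even-money bet). Treat it as one way to play with the numbers, not a recommendation.

```python
from math import sqrt
from statistics import NormalDist


def kelly_fraction(p, b=1.0):
    """Kelly fraction for win probability p at net odds b (b=1 is even money)."""
    return max(0.0, p - (1.0 - p) / b)


def unlucky_win_rate(p, n=100, percentile=0.20):
    """Normal-approximation win proportion q at the given percentile of luck."""
    z = NormalDist().inv_cdf(percentile)  # about -0.84 at the 20th percentile
    wins = n * p + z * sqrt(n * p * (1.0 - p))
    return wins / n


def cautious_kelly(p, b=1.0, n=100, percentile=0.20):
    """Kelly fraction with p replaced by the unlucky win rate q."""
    return kelly_fraction(unlucky_win_rate(p, n, percentile), b)


if __name__ == "__main__":
    p = 0.6
    print("unlucky win rate q:", unlucky_win_rate(p))
    print("full Kelly:", kelly_fraction(p))
    print("cautious Kelly:", cautious_kelly(p))
```

Changing `n` and `percentile` shows how sensitive the answer is to how many bets you make and how unlucky a run you want to survive.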
Any individual gambler should play around with these numbers. Base it on your bankroll, what you’re comfortable with losing, how frequent and risky your bets are, and so on. It takes work to figure out your risk profile. Most will decide on something less than Kelly.
Of course if your risk profile is dominated by the pleasure of the adrenaline from knowing that you could go broke, then you might think differently. But professional gamblers who think that way generally don’t remain professional gamblers over the long haul.
I’m sorry that you are confused. I promise that I really do understand the math.
In repeated addition of random variables, all of these have a close relationship. The sum is approximately normal. The normal distribution has identical mean, median, and mode. Therefore all three are the same.
What makes Kelly tick is that the log of net worth gives you repeated addition. So with high likelihood the log of your net worth is near the mean of an approximately normal distribution, and both median and mode are very close to that. But your net worth is the exponent of the log. That creates an asymmetry that moves the mean away from the median and mode. With high probability, you will do worse than the mean.
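To make the asymmetry concrete, here is a small Python sketch that computes the exact mean and median of net worth after n even-money bets. The helper names and the choice of even-money bets with a fixed fraction f strictly between 0 and 1 are my own assumptions for illustration.

```python
from math import comb


def wealth_after(wins, n, f):
    """Net worth multiple after `wins` wins in n even-money bets of fraction f."""
    return (1.0 + f) ** wins * (1.0 - f) ** (n - wins)


def mean_and_median(p, f, n=100):
    """Exact mean and median of final net worth, plus P(final < mean)."""
    probs = [comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(pr * wealth_after(k, n, f) for k, pr in enumerate(probs))
    # For 0 < f < 1 wealth increases with the number of wins, so the median
    # wealth is the wealth at the median number of wins.
    cumulative, median_wins = 0.0, n
    for k, pr in enumerate(probs):
        cumulative += pr
        if cumulative >= 0.5:
            median_wins = k
            break
    below_mean = sum(
        pr for k, pr in enumerate(probs) if wealth_after(k, n, f) < mean
    )
    return mean, wealth_after(median_wins, n, f), below_mean


if __name__ == "__main__":
    mean, median, below = mean_and_median(p=0.6, f=0.2)
    print("mean:", mean, "median:", median, "P(below mean):", below)
```

For p = 0.6 betting the Kelly fraction of 0.2, the mean comes out far above the median, and a clear majority of outcomes land below the mean.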
The comment about variance is separate. You actually have to work out the distribution of returns after, say, 100 trials. And then calculate a variance from that. And it turns out that for any finite n, variance monotonically increases as you increase the proportion that you bet. It ranges from 0 if you bet nothing, to being dominated by the small chance of winning every bet if you bet everything.
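A quick way to check the variance claim, sketched in Python under my own simplifying assumption of independent even-money bets with a fixed fraction f: because the bets are independent, the first and second moments of net worth are just the per-bet moments raised to the nth power. The function name is hypothetical.

```python
def wealth_variance(p, f, n=100):
    """Exact variance of net worth after n independent even-money bets."""
    m1 = p * (1.0 + f) + (1.0 - p) * (1.0 - f)            # E[per-bet multiplier]
    m2 = p * (1.0 + f) ** 2 + (1.0 - p) * (1.0 - f) ** 2  # E[multiplier squared]
    # Var[W] = E[W^2] - E[W]^2, with each moment a product over n bets.
    return m2**n - m1 ** (2 * n)


if __name__ == "__main__":
    for f in (0.0, 0.1, 0.2, 0.4, 1.0):
        print(f, wealth_variance(0.6, f))
```

The printed variances start at 0 for f = 0 and grow at every step up to f = 1.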
Dang it. I meant to write that as,
If you bet more than Kelly, you’ll experience lower returns on average and higher variance.
That said, both median and mode are valid averages, and Kelly wins both.
I believe that AI safety is a real issue. There are both near term and long term issues.
I believe that the version of AI safety that will get traction is regulatory capture.
I believe that the AI safety community is too focused on what fascinating technology can do, and not enough on the human part of the equation.
On Andrew Ng, his point is that he doesn’t see how exactly AI is realistically going to kill all of us. Without a concrete argument that is worth responding to, what can he really say? I disagree with him on this, I do think that there are realistic scenarios to worry about. But I do agree with him on what is happening politically with AI safety.
Hypothetically this is possible.
But, based on current human behavior, I would expect such simulations to focus on the great and famous, or on situations which represent fun game play.
My life does not qualify as any of those. So I heavily discount this possibility.