That’s what I understood when I read this sentence, yes.
Leo P.
I find it interesting how he says that there is no such thing as AGI, but acknowledges that machines will “eventually surpass human intelligence in all domains where humans are intelligent” as that would meet most people’s definition of AGI.
I don’t see how saying that machines will “eventually surpass human intelligence in all domains where humans are intelligent” implies the G in AGI.
I’m sorry, but I don’t get the explanation regarding CoinRun. I claim that the “reward as incentivization” framing still “explains” the behaviour in this case. As an analogy, go back to training a dog and rewarding it with biscuits: say you write the numbers 1 to 10 on the floor. You ask the dog a simple arithmetic question (whose answer is between 1 and 10), and each time it puts its paw on the right number it gets a biscuit. Suppose that during training it so happens that the answer to every question is 6. Would you claim that you taught the dog to answer simple arithmetic questions, or rather that you taught it to put its paw on 6 whenever you ask it one? If the latter, then I don’t get why the interpretation through the “reward as incentivization” framing in the CoinRun setting is that the model “wants to get the coin”.
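The analogy can be made concrete with a toy sketch (purely illustrative; the reward setup is hypothetical): a greedy learner trained on questions whose answer is always 6 gets maximal reward while ignoring the question entirely.

```python
import random

random.seed(0)

# Toy "dog": estimates the average biscuit reward of each number 1..10,
# then mostly picks the number with the highest estimate.
values = {n: 0.0 for n in range(1, 11)}
counts = {n: 0 for n in range(1, 11)}

def pick():
    # epsilon-greedy choice so every number gets tried occasionally
    if random.random() < 0.1:
        return random.randint(1, 10)
    return max(values, key=values.get)

# Training: every question happens to have the answer 6.
for _ in range(2000):
    answer = 6
    guess = pick()
    reward = 1.0 if guess == answer else 0.0
    counts[guess] += 1
    values[guess] += (reward - values[guess]) / counts[guess]

learned = max(values, key=values.get)
print(learned)  # the learner has converged on "always answer 6"

# At test time the answer might be 3, but the learned policy still outputs 6:
# maximal training reward never distinguished "answer the question" from
# "always pick 6", just as it never distinguished "get the coin" from
# "go to the right edge" in CoinRun.
```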
The generalized version of this lesson “that cooperation/collusion favors the good guys—i.e. those aligned towards humanity” actually plays out in history. In WW2 the democratic powers—those with interconnected economies and governments more aligned to their people—formed the stronger Allied coalition. The remaining autocratic powers—all less aligned to their people, and to each other—formed a coalition of necessity. Today history simply repeats itself, with the democratic world aligned against the main autocratic powers (Russia, China, North Korea, Iran).
I don’t want to get into a history debate, but I’m not at all sold on that view, which seems to rewrite history. The European part of WW2 was mainly won thanks to the USSR, hardly a “democratic power”. (You could argue that the USSR would never have had the means to do so without American financial help, or that without England holding out Germany would have won on the eastern front; both are probably true, but the point still stands that it’s not as simple as “democratic vs autocratic”.)
Regarding the present, I’m not sold at all on “the democratic world aligned against the main autocratic powers”. Actually, I’d even make the case that democratic powers actively cooperate with autocratic ones as long as they have something to gain, despite this being contrary to the values they advocate: child labor in Asian countries, women’s rights in the Emirates, Qatar, Saudi Arabia, and so on. So I believe that once we look at a more detailed picture than the one you’re depicting, it’s actually a counterargument to your take.
I don’t actually think we’re bottlenecked by data. Chinchilla represents a change in focus (for current architectures), but I think it’s useful to remember what that paper actually told the rest of the field: “hey you can get way better results for way less compute if you do it this way.”
I feel like characterizing Chinchilla most directly as a bottleneck would be missing its point. It was a major capability gain, and it tells everyone else how to get even more capability gain. There are some data-related challenges far enough down the implied path, but we have no reason to believe that they are insurmountable. In fact, it looks an awful lot like it won’t even be very difficult!
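For concreteness, the trade-off the paper reported can be sketched with the commonly cited rule of thumb of roughly 20 training tokens per parameter (a heuristic reading of the paper, not its exact fitted law):

```python
def chinchilla_optimal(compute_flops, tokens_per_param=20.0):
    """Rough compute-optimal split of a FLOP budget between model size
    and data, using C ~ 6*N*D together with the D ~ 20*N heuristic."""
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Chinchilla's own budget recovers its reported ~70B params / ~1.4T tokens:
n, d = chinchilla_optimal(6.0 * 70e9 * 1.4e12)
print(f"{n:.2e} params, {d:.2e} tokens")
```

The point of the sketch is that the result is a recipe, not a wall: for a fixed budget it tells you to trade parameters for data, and the data requirement grows only as the square root of compute.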
Could you explain why you feel that way about Chinchilla? I found this post: https://www.lesswrong.com/posts/6Fpvch8RR29qLEWNH/chinchilla-s-wild-implications gives very compelling reasons why data should be considered a bottleneck, and I’m curious what makes you say it shouldn’t be a problem at all.
I’d very much like to understand how your credences can be so high with nothing more to back them up than “it’s possible and we lack some data”. Sure, but to hold credences that high you need at least some data or reasons to back them up.
Humans have not evolved to do math or physics, but we did evolve to resist manipulation and deception; these were commonplace in the ancestral environment.
This seems pretty counterintuitive to me, seeing how easily many humans fall for not-so-subtle deception and manipulation every day.
I really don’t understand the AGI-in-a-box part of your argument: as long as you want your AGI to actually do something (anything, be it producing a proof of a mathematical problem or whatever else), its output will have to go through a human anyway, and that is precisely the moment your AGI escapes. It does not matter what kind of box you put around your AGI, because you always have to open it for the AGI to do what you want it to do.
The second case might not really make sense, because deception is a convergent instrumental goal especially if the AI is trying to cause X and you’re trying to cause not X, and generally because an AI that smart probably has inner optimizers that don’t care about this “make a plan, don’t execute plans” thing you thought you’d set up.
I believe the second case is a subcase of the problem of ELK. Maybe the AI isn’t trying to deceive you and actually does what you asked (e.g., I want to see “the diamond” on the main detector), yet the plan it produces has a consequence X that you don’t want (in the ELK example, the diamond is stolen but you see something that looks like the diamond on the main detector). The problem is: how can you tell whether the proposed plans have consequence X? Especially if you don’t even know that X is a possible consequence of the plans?
Why would I press the dislike button when I could instead signal virtue by showing people that I condemn what “X” says about “Y”?
Talking about each consciousness only seeing a classical state doesn’t make sense, because they are in a quantum superposition state. Just as it does not make sense to say that the photon went either right or left in the double-slit experiment.
You created a superposition of a million consciousnesses and then outputted an aggregate value about all those consciousnesses.
This I agree with.
Either a million entities experienced a conscious experience, or you can find out the output of a conscious being without ever actually creating a conscious being—i.e. p-zombies exist (or at least aggregated p-zombies).
This I do not. You do not get access to a million entities, by the argument I laid out previously. You did not simulate all of them. Nor did you create something that behaves like a million entities aggregated, just as you cannot store 2^n classical bits in a quantum computer consisting of n qubits. You get a function which outputs an aggregated value of your superposition, but you cannot recover from it each consciousness you claim to have been simulated. Therefore this is what I believe is flawed in your position:
it seems reasonable to extend that to something which acts exactly the same as an aggregate of conscious beings—it must in fact be an aggregate of conscious beings
If I understand your arguments correctly (which I may not, in which case I’ll be happy to stand corrected), this sentence should mean to you that for something to act the same as an aggregate of n conscious beings, it must be an aggregate of at least n conscious beings? But then doesn’t this view mean that a function of d variables can never be reduced to a function of k variables, k < d?
I’m not sure I understand your answer. I’m saying that you did not simulate at least a million consciousnesses, just as in Shor’s algorithm you do not try all the divisors.
So even though we’ve run the simulation function only a thousand times, we must have simulated at least a million consciousnesses, or how else could we know that exactly 254,368 of them e.g. output a message which doesn’t contain the letter e?
Isn’t that exactly what Scott Aaronson explains a quantum computer/algorithm doesn’t do? (See https://www.scottaaronson.com/papers/philos.pdf, page 34 for the full explanation)
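A toy numeric sketch of this point (a classical simulation of the statistics, purely illustrative, with a made-up predicate standing in for “message contains no letter e”): the measurement statistic is a single aggregate number, and nothing in it identifies the individual branches.

```python
# Write a predicate f into an output qubit over a uniform superposition
# of all 2^n inputs. The amplitude on the |1> branch of the output qubit
# is sqrt(count / 2^n), so measuring it only ever reveals the RATIO of
# marked inputs, never which inputs were marked.
def f(x):
    # hypothetical per-branch predicate (stand-in for the letter-e check)
    return x % 3 == 0

n = 3
count = sum(1 for x in range(2 ** n) if f(x))

# Probability of measuring the output qubit as 1:
p_one = count / 2 ** n
print(count, p_one)

# Different f's with the same count produce identical statistics, so the
# aggregate number cannot be "unpacked" into the per-branch outputs.
```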
For AIs, we’re currently interested in the values that arise in a single AI (specifically, the first AI capable of a hard takeoff), so single humans are the more appropriate reference class.
I’m sorry, but I don’t understand why looking at single AIs makes single humans the more appropriate reference class.
That seems like extremely limited, human thinking. If we’re assuming a super powerful AGI, capable of wiping out humanity with high likelihood, it is also almost certainly capable of accomplishing its goals despite our theoretical attempts to stop it without needing to kill humans.
If humans are capable of building one AGI, they would certainly be capable of building a second one whose goals are unaligned with the first.
Even granting that alignment is not possible, it’s not clear why humanity’s and a super-AGI’s goals should be in conflict, and not just different. Even granting that they are highly likely to conflict, it’s not clear why strategies to counter this could not work (e.g. partnering up with a “good” super-AGI).
Because unchecked convergent instrumental goals for an AGI already conflict with humanity’s goals. As soon as you realize humanity may have reasons to want to shut down or restrain an AGI (through whatever means), that gives the AGI grounds to wipe out humanity.
Another funny thing you can do with square roots: let’s take $\sqrt{1+x}$ and look at its power series $\sum_{n \ge 0} \binom{1/2}{n} x^n$. This converges for $|x| < 1$, so you can specialize at $x = 7/9$, and the series converges to $4/3$, ``the″ square root of $16/9$. Now, you can also do that inside $\mathbb{Q}_7$, since $|7/9|_7 = 1/7 < 1$. But in $\mathbb{Q}_7$ this actually converges to $-4/3$.
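If the instance meant here is the classic one, the series for $\sqrt{1+7/9}$ compared in $\mathbb{R}$ and in $\mathbb{Q}_7$ (an assumption on my part), the two limits can be checked with exact rational arithmetic: the partial sums get close to $4/3$ in absolute value but close to $-4/3$ in the 7-adic metric.

```python
from fractions import Fraction

def binom_half(n):
    # generalized binomial coefficient C(1/2, n)
    c = Fraction(1)
    for k in range(n):
        c *= (Fraction(1, 2) - k) / (k + 1)
    return c

def v7(q):
    # 7-adic valuation of a nonzero rational (higher = 7-adically smaller)
    v, num, den = 0, q.numerator, q.denominator
    while num % 7 == 0:
        num //= 7
        v += 1
    while den % 7 == 0:
        den //= 7
        v -= 1
    return v

x = Fraction(7, 9)
S = sum(binom_half(k) * x**k for k in range(60))  # partial sum of sqrt(1+x)

print(float(S))                 # close to 4/3: the real limit
print(v7(S - Fraction(4, 3)))   # 0: S is 7-adically far from 4/3
print(v7(S + Fraction(4, 3)))   # large: S is 7-adically close to -4/3
```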
But although bayesianism makes the notion of knowledge less binary, it still relies too much on a binary notion of truth and falsehood. To elaborate, let’s focus on philosophy of science for a bit. Could someone give me a probability estimate that Darwin’s theory of evolution is true?
What do you mean by that question? Because the way I understand it, the probability is “zero”. The probability that, in the vast hypothesis space, Darwin’s theory of evolution is the one that’s true, and not a slightly modified variant, is completely negligible. My main problem is that “is theory X true?” is usually a question which does not carry any meaning: you can’t answer it in a vacuum without specifying against which other theories you’re “testing” it (or here, asking the question).
If I understand correctly, what you’re saying with the “97% of being 97% true” is that, with probability 97%, the true theory lies within some region of the hypothesis space, where that region consists of theories sharing 97% of the properties of “Darwin’s point” (whatever that may mean). Am I understanding this correctly?
First, I don’t think it’s a good idea to have to rely on the axiom of choice in order to be able to define continuity.
Now, from my point of view, saying that continuity is defined in terms of limits is the wrong way to look at it. Continuity is a property relative to the topology of your space. If you define continuity in terms of open sets, not only does the definition make sense, it also extends to arbitrary topological spaces. But I understand that not everyone will find this intuitive.
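For reference, the open-set definition I have in mind:

```latex
% f : X -> Y between topological spaces is continuous iff the
% preimage of every open set is open:
f \colon X \to Y \ \text{is continuous} \iff
\forall\, U \subseteq Y \ \text{open},\quad f^{-1}(U) \ \text{is open in } X.
```

No limits, sequences, or choice needed; in a metric space this is equivalent to the usual epsilon-delta definition.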
Also, I believe that your definitions replacing limits with hyperreals have to take into account all possible infinitesimals, so I don’t understand how this is really any different from the sequential characterization of limits. But maybe I’m missing something.