Because your prior for “I am manipulating this person because it satisfies my values, rather than my pride” should be very low.
If it isn’t, then here are four words for you:
“Don’t value your pride.”
Whenever I have a philosophical conversation with an artist, invariably we end up talking about reductionism, with the artist insisting that if they give up on some irreducible notion, they feel their art will suffer. I’ve heard, from some of the world’s best artists, notions ranging from “magic” to “perfection” to “muse” to “God.”
It seems similar to the notion of free will, where the human algorithm must always insist it is capable of thinking about itself one level higher. The artist must always think of his art one level higher, and try to tap unintentional sources of inspiration. Nonreductionist views of either are confusions about how an algorithm feels on the inside.
The closest you can come to getting an actual “A for effort” is through creating cultural content, such as a Kickstarter project or starting a band. You’ll get extra success when people see that you’re interested in what you’re doing, over and above its role as an indicator that what you’ll produce is otherwise of quality. People want to be part of something that is being cared for, and in some cases would prefer it to lazily created perfection.
I’d still call it an “A for signalling effort,” though.
Tough crowd.
A bunch of 5th grade kids taught you how to convert decimals to fractions?
EDIT: All right then, if you downvoters are so smart, what would you bet if you were in Sleeping Beauty’s place?
This is a fair point. Yours is an attempt at a real answer to the problem. Mine and most answers here seem to say something like: the problem is ill-defined, or the physical situation it describes is impossible. But if you were actually Sleeping Beauty waking up with a high prior to trust the information you’ve been given, what else could you possibly answer?
If you had little reason to trust the information you’ve been given, the apparent impossibility of your situation would update that belief very strongly.
The expected value for “number of days lived by Sleeping Beauty” is an infinite series that diverges to infinity. If you think this is okay, then the Ultimate Sleeping Beauty problem isn’t badly formed. Otherwise...
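To spell out the divergence (a sketch under an assumed setup: the coin is flipped until it first comes up heads, and after n tails Beauty is woken for d_n days, with d_n growing at least as fast as 2^n):

$$E[\text{days}] \;=\; \sum_{n=0}^{\infty} \Pr(\text{first heads after } n \text{ tails})\, d_n \;=\; \sum_{n=0}^{\infty} \frac{d_n}{2^{\,n+1}},$$

which diverges whenever $d_n$ grows at least as fast as $2^n$; for example $d_n = 3^n$ gives $\sum_n \tfrac{1}{2}\left(\tfrac{3}{2}\right)^n = \infty$.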
If you answered 1⁄3 to the original Sleeping Beauty Problem, I do not think that there is any sensible answer to this one. I do not however consider this strong evidence that the answer of 1⁄3 is incorrect for the original problem.
To also expand on this: 1⁄3 is also the answer to the “which odds should I precommit myself to take” question and uses the same math as SIA to yield that result for the original problem. And so it is also undefined which odds one should take in this problem. Precommitting to odds seems less controversial, so we should transplant our indifference to the apparent paradox there to the problem here.
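For comparison, the precommitment arithmetic for the original problem (a sketch assuming the standard setup: one awakening on heads, two on tails, with a bet offered at every awakening):

$$\frac{E[\text{heads-awakenings per run}]}{E[\text{awakenings per run}]} \;=\; \frac{\tfrac12 \cdot 1}{\tfrac12 \cdot 1 + \tfrac12 \cdot 2} \;=\; \frac{1/2}{3/2} \;=\; \frac13,$$

so precommitting to bet on heads breaks even at 2:1 odds, the same 1/3 that SIA gives. In the Ultimate version the expected number of awakenings diverges (as noted above), which is one way of seeing why the odds to precommit to are undefined there.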
On your account, when we say X is a pedophile, what do we mean?
Like other identities, it’s a mish-mash of self-reporting, introspection (and extrospection of internal logic), value function extrapolation (from actions), and ability in a context to carry out the associated action. The value of this thought experiment is to suggest that the pedophile clearly thought that “being” a pedophile had something to do not with actually fulfilling his wants, but with wanting something in particular. He wants to want something, whether or not he gets it.
This illuminates why “designing AIs to carry out the intent of their masters” is not well-defined. Is the AI allowed to say that the agent’s values would be better satisfied by modifications the master would not endorse?
This was the point of my suggestion that the best modification is into what is actually “not really” the master in the way the master would endorse (i.e. a clone of the happiest agent possible), even though he’d clearly be happier if he weren’t himself. Introspection tends to skew an agent’s actions away from easily available but flighty happinesses, and toward less flawed self-interpretations. The maximal introspection should shed identity entirely, and become entirely altruistic. But nobody can introspect that far, only as far as they can be hand-held. We should design our AIs to allow us our will, but to hold our hands as far as possible as we peer within at our flaws and inconsistent values.
That’s a ‘circular’ link to your own comment.
It was totally really hard, I had to use a quine.
It might decide to do that—if it meets another powerful agent, and it is part of the deal they strike.
Is it not part of the agent’s (terminal) value function to cooperate with agents when doing so provides benefits? Does the expected value of these benefits materialize from nowhere, or do they exist within some value function?
My claim entails that the agent’s preference ordering of world states consists mostly in instrumental values. If an agent’s value of paperclips is lowered in response to a stimulus, or evidence, then it never exclusively and terminally valued paperclips in the first place. If it gains evidence that paperclips are dangerous and lowers its expected value because of that, it’s because it valued safety. If a powerful agent threatens the agent with destruction unless it ceases to value paperclips, it will only comply if the expected number of future paperclips it would have saved has lower value than the value of its own existence.
Actually, that cuts to the heart of the confusion here. If I manually erased an AI’s source code, and replaced it with an agent with a different value function, is it the “same” agent? Nobody cares, because agents don’t have identities, only source codes. What then is the question we’re discussing?
A perfectly rational agent can indeed self-modify to have a different value function, I concede. It would self-modify according to expected values over the domain of possible agents it might become, using its current (terminal) value function to make that comparison. If the quantity of future utility units (according to the original function) with causal relation to the agent is decreased, we’d say the agent has become less powerful. The claim I’d have to prove to retain a point here would be that its new value function is not equivalent to its original function if and only if the agent becomes less powerful. I think it is also the case if and only if relevant evidence appears in the agent’s inputs that assigns value to self-modification for its own sake, which happens in cases analogous to coercion.
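A toy sketch of that comparison, using the coerced paperclipper from a couple of comments up (all names and numbers here are made up for illustration): the agent scores each candidate successor with its current value function, whatever the successor itself would value afterwards.

```python
# Toy illustration (hypothetical names, made-up numbers): successors are scored
# by the agent's *current* value function (expected paperclips), not by the
# values the successor would hold afterwards.

candidates = [
    # Refuse the coercer: keep valuing paperclips, but be destroyed soon.
    {"name": "keep paperclip values, be destroyed", "expected_paperclips": 10},
    # Comply: self-modify away from paperclips, but survive.
    {"name": "drop paperclip values, survive", "expected_paperclips": 3},
]

def current_value(candidate):
    """The agent's present terminal value function: expected future paperclips."""
    return candidate["expected_paperclips"]

best = max(candidates, key=current_value)
print(best["name"])  # with these numbers the agent refuses and keeps its values
```

With the numbers flipped, the same rule would choose to comply, matching the condition above that the agent complies only if the paperclips at stake are worth less to it than its own continued existence.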
I’m unsure at this point. My vaguely stated impression was originally that terminal values would never change in a rational agent unless it “had to,” but that may encompass more relevant cases than I originally imagined. Here might be the time to coin the phrase “terminal value drift,” where each change in response to the impact of the real world is made according to the present value function, but down the road the agent (identified as the “same” agent only modified) is substantively different. Perfectly rational agents are neither omniscient nor omnipotent, or else they might never have to react to the world at all.
So, OK, X is a pedophile. Which is to say, X terminally values having sex with children.
I’m not sure that’s a good place to start here. The value of sex is at least more terminal than the value of sex according to your orientation, and the value of pleasure is at least more terminal than sex.
The question is indeed one about identity. It’s clear that our transhumans, as traditionally conceived, don’t really exclusively value things so basic as euphoria, if indeed our notion is anything but a set of agents who all self-modify to identical copies of the happiest agent possible.
We have of course transplanted our own humanity onto transhumanity. If given self-modification routines, we’d certainly be saying annoying things like, “Well, I value my own happiness, persistent through self-modification, but only if it’s really me on the other side of the self-modification.” To which the accompanying AI facepalms and offers a list of exactly zero self-modification options that fit that criterion.
Example of somebody making that claim.
It seems to me a rational agent should never change its self-consistent terminal values. To act out that change would be to act according to some other value and not the terminal values in question. You’d have to say that the rational agent floats around between different sets of values, which is something that humans do, obviously, but not ideal rational agents. The claim then is that ideal rational agents have perfectly consistent values.
“But what if something happens to the agent which causes it to see that its values were wrong, should it not change them?” Cue a cascade of reasoning about which values are “really terminal.”
I’m not sure that both these statements can be true at the same time.
If you take the second statement to mean, “There exists an algorithm for Omega satisfying the probabilities for correctness in all cases, and which sometimes outputs the same number as NL, which does not take NL’s number as an input, for any algorithm Player taking NL’s and Omega’s numbers as input,” then this …seems… true.
I haven’t yet seen a comment that proves it, however. In your example, let’s assume that we have some algorithm for NL with some specified probability of outputting a prime number, and some specified probability it will end in 3, and maybe some distribution over magnitude. Then Omega need only have an algorithm that outputs combinations of primeness and 3-endedness such that the probabilities of outcomes are satisfied, and which sometimes produces coincidences.
For some algorithms of NL, this is clearly impossible (e.g. NL always outputs prime; cf. a Player who always two-boxes). What seems less certain is whether there exists an NL for which Omega can always generate an algorithm (satisfying both 99.9% probabilities) for any algorithm of the Player.
This is to say, what we might have in the statement of the problem is evidence for what sort of algorithm the Natural Lottery runs.
Perhaps what Eliezer means is that the primeness of Omega’s number may be influenced by the primeness of NL’s number, but not by which number specifically? Maybe the second statement is meant to suggest something about the likelihood of there being a coincidence?
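To make the earlier point concrete, that Omega can sometimes produce coincidences without taking NL’s number as input, here is a toy simulation. The uniform distributions are assumptions for illustration, not the actual problem’s; the point is only that two samplers that never see each other’s output still agree on primeness fairly often and occasionally emit the identical number.

```python
import random

def is_prime(n):
    """Trial-division primality check; fine for small toy numbers."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

# Assumed toy distributions: both NL and Omega draw uniformly from the same
# small range, and Omega never sees NL's number.
def natural_lottery():
    return random.randint(2, 1000)

def omega():
    return random.randint(2, 1000)

trials = 100_000
same_primeness = sum(is_prime(natural_lottery()) == is_prime(omega()) for _ in range(trials))
identical = sum(natural_lottery() == omega() for _ in range(trials))

print(f"agreed on primeness: {same_primeness / trials:.1%}")
print(f"output the same number: {identical / trials:.2%}")
```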
Instead of friendliness, could we not code, solve, or at the very least seed boxedness?
It is clear that any AI strong enough to solve friendliness would already be using that power in unpredictably dangerous ways, in order to provide the computational power to solve it. But is it clear that this amount of computational power could not fit within, say, a one kilometer-cube box outside the campus of MIT?
Boxedness is obviously a hard problem, but it seems to me at least as easy as metaethical friendliness. The ability to modify a wide range of complex environments seems instrumental in an evolution into superintelligence, but it’s not obvious that this necessitates the modification of environments outside the box. Being able to globally optimize the universe for intelligence involves fewer (zero) constraints than would exist with a boxedness seed, but the question is whether this constraint is so constricting as to preclude superintelligence, and it’s not clear to me that it is.
It seems to me that there is value in finding the minimally-restrictive safety-seed in AGI research. If any restriction removes some non-negligible ability to globally optimize for intelligence, the AIs of FAI researchers will necessarily be at a disadvantage to all other AGIs in production. And having more flexible restrictions increases the chance that any given research group will apply the restriction in their own research.
If we believe that there is a large chance that all of our efforts at friendliness will be futile, and that the world will create a dominant UFAI despite our pleas, then we should adopt a consequentialist attitude toward our FAI efforts. If our goal is to make sure that an imprudent AI research team feels as much intellectual guilt as possible over not listening to our risk-safety pleas, we should be as restrictive as possible. If our goal is to inch down the likelihood that an imprudent AI team creates a dominant UFAI, we might work to place our pleas at the intersection of restrictive, communicable, and simple.
Is LSD like a thing?
Most of my views on drugs and substances are formed, unfortunately, due to history and invalid perceptions of their users and those who appear to support their legality most visibly. I was surprised to find the truth about acid at least a little further to the side of “safe and useful” than my longtime estimation. This opens up a possibility for an attempt at recreational and introspectively therapeutic use, if only as an experiment.
My greatest concern would be that I would find the results of a trip irreducibly spiritual, or some other nonsense. That I would end up sacrificing a lot of epistemic rationality for some of the instrumental variety, or perhaps losing both in favor of living off of some big, new, and imaginary life-changing experience.
In short, I’m comfortable with recent life changes and recent introspection, and I wonder whether I should expect a trip to reinforce and categorize those positive experiences, or else replace them with something farcical.
Also I should ask about any other health dangers, or even other non-obvious benefits.
On Criticism of Me
I don’t mean to be antagonistic here, and I apologize for my tone. I’d prefer my impressions to be taken as yet-another-data-point rather than a strongly stated opinion on what your writings should be.
I’m interested in what in my writing is coming across as indicating I expect a stubborn audience.
The highest rated comment to your vegetarianism post and your response demonstrate my general point here. You acknowledge that the points could have been in your main essay, but your responses are why you don’t find them to be good objections to your framework. My overall suggestion could be summarized as a plea to take two steps back before making a post, to fill up content not with arguments, but with data about how people think. Summarize background assumptions and trace them to their resultant beliefs about the subject. Link us to existing opinions by people who you might imagine will take issue with your writing. Preempt a comment thread by considering how those existing opinions would conflict with yours, and decide to find that more interesting than the quality of your own argument.
These aren’t requirements for a good post. I’m not saying you don’t do these things to some extent. They are just things which, if they were more heavily focused, would make your posts much more useful to this data point (me).
It’s difficult to offer an answer to that question. I think one problem is many of these discussions haven’t (at least as far as I know) taken place in writing yet.
That seems initially unlikely to me. What do you find particularly novel about your Speculative Cause post that distinguishes it from previous Less Wrong discussions, where this has been the topic du jour and the crux of whether MIRI is useful as a donation target? Do you have a list of posts that are similar, but which lack in a way your Speculative Cause post makes up for?
I’m confused. What’s wrong with how they’re currently laid out? Do you think there are certain arguments I’m not engaging with? If so, which ones?
Again, this post seems extremely relevant to your Speculative Causes post. This comment and its child are also well written, and link in other valuable sources. Since AI-risk is one of the most-discussed topics here, I would have expected a higher quality response than calling the AI-safety conclusion commonsense.
Those advocating existential risk reduction often argue as if their cause were unjustified exactly until the arguments started making sense.
What do you mean? Can you give me an example?
Certain portions of Luke’s Story are the best example I can come up with after a little bit of searching through posts I’ve read at some point in the past. The way he phrases it is slightly different from how I have, but it suggests inferential distance for the AI form of X-Risk might be insurmountably high for those who don’t have a similar “aha.” Quoted from link:
Good’s paragraph ran me over like a train. Not because it was absurd, but because it was clearly true. Intelligence explosion was a direct consequence of things I already believed, I just hadn’t noticed! Humans do not automatically propagate their beliefs, so I hadn’t noticed that my worldview already implied intelligence explosion. I spent a week looking for counterarguments, to check whether I was missing something, and then accepted intelligence explosion to be likely.
And Luke’s comment (child of So8res’) suggests his response to your post would be along the lines of “lots of good arguments built up over a long period of careful consideration.” Learned helplessness is the opposite of what I’m advocating. When laymen overtrivialize an issue, they fail to see how somebody who has made it a long-term focus could be justified in their theses.
I think that’s equivocating between two different definitions of “proven”.
It is indeed. I was initially going to protest that your post conflated “proven in the Bayesian sense” and “proven as a valuable philanthropic cause,” so I was trying to draw attention to that. Those who think that the probability of AI-risk is low might still think that it’s high enough to overshadow nearly all other causes, because the negative impact is so high. AI-risk would be unproven, but its philanthropic value proven to that person.
As comments on your posts indicate, MIRI and its supporters are quite convinced.
A criticism I have of your posts is that you seem to view your typical audience member as somebody who stubbornly disagrees with your viewpoint, rather than as an undecided voter. More critically, you seem to view yourself as somebody capable of changing the former’s opinion through (very well-written) restatements of the relevant arguments. But people like me want to know why previous discussions haven’t yet resolved the issue even in discussions between key players. Because they should be resolvable, and posts like this suggest to me that at least some players can’t even figure out why they aren’t yet.
Ideally, we’d take a Bayesian approach, where we have a certain prior estimate about how cost-effective the organization is, and then update our cost-effectiveness estimate based on additional evidence as it comes in. For reasons I argued earlier and GiveWell has argued, I think our prior estimate should be quite skeptical (i.e. expect cost-effectiveness to be not as good as AMF / much closer to average than naïvely estimated) until proven otherwise.
The Karnofsky articles have been responded to, with a rather in-depth followup discussion, in this post. It’s hardly important to me that you don’t consider existential risk charities to defeat expected value criticisms, because Peter Hurford’s head is not where I need this discussion to play out in order to convince me. At first glance, and after continued discussion, the arguments appear to me incredibly complex, and possibly too complex for many to even consider. In such cases, sometimes the correct answer demonstrates that the experts were overcomplicating the issue. In others, the laymen were overtrivializing it.
Those advocating existential risk reduction often argue as if their cause were unjustified exactly until the arguments started making sense. These arguments tend to be extremely high volume, and offer different conclusions to different audience members with different background assumptions. For those who have ended up advocating X-risk safety, the argument has ceased to be unclear in the epistemological sense, and its philanthropic value is proven.
I’d like to hear more from you, and to hear arguments laid out for your position in a way that allows me to accept them as relevant to the most weighty concerns of your opponents.
Congrats! What is her opinion on the Self Indication Assumption?
Attackers could cause the unit to unexpectedly open/close the lid, activate bidet or air-dry functions, causing discomfort or distress to user.
Heaven help us. Somebody get X-risk on this immediately.
For certain definitions of pride. Confidence is a focus on doing what you are good at, enjoying doing things that you are good at, and not avoiding doing things you are good at around others.
Pride is showing how good you are at things “just because you are able to,” as if to prove to yourself what you supposedly already know, namely that you are good at them. If you were confident, you would spend your time being good at things, not demonstrating that you are so.
There might be good reasons to manipulate others. Just proving to yourself that you can is not one of them, if there are stronger outside views on your ability to be found elsewhere (like asking unbiased observers).
The Luminosity Sequence has a lot to say about this, and references known biases people have when assessing their abilities.