Hi. Checking back on this account on a whim after a long time of not using it. You’re right. 2012!Mestroyer was a noob and I am still cleaning up his bad software.
I would need a bunch of guarantees about the actual mechanics of how the AI was forced to answer before I stopped seeing vague classes of ways this could go wrong. And even then, I’d assume there were some I’d missed, and if the AI has a way to show me anything other than “yes” or “no”, or I can’t prevent myself from thinking about long sequences of bits instead of just single bits separately, I’d be afraid it could manipulate me.
An example of a vague class of ways this could go wrong is if the AI figures out what my CEV would want using CDT, and itself uses a more advanced decision theory to exploit the CEV computation into wanting to write something more favorable to the AI’s utility function in the file.
Also, IIRC, Eliezer Yudkowsky said there are problems with CEV itself (maybe he just meant problems with the many-people version, but probably not). It was only supposed to be a vague outline, and a “see, you don’t have to spend all this time worrying about whether we share your ethical/political philosophy, because it’s not going to be hardcoded into the AI anyway.”
That’s the goal, yeah.
It doesn’t have to know what my CEV would be to know what I would want in those bits, which is a compressed seed of an FAI targeted (indirectly) at my CEV.
But there are problems like, “How much effort is it required to put into this?” (Clearly I don’t want it to spend far more compute power than it has trying to come up with the perfect combination of bits that will make my FAI unfold a little bit faster, but I also don’t want it to spend no time optimizing. How do I get it to pick somewhere in between without it already wanting to pick the optimal amount of optimization for me?) And “What decision theory is my CEV using to decide those bits? (Hopefully not something exploitable, but how do I specify that?)”
Given that I’m turning the stream of bits, 10 KiB long, that I’m about to extract from you into an executable file, through this exact process, which I will run on this particular computer (describe specifics of computer, which is not the computer the AI is currently running on) to create your replacement, would my CEV prefer that this next bit be a 1 or a 0? By CEV, would I rather that the bit after that be a 1 or a 0, given that I have permanently fixed the preceding bit as what I made it? By CEV, would I rather that the bit after that be a 1 or a 0, given that I have permanently fixed the preceding bit as what I made it? …
(Note: I would not actually try this.)
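For concreteness only, here is a minimal sketch of the query loop that prompt describes. It assumes a hypothetical ask_oracle(question) -> bool interface standing in for however the boxed AI would be forced to answer a single yes/no question; no such interface exists, and (per the note above) this is not something I would actually run.

```python
# Sketch of the bit-by-bit extraction protocol described above.
# ask_oracle is a hypothetical stand-in for whatever mechanism forces
# the boxed AI to answer one yes/no question at a time.

NUM_BITS = 10 * 1024 * 8  # 10 KiB of output, one query per bit

def extract_seed(ask_oracle):
    bits = []
    for i in range(NUM_BITS):
        question = (
            "Given that the preceding bits are permanently fixed as "
            + "".join(map(str, bits))
            + ", and that the full 10 KiB stream will be written to an "
            "executable file and run on the specified computer to create "
            f"your replacement, would my CEV prefer bit {i} to be 1?"
        )
        bits.append(1 if ask_oracle(question) else 0)
    # Pack the answers into bytes, most significant bit first.
    return bytes(
        int("".join(map(str, bits[j:j + 8])), 2)
        for j in range(0, NUM_BITS, 8)
    )
```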
~5, huh? Am I to credit?
This reminds me of this SMBC. There are fields (modern physics comes to mind too) whose work no one outside them can understand anymore, yet which appear to have remained sane. There are more safeguards against the postmodernists’ failure mode than this one. In fact, I think there is a lot more wrong with postmodernism than that its practitioners don’t have to justify themselves to outsiders. Math and physics have mechanisms determining which ideas within them get accepted that imbue them with their sanity. In math, there are proofs. In physics, there are experiments.
If something like this safeguard is going to work for us, the mechanism that determines which ideas spread among us needs to reflect something good, so that producing the kind of idea that passes that filter makes our community worthwhile. This can be broken into two subgoals: making sure the kinds of questions we’re asking are worthwhile, i.e. that we are searching for the right kind of thing, and making sure that our acceptance criterion is a good one. (There’s also a property that modern physics may or may not keep for much longer: whether progress can actually be made toward the thing you’re looking for.)
CFAR seems to be trying to use (some of) our common beliefs to produce something useful to outsiders. And they get good ratings from workshop attendees.
The last point is particularly important, since on one hand, with the current quasi-Ponzi mechanism of funding, the position of preserved patients is secured by the arrival of new members.
Downvoted because, if I remember correctly, this is wrong; the cost of preserving a particular person includes a lump sum big enough that the interest pays for their ongoing maintenance. If I remember incorrectly and someone points it out, I will rescind my downvote.
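(To make the structure I mean concrete, with entirely made-up numbers since I don’t remember any organization’s actual figures: the lump sum works like a perpetuity, sized so that interest alone covers upkeep.)

```python
# Toy perpetuity calculation with hypothetical numbers; I don't know
# the real maintenance costs or the rate any cryonics trust assumes.
annual_maintenance = 2_000   # $/year to keep one patient preserved (made up)
real_return = 0.02           # assumed long-run real rate of return (made up)

required_lump_sum = annual_maintenance / real_return
print(required_lump_sum)     # 100000.0 -- interest covers maintenance indefinitely
```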
I use text files (.txt, because I hate waiting for a rich text editor to open, and I hate autocomplete for normal writing). It’s the only way to be able to keep track of them. I sometimes write paper notes when I don’t have a computer nearby, but I usually don’t keep those notes. Sometimes if I think of something I absolutely have to remember as I’m dozing off to sleep, I’ll enter it in my cell phone, because I use that as an alarm clock and it’s always close to my bed. But my cell phone’s keyboard makes writing notes really slow, so I avoid it under normal circumstances.
I have several kinds of notes that I make. One is for when I’m doing a hard intellectual task and want to free up short-term memory: I write things down as I think of them. I usually title this kind of file with the name of the task. For tasks too minor to remember just by a title like that, I just write something like “ notes 2014-06-22”.
I also write “where I left off” notes whenever I leave a programming project or something for a day (or sometimes even before I leave for lunch), because usually I will be forming ideas about how to fix problems as I’m fixing other problems, so I can save my future self some work by not forgetting them.
In response to your first paragraph,
Human morality is indeed the complex unfolding of a simple idea in a certain environment. It’s not the one you’re thinking of, though. And if we’re talking about hypotheses for the fundamental nature of reality, rather than a sliver of it (because a sliver of something can be more complicated than the whole), you have to include the complexity of everything that contributes to how your simple thing will play out.
Note also that we can’t explain reality with a god with a utility function of “maximize the number of copies of some genes”, because the universe isn’t just an infinite expanse of copies of some genes. Any omnipotent god you want to use to explain real life has to have a utility function that desires ALL the things we see in reality. Good luck adding the necessary stuff for that into “good” without making “good” much more complicated, and without just saying “good is whatever the laws of physics say will happen.”
You can say for any complex thing, “Maybe it’s really simple. Look at these other things that are really simple.” But there are many (exponentially) more possible complex things than simple things. The prior for a complex thing being generable from a simple thing is very low by necessity. If I think about this like, “well, I can’t name N things I am (N-1)/N confident of and be right N-1 times, and I have to watch out for overconfidence, etc., so there’s no way I can apply 99% confidence to ‘morality is complicated’...”, then I am implicitly privileging the hypothesis. You can’t be virtuously modest about every complicated-looking utility function you wonder might be simple, or your probability distribution will sum to more than 1.
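To spell out the counting behind “exponentially more”: there are fewer than 2^(k+1) descriptions of length at most k bits, so at most that many things can be that simple, while there are 2^n possibilities of size n bits. The fraction that is compressible is therefore tiny whenever k is much smaller than n, which is why the prior has to be low by necessity and why you can’t extend “maybe it’s secretly simple” charity to many candidates without the probabilities summing past 1.

```latex
\[
  \#\{\text{descriptions of length} \le k \text{ bits}\}
  \;=\; \sum_{i=0}^{k} 2^{i} \;=\; 2^{k+1}-1,
\]
\[
  \frac{\#\{\text{$n$-bit possibilities with some description of length} \le k\}}{2^{n}}
  \;\le\; \frac{2^{k+1}-1}{2^{n}} \;<\; 2^{\,k+1-n}.
\]
```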
By hypothesis, “God” means actus purus, moral perfection; there is no reason to double count.
I’m not double-counting. I’m counting the utility function, which specifies the exact way things shall be (as it must, if we’re going with omnipotence for this god hypothesis), once, and the utility-maximization machinery once, and comparing that to the non-god hypothesis, where we only count the utility function, without the utility maximizer.
I agree. “AGI Safety”/“Safe AGI” seems like the best option. If people say, “Let me save you some time and tell you right now that’s impossible,” half of the work is done. The other half is just convincing them that we have to do it anyway, because otherwise everyone is doomed. (This is, of course, as long as they are using “impossible” in a loose sense. If they aren’t, the problem can probably be fixed by saying “our definition of safety is a little looser than the one you’re probably thinking of, but not so much looser that it becomes easy.”)
Time spent doing any kind of work with a high skill cap.
Edit: Well, okay, not any kind of work meeting that criterion, to preempt the obvious LessWrongian response. Any kind you can get paid for is closer to true.
One of my old CS teachers defended modeling the environment as adversarial, and as knowing your source code, because of hackers. See median-of-3 killers; there’s a sketch of the general phenomenon below. (I’d link something, but besides a paper, I can’t find a nice link explaining what they are in a small amount of googling.)
I don’t see why Yudkowsky makes superintelligence a requirement for this.
Also, it doesn’t even have to be source code they have access to (which they would have anyway if it were open-source software). There are such things as disassemblers and decompilers.
[Edit: removed implication that Yudkowsky thought source code was necessary]
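To give the flavor of what a median-of-3 killer is, here is not the actual construction from the paper but the simpler classic instance of the same phenomenon: an adversarial input (already-sorted data) against a quicksort that always pivots on the first element, which makes every partition maximally lopsided and drives the comparison count quadratic. Median-of-3 killers are the analogous worst-case permutations built specifically against the median-of-3 pivot rule.

```python
# Not the median-of-3 killer itself, just the simpler classic case of an
# adversarial input: sorted data fed to a first-element-pivot quicksort.

import random

def quicksort_first_pivot(a, counter):
    # Deliberately naive: always pivot on the first element.
    if len(a) <= 1:
        return a
    pivot, left, right = a[0], [], []
    for x in a[1:]:
        counter[0] += 1  # one partitioning comparison per element
        (left if x < pivot else right).append(x)
    return (quicksort_first_pivot(left, counter) + [pivot]
            + quicksort_first_pivot(right, counter))

for n in (100, 200, 400):
    random_comparisons, sorted_comparisons = [0], [0]
    quicksort_first_pivot(random.sample(range(n), n), random_comparisons)
    quicksort_first_pivot(list(range(n)), sorted_comparisons)
    # Random input: on the order of n log n; sorted input: exactly n*(n-1)/2.
    print(n, random_comparisons[0], sorted_comparisons[0])
```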
A lot of stuff on LessWrong is relevant to picking which charity to donate to. Doing that correctly is of overwhelming importance, far more important than working a little bit more every week.
This is the kind of thing where, when I take the outside view about my response, it looks bad. There is a scholarly paper refuting one of my strongly-held beliefs, a belief I arrived at through armchair reasoning. And without reading it, or even trying to understand their argument indirectly, I’m going to brush it off as wrong, merely based on the kind of bad argument I expect it to be (bad philosophy doing all the work, wrapped in a little bit of correct math to prove some minor point once you’ve made the bad assumptions), because this is what I think it would take to make a mathematical argument against my strongly-held belief, and because other people who share my strongly-held belief are saying that that’s the mistake the authors make.
Still not wasting my time on this though.
Actually, if you do this with something besides a test, this sounds like a really good way to teach a third-grader probabilities.
we’re human beings with the blood of a million savage years on our hands. But we can stop it. We can admit that we’re killers, but we’re not going to kill Today.
My impression is that they don’t, because the people I’ve seen do this didn’t strike me as low status. But they’ve all been people who are clearly high status anyway, due to their professional positions.
This is a bad template for reasoning about status in general, because of countersignaling.
Don’t know, sorry.