“Paperclipper” refers to any “universe tiler”, not just one that tiles with paperclips. Specifying paperclips in particular is hard; if you don’t care about exactly what gets tiled, it’s much easier.
My arguments apply to most kinds of things that tile the universe with highly arbitrary things like paperclips, though they apply less to things that tile the universe with less arbitrary things like bignums or something. I do believe arbitrary universe tilers are easier than FAI; just not paperclippers.
“For well-understood goals, that’s easy. Just hardcode the goal.”

There are currently no well-understood goals, nor are there obvious ways of hardcoding goals, nor are hardcoded goals necessarily stable for self-modifying AIs. We don’t even know what a goal is, let alone how to solve the grounding problem of specifying what a paperclip is (or a tree, or what have you). With humans you get the nice trick of having the AI look back on the process that created it, or alternatively you can just use universal induction techniques to find optimization processes out there in the universe, which will also find humans. (Actually I think it would find mostly memes, not humans per se, but a lot of what humans care about is memes.)
“Human values are contingent on our entire evolutionary history.”

Paperclips are contingent on that, plus a whole bunch of random cultural stuff. Again, if we’re talking about universe tilers in general, this does not apply.
Also, as a sort of appeal to authority, I’ve been working at the Singularity Institute for a year now, and have spent many many hours thinking about the problem of FAI (though admittedly I’ve given significantly less thought to how to build a paperclipper). If my intuitions are unsound, it is not for reasons that are intuitively obvious.
Really? Maybe if you wanted it to be able to classify whether any arbitrary corner case counts as a paperclip or not, but that isn’t required for a paperclipper. If you’re just giving it a shape to copy, then I don’t see why that would be more than a hundred bytes or so: trivial compared to the optimizer.
A hundred bytes in what language? I get the intuition, but it really seems to me like paperclips are really complex. There are lots of important qualities of paperclips, the ones that make them clippy, that seem to me like they’d be very hard to get an AI to understand. You say you’re giving it a shape, but that shape is not at all easily defined. And its molecular structure might be important, and its size, and its density, and its ability to hold sheets of paper together… Shape and molecular composition aren’t fundamental attributes of the universe that an AI would have a native language for. This is why we can’t just keep an oracle AI in a box: it turns out that our intuitive idea of what a box is is really hard to explain to a de novo AI. Paperclips are similar. And if the AI is smart enough to understand human concepts that well, then you should also be able to just type up CEV and give it that instead… CEV is easier to describe than a paperclip in that case, since CEV is already written up. (Edit: I mean a description of CEV is written up, not CEV itself. We’re still working on the latter.)
Well, here’s a recipe for a paperclip in English:

Make a wire, 10 cm long and 1 mm in diameter, composed of an alloy of 99.8% iron and 0.2% carbon. Start at one end and bend it such that the segments from 2–2.5 cm, 2.75–3.25 cm, and 5.25–5.75 cm form half-circles, with all the bends in the same direction and forming an inward spiral (the end with the first bend is outside the third bend).
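For what it’s worth, the recipe above compresses into a small machine-readable spec, which is one way to sanity-check the “hundred bytes or so” estimate. The field names, units, and layout here are invented for illustration, not any standard format:

```python
import json

# Hypothetical compact encoding of the recipe above; field names and
# units are made up for this sketch, not a standard format.
clip_spec = {
    "wire_cm": 10.0,
    "diameter_mm": 1.0,
    "alloy": {"Fe": 0.998, "C": 0.002},
    # (start_cm, end_cm) of each half-circle bend; all bends share a
    # direction and form an inward spiral.
    "bends_cm": [[2.0, 2.5], [2.75, 3.25], [5.25, 5.75]],
}

encoded = json.dumps(clip_spec, separators=(",", ":")).encode()
print(len(encoded))  # comfortably close to "a hundred bytes or so"
```

Of course, this only quantifies the shape and composition; it says nothing about grounding those symbols in the AI’s world model, which is the harder objection above.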
If you can understand human concepts, “paperclip” is sufficient to tell you about paperclips. Google “paperclip” and you get hundreds of millions of results.
“Understanding human concepts” may be hard, but understanding arbitrary concrete concepts seems easier than understanding arbitrary abstract concepts that take a long essay to write up, have only been written up by one person, and use a number of other abstract concepts in that writeup.
Sample a million paperclips from different manufacturers. Automatically cluster to find the 10,000 that are most similar to each other. Anything is a paperclip if it is more similar to one of those along every analytical dimension than another in the set.
Very easy, requires no human values. The AI is free to come up with strange dimensions along which to analyze paperclips, but it now has a clearly defined concept of paperclip, some (minimal) flexibility in designing them, and no need for human values.
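A toy sketch of that procedure, with made-up feature dimensions (wire length and diameter) standing in for whatever “analytical dimensions” the AI would actually choose. The clustering step is simplified to “keep the samples nearest the coordinate-wise median,” and the membership test is simplified to “inside the per-dimension range spanned by the core set”:

```python
import random

random.seed(0)

# Hypothetical "analytical dimensions" per sampled clip: (wire_length_cm, wire_diameter_mm).
# Real paperclips cluster tightly; a couple of novelty outliers do not.
samples = [(random.gauss(10.0, 0.2), random.gauss(1.0, 0.05)) for _ in range(1000)]
samples += [(25.0, 3.0), (2.0, 0.1)]  # outliers

dims = len(samples[0])
k = 800  # stand-in for "the 10,000 most similar to each other"

# Crude stand-in for "automatically cluster": keep the k samples closest
# to the coordinate-wise median, i.e. the most mutually similar core.
med = tuple(sorted(s[d] for s in samples)[len(samples) // 2] for d in range(dims))
core = sorted(samples, key=lambda s: sum((s[d] - med[d]) ** 2 for d in range(dims)))[:k]

# Membership rule (simplified): a candidate counts as a paperclip if,
# along every dimension, it falls within the range spanned by the core.
lo = tuple(min(s[d] for s in core) for d in range(dims))
hi = tuple(max(s[d] for s in core) for d in range(dims))

def is_paperclip(x):
    return all(lo[d] <= x[d] <= hi[d] for d in range(dims))
```

As claimed, nothing here references human values; the cost is that the resulting category has plenty of false negatives, which is exactly the point taken up below.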
Counter:
Sample a million people from different continents. Automatically cluster to find the 10,000 that are most similar to each other. Anything is a person if it is more similar to one of those along every analytical dimension than another in the set.
This is already tripping majoritarianism alarm bells.
I would meditate on this for a while when trying to define a paperclip.
TheOtherDave got it right; I wasn’t trying to give a complete definition of what is and isn’t a paperclip, I was just offering an easy-to-define (without human values) subset that we would still call paperclips.
It has plenty of false negatives, but I don’t really see that as a loss. Likewise, your personhood algorithm doesn’t bother me as long as we don’t use it to establish non-personhood.
Ah. This distinction escaped me; I tend to use definitions in a formal logical style (P or ~P, no other options).
Sure.
For that matter, one could play all kinds of Hofstadterian games along these lines… is a staple a paperclip? After all, it’s a thin piece of shaped metal designed and used to hold several sheets of paper together. Is a one-pound figurine of a paperclip a paperclip? Does it matter if you use it as a paperweight, to hold several pieces of paper together? Is a directory on a computer file system a virtual paperclip? Would it be more of one if we’d used the paperclip metaphor rather than the folder metaphor for it? And on and on and on.
In any case, I agree that the intuition that paperclips are an easy set to define depends heavily on the idea that not very many differences among candidates for inclusion matter very much, and that it should be obvious which differences those are. And all of that depends on human values.
Put a different way: picking 10,000 manufactured paperclips and fitting a category definition to those might exclude any number of things that, if asked, we would judge to be paperclips… but we don’t really care, so a category arrived at this way is good enough. Adopting the same approach to humans would similarly exclude things we would judge to be human… and we care a lot about that, at least sometimes.
“Find a valid proof of this theorem from the axioms of ZFC”. This goal is pretty well-understood, and I don’t believe an AI with such a goal will converge on Buddha. Or am I misunderstanding your position?
This doesn’t take into account logical uncertainty. It’s easy to write a program that eventually computes the answer you want, and then pose the question of doing that more efficiently while provably retaining the same goal; that is essentially what you cited, with respect to a brute-force classical inference system starting from ZFC and enumerating all theorems (and even this has its problems, as you know, since the agent could be controlling which answer is correct). A far more interesting question is which answer to name when you don’t have time to find the correct answer. “Correct” is merely a heuristic for when you have enough time to reflect on what to do.
(Also, even to prove theorems, you need operating hardware, and managing that hardware and other actions in the world would require decision-making under (logical) uncertainty. Even nontrivial self-optimization would require decision-making under uncertainty that has a “chance” of turning you away from the correct question.)
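The “program that eventually computes the answer” pattern is just enumerate-and-verify. A toy version, with a placeholder predicate standing in for a real ZFC proof checker:

```python
from itertools import count, product

# Toy stand-in for "enumerate all proofs until one checks". The verifier
# here is a placeholder predicate, not a real proof checker, and the
# alphabet is binary rather than a proof language.
ALPHABET = "01"

def brute_force_prove(verifies):
    """Return the first string (in length-then-lex order) the verifier accepts."""
    for n in count(1):
        for candidate in product(ALPHABET, repeat=n):
            s = "".join(candidate)
            if verifies(s):
                return s

# Trivial example "theorem": the first string containing "11".
print(brute_force_prove(lambda s: "11" in s))  # -> "11"
```

This is the well-understood half of the problem: correctness is decidable per candidate, so the loop provably halts on any provable statement. The interesting half, per the comment above, is what to output when the loop can’t be run to completion.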
What’s more interesting about it? Think for some time and then output the best answer you’ve got.
Try to formalize this intuition. With provably correct answers, that’s easy. Here, you need a notion of “best answer I’ve got”, a way of comparing possible answers where correctness remains inaccessible. This makes it “more interesting”: where the first problem is solved (to an extent), this one isn’t.
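To make the contrast concrete, here is a sketch of an “anytime” answerer: it keeps the best candidate found so far under a deadline, scored by a heuristic rather than a proof of correctness. The task and heuristic (closeness of n² to a target) are arbitrary stand-ins:

```python
import time

def best_guess_sqrt(target, budget_s=0.01):
    """Return the best integer-square-root candidate found before the deadline.

    "Best" is defined by a heuristic score (|n*n - target|), not by a
    correctness proof; this is the "best answer I've got" notion, made up
    for illustration.
    """
    best, best_err = None, float("inf")
    deadline = time.monotonic() + budget_s
    n = 0
    while time.monotonic() < deadline:
        err = abs(n * n - target)
        if err < best_err:
            best, best_err = n, err
        n += 1
    return best
```

The formalization burden lives entirely in the scoring function: for this toy task the heuristic is obvious, but for open-ended goals there is no agreed way to compare candidate answers when correctness is inaccessible, which is the unsolved part.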