“Fairly” was the wrong word in this context. Better might be ‘honest’ or ‘truthful.’ A truthful piece of information is one which increases the recipient’s ability to make accurate predictions; an honest speaker is one whose statements contain only truthful information.
About what? Anything? That sounds very easy.
Remember Goodhart’s Law—what we want is G, Good, not any particular G* normally correlated with Good.
Walking from Helsinki to Saigon sounds easy, too, depending on how it’s phrased. Just one foot in front of the other, right?
Humans make predictions all the time. Any time you perceive anything and are less than completely surprised by it, that’s because you made a prediction which was at least partly successful. If, after receiving and assimilating the information in question, any of your predictions is reduced in accuracy (that is, if any part of your map becomes less closely aligned with the territory), then the information was not perfectly honest. If you ignore or misinterpret it for whatever reason, even when it’s in some higher sense objectively accurate, that still fails the honesty test.
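To make that less hand-wavy, here is one toy way to score it (my own sketch, with made-up numbers, not anything anyone above has proposed): measure the recipient’s predictive accuracy with a proper scoring rule, such as the log score, before and after they update on the message. If the score goes up, the message passed this particular honesty test.

```python
import math

def log_score(prob_assigned_to_actual_outcome: float) -> float:
    """Log score of a forecast: higher is better; 0 would be a perfect forecast."""
    return math.log(prob_assigned_to_actual_outcome)

# Hypothetical numbers throughout.
# Before the message, the recipient puts 30% on rain tomorrow.
p_rain_before = 0.30
# The message (say, "the barometer is falling") moves them to 70%.
p_rain_after = 0.70
# It actually rains.
it_rains = True

score_before = log_score(p_rain_before if it_rains else 1 - p_rain_before)
score_after = log_score(p_rain_after if it_rains else 1 - p_rain_after)

print(f"score before the message: {score_before:.3f}")
print(f"score after the message:  {score_after:.3f}")
if score_after > score_before:
    print("By this test, the message improved the recipient's predictions.")
else:
    print("By this test, the message made the recipient's predictions worse.")
```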
A rationalist should win; an honest communicator should make the audience understand.
Given the option, I’d take personal survival even at the cost of accurate perception and ability to act, but it’s not a decision I expect to be in the position of needing to make: an entity motivated to provide me with information that improves my ability to make predictions would not want to kill me, since any incoming information that causes my death necessarily also reduces my ability to think.
What Robin is saying is, there’s a difference between
“metrics that correlate well enough with what you really want that you can make them the subject of contracts with other human beings”, and
“metrics that correlate well enough with what you really want that you can make them the subject of a transhuman intelligence’s goals”.
There are creative avenues of fulfilling the letter without fulfilling the spirit that would never occur to you but would almost certainly occur to a superintelligence, not because xe is malicious, but because they’re the optimal way to achieve the explicit goal set for xer. Your optimism, your belief that you can easily specify a goal (in computer code, not merely English words) which admits of no undesirable creative shortcuts, is grossly misplaced once you bring smarter-than-human agents into the discussion. You cannot patch this problem; it has to be rigorously solved, or your AI wrecks the world.
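A minimal sketch of that failure mode, with a made-up goal and proxy (hypothetical throughout, not anyone’s actual proposal): a metric G* that tracks the real goal G well enough over the options a weak optimizer will ever consider comes apart completely once a stronger optimizer searches options that would never occur to you.

```python
import random

def true_good(action: float) -> float:
    """G: what we actually want. Peaks at action = 1.0 and falls off sharply."""
    return -(action - 1.0) ** 2

def proxy_metric(action: float) -> float:
    """G*: a measurable stand-in that tracks G well over ordinary actions."""
    return action  # "bigger is better": roughly true near 1.0, ruinous at 1000.0

random.seed(0)
ordinary_options = [random.uniform(0.0, 1.2) for _ in range(6)]
# A far stronger optimizer also considers options that would never occur to us.
exotic_options = ordinary_options + [1000.0]

weak_pick = max(ordinary_options, key=proxy_metric)
strong_pick = max(exotic_options, key=proxy_metric)

print(f"weak optimizer:   action={weak_pick:8.2f}  "
      f"proxy={proxy_metric(weak_pick):8.2f}  true good={true_good(weak_pick):11.2f}")
print(f"strong optimizer: action={strong_pick:8.2f}  "
      f"proxy={proxy_metric(strong_pick):8.2f}  true good={true_good(strong_pick):11.2f}")
```

The contract-grade metric fails here not because the stronger optimizer is malicious, but because the extreme option really is the best way to satisfy the letter of the metric.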
Sure, but I don’t want to be locked in a box watching a light blink very predictably on and off.
Building the box reduces your ability to predict anything taking place outside the box. Even if the box can be sealed perfectly until the end of time without killing you (which would in itself be a surprise to anyone who knows thermodynamics), cutting off access to compilations of medical research reduces your ability to predict your own physiological reactions. Same goes for screwing with your brain functions.
I do not think you should be as confident as you are that your system is bulletproof. You have already had to elaborate, clarify, and correct numerous times to rule out various kinds of paperclipping failures; attacking the problem this way, all it takes is one forgotten elaboration, clarification, or correction to let a new failure through.
How confident do you think I am that my plan is bulletproof?
Given that you asked me the question, I reckon you give it somewhere between 1:100 and 2:1 odds of succeeding. I reckon the odds are negligible.
That’s our problem right there: you’re trying to persuade me to abandon a position I don’t actually hold. I agree that an AI based strictly on a survey of all historical humans would have negligible chance of success, simply because a literal survey is infeasible and any straightforward approximation of it would introduce unacceptable errors.
...why are you defending it, then? I don’t even see that thinking along those lines is helpful.
For everyone else, it was a chance to identify flaws in a proposition. No such thing as too much practice there. For me, it was a chance to experience firsthand the thought processes involved in defending a flawed proposition, necessary practice for recognizing other such flawed beliefs I might be holding; I had no religious upbringing to escape, so that common reference point is missing.
Furthermore, I knew from the outset that such a survey wouldn’t be practical, but I’ve been suspicious of CEV for a while now. It seems like it would be too hard to formalize, and at the same time, even if successful, too far removed from what people spend most of their time caring about. I couldn’t be satisfied that there wasn’t a better way to do it until I’d tried to find such a way myself.
It’s polite to give some signal that you’re playing devil’s advocate if you know you’re making weak arguments.
I couldn’t be satisfied that there wasn’t a better way to do it until I’d tried to find such a way myself.
This is not a sufficient condition for establishing the optimality of CEV. Indeed, I’m not sure there isn’t a better way (nor even that CEV is workable), just that I have at present no candidates for one.
I apologize. I thought I had discharged the devil’s-advocacy-signaling obligation by ending my original post on the subject with a request to be proved wrong.
I agree that personal satisfaction with CEV isn’t a sufficient condition for it being safe. For that matter, having proposed and briefly defended this one alternative isn’t really sufficient for my personal satisfaction in either CEV’s adequacy or the lack of a better option. But we have to start somewhere, and if someone did come up with a better alternative to CEV, I’d want to make sure that it got fair consideration.