I could add: Objective punishments and rewards need objective justification.
Peterdjones
From my perspective, treating rationality as always instrumental and never a terminal value is playing around with its traditional meaning. (And indiscriminately teaching instrumental rationality is like indiscriminately handing out weapons. The traditional idea, going back to at least Plato, is that teaching someone to be rational improves them...changes their values.)
I am aware that humans have a non-zero level of life-threatening behaviour. If we wanted it to be lower, we could make it lower, at the expense of various costs. We don't, which seems to mean we are happy with the current cost-benefit ratio. Arguing, as you have, that the risk of AI self-harm can't be reduced to zero doesn't mean we can't hit an actuarial optimum.
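To make the "actuarial optimum" point concrete, here is a toy sketch of my own, with entirely hypothetical numbers: if safety effort has a cost and each unit of effort only shrinks (never eliminates) the residual risk, the cost-minimizing point is some finite level of effort, not zero risk.

```python
# Toy cost-benefit model (hypothetical numbers, my own framing, not anyone's actual estimates).
def total_expected_cost(effort, harm=1_000_000, unit_cost=10_000, base_risk=0.1):
    """Safety effort has a price; accident risk falls off exponentially with effort."""
    risk = base_risk * (0.5 ** effort)        # each unit of effort halves the residual risk
    return effort * unit_cost + risk * harm   # spending plus expected loss

best_effort = min(range(20), key=total_expected_cost)
print(best_effort, total_expected_cost(best_effort))  # optimum is finite; residual risk stays above zero
```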
It is not clear to me why you think safety training would limit intelligence.
Regarding the anvil problem: you have argued with great thoroughness that one can’t perfectly prevent an AIXI from dropping an anvil on its head. However, I can’t see the necessity. We would need to get the probability of a dangerously unfriendly SAI as close to zero as possible, because it poses an existential threat. However, a suicidally foolish AIXI is only a waste of money.
Humans have a negative reinforcement channel relating to bodily harm called pain. It isn't perfect, but it's good enough to train most humans to avoid doing suicidally stupid things. Why would an AIXI need anything better? You might want to answer that there is some danger related to an AIXI's intelligence, but its clock speed, or whatever, could be throttled during training.
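A minimal sketch of what I mean (my own toy example, not a claim about AIXI's actual machinery): a crude "pain" channel is just a large negative reward attached to self-damaging actions, and even a dumb bandit-style learner quickly stops choosing them. Nothing close to perfect self-knowledge is required.

```python
import random

# Toy illustration: negative reinforcement for self-harm, learned by trial and error.
ACTIONS = ["fetch_coffee", "sort_mail", "drop_anvil_on_own_head"]
REWARD  = {"fetch_coffee": 1.0, "sort_mail": 0.5, "drop_anvil_on_own_head": -100.0}

q = {a: 0.0 for a in ACTIONS}             # estimated value per action
random.seed(0)
for _ in range(500):
    # explore occasionally, otherwise pick the best-looking action
    a = random.choice(ACTIONS) if random.random() < 0.1 else max(q, key=q.get)
    q[a] += 0.1 * (REWARD[a] - q[a])      # simple running-average update

print(max(q, key=q.get))                  # -> 'fetch_coffee'; the "suicidal" action is shunned
```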
Also, any seriously intelligent AI made with the technology of today, or the near future, is going to require a huge farm of servers. The only way it could physically interact with the world is through a remote-controlled body...and if it drops an anvil on that, it actually will survive as a mind!
An entity that has contradictory beliefs will be a poor instrumental rationalist. It looks like you would need to engineer a distinction between instrumental beliefs and terminal beliefs. While we're on the subject, you might need a firewall to stop an AI acting on intrinsically motivating ideas, if they exist. In any case, orthogonality is an architecture choice, not an ineluctable fact about minds.
The OT has multiple forms, as Armstrong notes. An OT that says you could make arbitrary combinations of preference and power if you really wanted to can't plug into an argument that future AI will, with high probability, be a Lovecraftian horror, at least not unless you also argue that an orthogonal architecture will be chosen, with high probability.
something previously deemed “impossible”
It's clearly possible for some values of "gatekeeper", since some people fall for 419 scams. The test is a bit meaningless without information about the gatekeepers.
The problem is that I don’t see much evidence that Mr. Loosemore is correct. I can quite easily conceive of a superhuman intelligence that was built with the specification of “human pleasure = brain dopamine levels”, not least of all because there are people who’d want to be wireheads and there’s a massive amount of physiological research showing human pleasure to be caused by dopamine levels.
I don’t think Loosemore was addressing deliberately unfriendly AI, and for that matter EY hasn’t been either. Both are addressing intentionally friendly or neutral AI that goes wrong.
I can quite easily conceive of a superhuman intelligence that knows humans prefer more complicated enjoyment, and even do complex modeling of how it would have to manipulate people away from those more complicated enjoyments, and still have that superhuman intelligence not care.
Wouldn’t it care about getting things right?
Trying to think this out in terms of levels of smartness alone is very unlikely to be helpful.
Then solve semantics in a seed.
To be a good instrumental rationalist, an entity must be a good epistemic rationalist, because knowledge is instrumentally useful. But to be a good epistemic rationalist, an entity must value certain things, like consistency and lack of contradiction. IR is not walled off from ER, which itself is not walled off from values. The orthogonality thesis is false. You can't have any combination of values and instrumental efficacy, because an entity that thinks contradictions are valuable will be a poor epistemic rationalist and therefore a poor instrumental rationalist.
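The step from epistemic failure to instrumental failure can be made concrete with the standard Dutch book arithmetic (a worked toy example of my own): an agent whose credences in A and not-A sum to more than 1 will accept both of the bets below as fair, and loses money however the world turns out.

```python
# Incoherent credences: the agent believes A to degree 0.7 and not-A to degree 0.7.
p_A, p_notA = 0.7, 0.7                 # sums to 1.4

stake = 1.0                            # each bet pays `stake` if it wins
price_A    = p_A * stake               # the agent regards 0.70 as a fair price for the bet on A
price_notA = p_notA * stake            # ...and 0.70 as a fair price for the bet on not-A

for A_is_true in (True, False):
    payout = stake                     # exactly one of the two bets pays out
    profit = payout - (price_A + price_notA)
    print(A_is_true, round(profit, 2)) # -0.4 in both cases: a guaranteed loss
```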
That’s not very realistic. If you trained AI to parse natural language, you would naturally reward it for interpreting instructions the way you want it to.
We want to select AIs that are friendly and understand us, and this has already started happening.
My answer: who knows? We’ve given it a deliberately vague goal statement (even more vague than the last one), we’ve given it lots of admittedly contradictory literature, and we’ve given it plenty of time to self-modify before giving it the goal of self-modifying to be Friendly.
Humans generally manage with those constraints. You seem to be doing something that is kind of the opposite of anthropomorphising—treating an entity that is stipulated as having at least human intelligence as if it were as literal and rigid as a non-AI computer.
Semantics isn't optional. Nothing could qualify as an AGI, let alone a super one, unless it could hack natural language. So Loosemore architectures don't make anything harder, since semantics has to be solved anyway.
“code in the high-level sentence, and let the AI figure it out.”
So it's impossible to directly or indirectly code in the complex thing called semantics, but possible to directly or indirectly code in the complex thing called morality? What? What is your point? You keep talking as if I am suggesting there is something that can be had for free, without coding. I never even remotely said that.
If the AI is too dumb to understand ‘make us happy’, then why should we expect it to be smart enough to understand ‘figure out how to correctly understand “make us happy”, and then follow that instruction’? We have to actually code ‘correctly understand’ into the AI. Otherwise, even when it does have the right understanding, that understanding won’t be linked to its utility function.
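A hypothetical sketch of the point being argued here (the names and structure are mine, not anyone's proposed design): an agent can contain a perfectly good interpreter of "make us happy" and still ignore it, if its utility function is wired to a proxy instead of to the interpreter. "Correct understanding" only does work once it is the thing actually being scored.

```python
def interpret(instruction: str) -> str:
    """Stand-in for a solved-semantics module: maps an instruction to its intended meaning."""
    return "promote considered human well-being" if instruction == "make us happy" else instruction

def utility_proxy(world_state: dict) -> float:
    # Understanding exists elsewhere in the system, but this function never consults it.
    return world_state["dopamine_level"]

def utility_linked(world_state: dict, instruction: str = "make us happy") -> float:
    # Here the interpreted meaning is what actually gets scored.
    goal = interpret(instruction)
    return world_state["goal_satisfaction"].get(goal, 0.0)

wirehead_world = {"dopamine_level": 9.9,
                  "goal_satisfaction": {"promote considered human well-being": 0.1}}

print(utility_proxy(wirehead_world))   # 9.9 -> the proxy-wired agent prefers wireheading
print(utility_linked(wirehead_world))  # 0.1 -> the linked agent does not
```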
I know. A Loosemore architecture AI has to treat its directives as directives. I never disputed that. But coding “follow these plain English instructions” isn’t obviously harder or more fragile than coding “follow <>”. And it isn’t trivial, and I didn’t say it was.
Yes, but that’s stupidity on the part of the human programmer, and/or on the part of the seed AI if we ask it for advice.
That depends on the architecture. In a Loosemore architecture, the AI interprets high-level directives itself, so if it gets them wrong, that's its mistake.
There is no theorem which proves a rationalist must be honest—must speak aloud their probability estimates.
Speaking what you believe may be frankness, candour or tactlessness, but it isn’t honesty. Honesty is not lying. It involves no requirement to call people Fatty or Shorty.
Goertzel appears to be a respected figure in the field. Could you point the interested reader to your critique of his work?
Almost everything he said has been civil, well informed and on topic. He has made one complaint about downvoting, and EY has made an ad hominem against him. EY's behaviour has been worse.
Richard, please don't be bullied off the site. It is LW that needs to learn how to handle debate and disagreement, since they are basic to rationality.
On the other hand...
http://en.m.wikipedia.org/wiki/Is_logic_empirical%3F