[link] Nine Ways to Bias Open-Source AGI Toward Friendliness
Ben Goertzel and Joel Pitt: Nine Ways to Bias Open-Source AGI Toward Friendliness. Journal of Evolution and Technology, Vol. 22, Issue 1, February 2012, pp. 116-141.
Abstract
While it seems unlikely that any method of guaranteeing human-friendliness (“Friendliness”) on the part of advanced Artificial General Intelligence (AGI) systems will be possible, this doesn’t mean the only alternatives are throttling AGI development to safeguard humanity, or plunging recklessly into the complete unknown. Without denying the presence of a certain irreducible uncertainty in such matters, it is still sensible to explore ways of biasing the odds in a favorable way, such that newly created AI systems are significantly more likely than not to be Friendly. Several potential methods of effecting such biasing are explored here, with a particular but non-exclusive focus on those that are relevant to open-source AGI projects, and with illustrative examples drawn from the OpenCog open-source AGI project. Issues regarding the relative safety of open versus closed approaches to AGI are discussed and then nine techniques for biasing AGIs in favor of Friendliness are presented:
1. Engineer the capability to acquire integrated ethical knowledge.
2. Provide rich ethical interaction and instruction, respecting developmental stages.
3. Develop stable, hierarchical goal systems.
4. Ensure that the early stages of recursive self-improvement occur relatively slowly and with rich human involvement.
5. Tightly link AGI with the Global Brain.
6. Foster deep, consensus-building interactions between divergent viewpoints.
7. Create a mutually supportive community of AGIs.
8. Encourage measured co-advancement of AGI software and AGI ethics theory.
9. Develop advanced AGI sooner not later.
In conclusion, and related to the final point, we advise the serious co-evolution of functional AGI systems and AGI-related ethical theory as soon as possible, before we have so much technical infrastructure that parties relatively unconcerned with ethics are able to rush ahead with brute force approaches to AGI development.
I’d say it’s worth a read; they make a pretty convincing case against the possibility of regulating AGI (section 3). I don’t think their approach will work if there’s a hard takeoff or a serious hardware overhang, though it could work if there isn’t. It might also work if a hard takeoff were possible, but not immediately after the first AGI systems are developed.
Here’s their regulation criticism:
I like their coining of “AGI Sputnik moment.”
Government restrictions on cryptography are surely the nearest example within IT.
The government also restricts basic “intellectual development” activities, such as “copying stuff” and “inventing stuff”.
Neither of those examples presents a resounding success story. Restrictions on cryptography proved infeasible, and were abandoned; copyright prohibits duplication of others’ work only when it’s for profit, and is only sporadically effective even at that.
There’s also the Monopolies and Mergers Commission. The government doesn’t foster the development of big and powerful agents that might someday compete with it.
There are seeds of some good ideas in Ben’s paper, like having the goal of the AGI system maintained in a distributed peer-to-peer system like BitTorrent or Bitcoin, preventing it from getting too corrupted. That partially addresses one of my concerns with Friendliness: a cosmic ray comes in, flips one important register, and the AI turns Evil instead of Good. (Laugh all you want; I was genuinely worried about this once upon a time.)
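As a minimal sketch of what that corruption-resistance might look like, here is how a node could check its local copy of the goal against hashes replicated on independent peers, accepting only the majority answer. This is my own illustration, not anything from the paper or from OpenCog; all names (`goal_digest`, `verify_goal`) are hypothetical.

```python
# Hypothetical sketch: guard a goal representation against single-bit
# corruption by checking it against digests replicated on independent peers.
import hashlib
from collections import Counter

def goal_digest(goal_text: str) -> str:
    """Content hash of the serialized goal representation."""
    return hashlib.sha256(goal_text.encode("utf-8")).hexdigest()

def verify_goal(local_goal: str, peer_digests: list[str]) -> bool:
    """Accept the local goal only if it matches the digest held by a
    majority of peers; a flipped bit in local memory fails this check."""
    majority_digest, count = Counter(peer_digests).most_common(1)[0]
    return (count > len(peer_digests) // 2
            and goal_digest(local_goal) == majority_digest)

# Example: two peers agree, one peer's digest is corrupted; the local copy
# is accepted because it hashes to the majority value.
peers = [goal_digest("Be Friendly")] * 2 + ["corrupted"]
print(verify_goal("Be Friendly", peers))  # True
```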
My idea follows (warning: some familiarity with the Bitcoin protocol may be needed). An open morality project to provide the goal for a future AGI could begin with a community effort to understand goodness, praise goodness, and reward goodness. The community’s reputation points/karma could be maintained in a Bitcoin-like distributed ledger, practically unhackable if the community has gone unmolested for two years or so.
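To make the ledger part concrete, here is a toy hash-chained karma ledger; it stands in for the Bitcoin-style distributed ledger suggested above but is purely illustrative (a single-node chain with no proof-of-work or networking), and the class and field names are made up.

```python
# Toy sketch of a hash-chained karma ledger: each block commits to the
# previous block's digest, so rewriting old awards means redoing the chain.
import hashlib
import json
import time

class KarmaLedger:
    def __init__(self):
        self.blocks = [{"prev": "0" * 64, "entries": [], "time": 0}]

    @staticmethod
    def _digest(block) -> str:
        return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

    def award(self, member: str, karma: int, reason: str) -> None:
        """Append a karma award as a new block chained to the previous one."""
        self.blocks.append({
            "prev": self._digest(self.blocks[-1]),
            "entries": [{"member": member, "karma": karma, "reason": reason}],
            "time": time.time(),
        })

    def balance(self, member: str) -> int:
        """Sum all karma ever awarded to a member."""
        return sum(e["karma"] for b in self.blocks
                   for e in b["entries"] if e["member"] == member)

ledger = KarmaLedger()
ledger.award("alice", 10, "praised a genuinely good act")
print(ledger.balance("alice"))  # 10
```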
Modifying the highest goal (“Be Friendly”) would be impossible. Modifying the weights of the lower-level subgoals would require karma, and more karma the higher you go toward the highest goal. Changing weights drastically or adding new subgoals would require more karma still, and there could be allowances both for goals/behaviors strongly supported or opposed by a few and for goals/behaviors supported by vast numbers.
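Here is a sketch of that karma-gating rule, under my own assumptions about how the cost might scale (exponentially with proximity to the root); the names and the specific cost formula are illustrative, not part of the proposal.

```python
# Illustrative sketch: a goal tree where the root goal is immutable and
# changing a subgoal's weight costs karma that grows toward the root.
class GoalNode:
    def __init__(self, name: str, weight: float, depth: int, immutable: bool = False):
        self.name = name
        self.weight = weight
        self.depth = depth          # 0 = root ("Be Friendly"); larger = lower level
        self.immutable = immutable

def karma_cost(node: GoalNode, new_weight: float, base: float = 10.0) -> float:
    """Cost of a weight change: proportional to the size of the change and
    steeply increasing as depth decreases (i.e. toward the top goal)."""
    return base * abs(new_weight - node.weight) * 2 ** (10 - node.depth)

def modify_weight(node: GoalNode, new_weight: float, karma: float) -> bool:
    if node.immutable:
        return False                      # the highest goal can never be edited
    if karma < karma_cost(node, new_weight):
        return False                      # not enough community standing
    node.weight = new_weight
    return True

root = GoalNode("Be Friendly", 1.0, depth=0, immutable=True)
subgoal = GoalNode("Be honest with humans", 0.7, depth=8)
print(modify_weight(root, 0.5, karma=1e9))      # False: immutable
print(modify_weight(subgoal, 0.8, karma=10.0))  # True: small change, deep subgoal
```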
Instead of paying money to a foundation subject to future corruption, future philanthropists can just back the “coin” with extra money, strengthening the hand of all members in the community and further strengthening the goal.
In fact, existing communities could transfer their karma/reputation points into the initial distribution and proceed from there, with future karma coming only from the peer-to-peer network. The coin supply could be fixed or slowly growing, depending on the best estimates of the coders.
Section 9 seems a little shaky, and it coincides with Ben’s interests; i.e., he’d benefit if this recommendation were followed.