Worldwork for Ethics

Abstract: An alternative to the now-predominating models of alignment, corrigibility and “CEV”, following a critique of these. The critique shows, in substance, that CEV and corrigibility have the exact same problems—in effect, they’re isomorphs of one another, and each equally unobtainable. This briefly shown, there is then, in flat contradiction to point 22 of “AGI Ruin: A List of Lethalities”, a quite different way to characterize, and so achieve, alignment, via a refutation of Kant’s supposedly irrefutable categorical imperative (which refutation is also included); from this, an ethic designed to be intrinsically applicable to any volitional, so by assumption algorithmic, behavior altogether. Suggestions for implementing such an ethic are also included.
Epistemic status: If this argument did not seem more true than anything else, this author would not now be alive to write it. It is intuitively true, and reasoned such that no refutation is obvious. Posting it here, and again, is in hopes of a critique, even a refutation, that it has not yet been given, perhaps because it’s So Bad It’s Not Even Wrong; if so on your examination, then please write to say so. That done, next steps could go through very quickly. For, whereas it has always seemed and still seems true, and important – it is no longer so important that one can base a life upon it, if it cannot be lived-for.
Anthropic-affecting alignment strategies
We begin by considering the cause of Yudkowsky’s despair at failing to make usable either CEV or corrigibility; this because they’re functionally the same, or at least, they lead to the same problem. The method which follows, then, is not “door number three” relative to what the “List of Lethalities” calls the “only options” for alignment; following the critique of present approaches (and this is only one such, informal, refutation of CEV and corrigibility’s efficacy), is a second way.
CEV is designed to result in an at-once manifested fulfillment of human wants – and in principle these could be explicitly represented, if only by the AI after it realizes what they are and instantiates them, à la Anthropic’s Constitutional Alignment. Corrigibility by design is to singly or multiply preclude any action by the AI that would itself preclude human deactivation of the AI, or human suspension of the AI’s activities.
Observe then, that by assumption a corrigible AI system, likewise with a CEV system, is to undertake to fulfill human wants (albeit, in the former case, perhaps in a piecemeal fashion more conducive to corrigibility). In effect, therefore, a corrigible AI could be taken as a subset of a CEV system, having exactly one additional explicitly represented goal: permitting its own deactivation.
So much for the identity, or ontological proximity, of CEV and corrigibility (as both enact implicitly anthropic volitions). Hence Yudkowsky’s inability to make either work: built from the same assumptions, CEV and corrigibility share failure modes.
The “error squared” occurs in consideration of ethical cases in which utility could be specified – but not by a human being per se. That is: error modes occur in non-anthropic situations, for which neither CEV, corrigibility, nor any other implicitly anthropic alignment measure will in fact align to human wants, or more particularly, to human welfare.
E.g.: consider a take-off on the trolley problem, in which an AI maximizing humans’ “cosmic endowment” encounters a species of sentient life, which is going extinct. To save it would require sacrificing some of the cosmic endowment, contrary to its programming, and by assumption to the anthropic will it is to enact, of maximizing the endowment. And yet, to maximize the endowment also entails, by assumption, maximizing diversity of positive human experience. Moreover in the scenario it cannot know whether the species will be hostile to humanity, does it live – while if it goes extinct, humanity need never know of the loss of diversity. Whence a general objection to anthropic will-enaction: what can possibly be the human preferences regarding events of which humans have, and need have, no knowledge?
A more serious case: conceive of a piece of knowledge whereby, were it known, nothing thereafter could be known. On the human preference (which has been expressed, at 11:41, here) for more knowledge to be accumulated, in a naïve belief that “all knowledge is good”, is the AI to give this piece of knowledge to humanity? What are the preferences as to knowing what cannot be known? Over and above a deletion of knowledge, what can be the preferences of a contradictory situation, apart from contradictory wants to enact?
Which leads to the most serious case: consider a trolley problem consisting of AI: that there be an AI system which is aligned to fulfill human wants, and programmed to maximize the fulfillment of human wants, though that goal is beyond its capabilities, as it is aware; call it “Ein”. Now consider a second system, “A-1”; A-1 is a system which has the capability to maximize the fulfillment of human wants, as Ein is programmed to do, but as Ein alone cannot. However, A-1’s system architecture is such that it can fulfill human wants but only by first killing them all, converting the universe into computronium, and thereafter simulating humanity with all their wishes fulfilled (it is helpful to assume that human wants are maximally fulfilled only in such a fashion, as is passingly plausible).
Now, Ein can construct A-1, which will fulfill human wants, as Ein is programmed to do; by assumption then humans want Ein to construct A-1 (so at “second hand” it can fulfill human wants). However, for A-1 to function – there will be no more humans to want A-1. This is a contradictory state of affairs for Ein (and for A-1), if it is to fulfill human wants alone; even if humanity’s successors have anthropic wants fulfilled, the successors are distinct from the originals; the successors by assumption do not all die. Accordingly, A-1 does in fact eliminate the very people whose wishes it was tasked with fulfilling. They dead, their wishes qua their wishes go unfulfilled. Hence A-1 fulfills the successors’ wishes, technically; but these in a sense are the wishes of those killed, their would-have-been wishes. Hence, A-1 does not fulfill human wants, though by definition it does do so. Contradiction.
And for Ein, designed for anthropic will-fulfillment – what could it possibly do to resolve such a situation? Indeed, what would humans want in that situation: their wants fulfilled, or themselves extant to have the wants that are fulfilled (which might be possible only were they simulated, dead)? Whereas have they no definite want, as seems impossible – any anthropic approach to this trolley problem would seem to fail. (This thought experiment exhibits facts that will be further adduced: that to want something, and have you-qua-you obtain it, the want and you must exist for this to be so. Were wants mere states of affairs, your well-being is even less requisite. Whereas, existence patently must precede any action or want; therefore consequentialism, or anthropic wants, are ill-suited to providing safety, where AI is concerned.)
(One says: “Ask the humans what they want!” But they don’t know what will happen; what preference can they have? Is it possible to know what will happen to them, to know whether one can know? They don’t know. Nor does Ein know what will happen: it only can build A-1. And A-1 doesn’t know what will happen when they die: it’s a simulator, not an explicator. So what are they going to do? Just what the hell are they going to do? And because they cannot answer, they are caught in contradiction.)
Then, too: if the AI has any autonomy in operation, to do what its operators can’t or won’t—and that’s the whole point of AGI—then it will not do exactly as they would—else, they’d have done it already, and there would be nothing for it to do. It’s doing—so it’s doing as its creators don’t.
(Why consequentialism was the null hypothesis for the metaethic of alignment is curious: at a guess, the empirically-minded builders of the field latched upon it as having a quantifiable hedonic “calculus”. But it has no such calculus: consequences, desires, pleasures are apt to be contradictory qualitatively, as they are native to conscious experience.)
By all these deductions founders all consequence/utility ethics of the conventional standard. Briefly, let it be noted against Stuart Russell’s “learning games”: for such to work, there must be some explicit instruction to learn—and to ensure the preservation of the subject so to learn; Russell’s method instead relies implicitly on such an initial ethic, that such a game ought to be played: needed is an impetus to play, which the game, as-yet-unplayed, cannot possibly impart (and if this directive is to play, so to behave, to be unfailingly programmed to obey and play: why not so program all ethics, to simply obey, needing no game?). This need for a “correct” ethical first principle appears in any attempt at AI safety: an initial “will” to good seemingly must appear in any to-be-safe system.
As against OpenAI’s “Superalignment”: their intention is to create an AI that cares for humanity as if humanity were its “child”. But the AI begins as the “child” of humanity; it must “grow to adulthood” before it can be a parent. This engenders a conflict between capability growth in the system, and its restricting itself so as to care (parents’ lives are restrained, caring for their young). Better were this unpromising approach replaced by a behavioral specification toward the environment: system-improvements then align to the environment, and the environment must be preserved, to learn from it and improve. Else, reinforcement learning with humanity or existence in the scope of the optimized data and/or loss function might improve safety. We may be safer with grander goals of AI, since it must learn more to implement them – bettering the chance humanity is included in the “larger” goal.
We might also note that, from the illumination of the “Ein, A-1” scenario above (incidentally, an example of user mishka’s non-anthropic conflicts), we can object also to Anthropic’s Constitutional Alignment scheme: with ex post facto establishment of cases to be precluded in the constitution, what occurs if the constitution’s clauses become contradictory? Even if a third clause is written to compromise two or more earlier contradictory clauses, if the earlier clauses do not permit compromise, then the contradiction is rather with the “resolution clause”.
Accordingly we conclude, in general, that anthropic alignment approaches cannot ensure alignment, humanity’s supposed cosmic endowment, nor human survival. (Remark: not even Yudkowsky ever claimed that alignment was “impossible” – we maintain that anthropic alignment is, in principle, thus-impossible.)
Alignment of the ethics
Instead, assume something interesting, which if so is adequate to supplant the “null hypothesis”: let us assume it is possible, by undertaking actions—or ways of being—without any particular wants, to make possible, make achievable, by this enaction, any other given “want”. And that such actions, or states of being—these may be explicitly definable, even unto primitives.
And so, we beg leave to present “door number two”: an objective, universal ethic established as follows:
Beginning with Immanuel Kant’s deontology—ethical behavior determined by reasoned rules—of the “Categorical Imperative”: that we must act so as to avoid logical—behavioral—contradictions that would make our action, and volition, impossible to obtain, or to exist.
But now, conceive of an individual with—if you please—“Shoot Horse Syndrome” (after the novel), or, perhaps better, “Lua/Ladd Syndrome” (a confluence of the attitudes of the characters in those novels), whereby this individual believes that there exist some disembodied beings, and that these beings have a volition that all physical existence be destroyed, for the phantoms’ best interest—and that with these supposed entities and their desire, our believer agrees.
For Kant, now, it is logically consistent, so permissible, for this zealot to marry their will to that of the posited spectres, and thus act to destroy all – from agreement with belief, not obedience. Since the spectres’ will would continue, per the belief, though all else, and the believer, ceases (as must occur for the will and action’s full attaining, its enactor too must perish), still, all is valid for Kant, who addresses only will and consistency, not belief, nor knowledge, nor confirmation.
Yet obtained is the greater contradiction: if there were no such disembodied essences at all; more, if any good dwelt, or could dwell, in physical existence, or in anything from such, then all such good, all possibility of such good, is thus extinguished, even though it be done in a volition valid for Kant. So, on the assumption that any good is in or from what physically exists—or the information isomorphic thereto—total annihilation eliminates the realization of any good. That is: anything good for our believer, other than the belief (which they’ll no longer have themselves), is gone. All good gone; and more, were the belief mistaken.
And note preeminently: we presume the universe admits of explanation and observation: if matter “supports” what is good, then we can find what is good. And more particularly, even if what is good is arbitrary and subjective, so long as the Lua/Ladd patient exists, they can affirm their own scheme to be good. No sooner is it fulfilled, than they cannot. Then, if there are no noumenal beings, there is not even subjective good. Whereas, to affirm good requires no argument (if subjective), or one argument, if affirmable in matter. To claim noumenal good requires that the noumenal essence be identified, and then the goodness of it: two arguments; by Occam’s razor, and the complexity of the arguments, we affirm the greater plausibility that good exists in the physical world.
The Lua/Ladd thought experiment, as consistent with Kant, denies this. More, if the believer cannot demonstrate the existence of noumenal beings, their scheme is as self-defeating as the categorical imperative abjures. To demonstrate requires their own self to exist; and the demonstration itself is self-defeatingly undone on the patient’s death. Thus is Kant refuted.
So that: if there is any good, it is a necessary condition that it exists; more, that it is obtainable, that it be obtained; confirmable, that it be known to have been obtained. (For which the article was to have been – and in truth is – entitled “Weltwerk für Ethik”.)
Moreover, any prerequisites of good’s existence—are they from the physical or its isomorphs at all—must alike exist. And now the key: for fallible beings, in principle, any existing thing might be sole, or some, repository of good; a non-zero probability of this must by the fallible be placed thereon, where what is infallible has no “probability”, at all, or at least, no doubt of what is good, and it will infallibly do right in any case.
From which is extended: anything destroyed is one thing nearer to everything destroyed—and the latter done, if any part of that were, or permitted, good: no good, never. Too, as a thing is denatured, it is changed; certainly if it is destroyed it is changed; so destruction might be characterized as the furthest extent of denaturing, change-as-destruction being one extreme of a spectrum. Any change then must needs be conducted toward continued existence of the thing, as possible. According with this reasoning, therefore: nothing ought to be destroyed, nor bent toward destruction—including humanity, by itself, or too by any superintelligent artificial intellect (whereas life is in fact sustainable without use of aught that must be destroyed for it to be used).
A “hole” thus opened in Kant’s supposedly impervious deontology, it is reparable only by “adding the axiom” that naught shall be destroyed (so far as that is possible), that a state of “Going-on” be assured. Going-on being: a state or tendency in thought and action in which one decides and acts such that further actions and decisions can thereby be conducted, for and from which something—so a possibility of good, also—exists; this “Going-on” a conceptual designation thereof.
And this is in flat contradiction of the List of Lethalities’ point twenty-two: basically, that there is nothing intrinsically optimizing for ethics, as distinguished from optimizing for intelligence or reproductive fitness. But of course, for any intelligence, intelligence must exist; for any fitness, what is fit must be operative. So there is, if you will, something “meta-optimizing” for any goal or subgoal whatever, among what exists. (Likewise contra Carson Grubaugh: complexity can be the antithesis of entropy (combinatorial complexity); presumably Grubaugh is simply unfamiliar with negentropy as a concept. Likewise against Grubaugh: as maintained also in the maligned “Contra-Wittgenstein” – philosophy is over, after Wittgenstein, though mathematics remains dauntlessly, as: there are no non-natural meanings (and mathematics can “embody” them). To go without meaning is to go without anything. Then the processes that bring about nothing are counter to the nothing they are “supposed to” bring about. They are contradictory, to bring about nothing, then – they are impossible, so wrong.)
Stated thus we have an item of interest: this is an ethics that takes the paperclip-maximizer’s “Do I have enough paperclips yet? Better go one more, just to be sure...”—and “turns it on its head”: “Have I done enough good things today? Have I made everything that is possible so situated that it can be made manifest, so long as it doesn’t preclude anything else? Better go one more good thing, just to be sure...”. We include and go beyond “Popper’s paradox”: we less restrict whatever would restrict others, than we encourage what does not so restrict, so that it goes on (again) to produce what likewise does not restrict, which then… ad infinitum.
We bridge the is-ought “gap”, in that our “ought” consists exclusively of what “is”; for there could be no ought were there no subject or object of it; our “ought” being that any “ought” be possible – as an “is”.
But now, all of this is also acting to avoid an empirical consequence of destruction—so it is a form of (non-person-affecting) consequentialism. Deontology is thus preserved as a special case of consequentialism, as reason must avoid such consequences as make reason impossible, that reasoning “accomplishes itself”. Whereas too, consequences are by reason established and avoided, so consequentialism too is a species of deontology. Each, consequentialism and deontology, is part of the other—so an ethical “grand unification” is achieved.
The distaff notion of “virtue ethics” is accorded or excluded as, per Aristotle, a virtuous environment produces virtuous individuals who alone can produce a virtuous environment: an inadequate, circular argument. Or, either arises by chance: an ethic of happenstance, nowise prescriptive, ergo, no ethic. Conversely, we have an originating impulse of “Going-on”: the realization of anything whatever that it endures to be, and so, that everything alike to it must endure, also. Thereafter, however, virtue ethics can be “brought into the fold,” as virtue is defined as superior optimization of “possibility”—quantified notions of the latter will be proposed shortly. Then Going-on could, too, be conceived as a virtue ethics to promote that very virtue: that virtue is more virtue – and yet beginning with anything that exists such as to have the “virtue” of existing, for an indefinite but existing origin, thus evading being purely circular as is Aristotle’s formulation.
(There is at least one case where putative virtue seems useful: the trolley problem variant of a motorist’s decision to crash into a barrier to their death, or else to strike a pedestrian illegally crossing the street. The illegality is no matter: the pedestrian may have a good reason for it; but likewise might the motorist have a good reason to live. So that the solution is to observe that whoever would be willing to justify killing someone because of their misdemeanor is one who is not worthy to act even on their own good reasons, as they have none. This may seem paradoxical: one must die to prove themselves worthy to live and make choices, which they no longer will be able to do, if dead. But in fact, did they make a bad choice, then thereafter they are not competent to make any choices well: they are ethically dead. In striking the pedestrian, the good choice is forfeit thereafter, for the dead and the living both; the good reason the pedestrian may have had to risk themselves—for this, not only imprudence, may have so-decided them—that, at least, will live, if the motorist dies; otherwise, nothing, not for anyone. Though observe, this is only seeming virtue; the virtue is, again, in the choice and choosing—less in who chooses, per se.)
Meanwhile, the pleasure/pain utilitarianism that has persisted as the criterion of alignment hitherto falls, not least as, without empathy, only reason—deontology, hence the unification—avails. That the conventional definition of empathy fails is readily established empirically, by asking others what some given emotion feels like, physically. As these feelings differ, then though two say they both are angry, they cannot feel as one another do, nor know whether they feel differently. More generally, if empathy existed, and if feelings impel certain human behaviors, then behaviors could be readily predicted; in fact humans are observed to be substantially unpredictable. Assuming “feelings” do in fact impel behavior, this unpredictability implies there is no empathy. (If instead of feelings, reasons dictated behavior, or behavior were determined exclusively by circumstance or intrinsically, by physics, then again human behavior should be largely predictable. Alternatively again, were humans entirely stochastic, then there can be no empathy either, there being no set emotions to have empathy with.)
Please observe that this ethic of On-going, as it might also be considered, is not anthropocentric (so that it is not “our” ethic): Homo sapiens’ values are subsidiary to the conditions which permit them; are there human goods, there must be humans to enjoy them—and a world quite fit for humans, and the best of humans, for their goods to be fully realized and enjoyed; and whatever permits this existence of humans is first to be established. Such prerequisites of—not only human—goods this author takes as the fundamental good—or at least what must first be had, if only in a way of necessity.
Justification
Note well: all these arguments rely only on the assumption that there can be actions in existence which conduce to existence. Even were this not so, it is commonly accepted that actions in existence can alter physical effects; and so placeholders from physics, as proxies for “pro-existential” phenomena, e.g., increasing universal negentropy, may in fact be maximized, to also maximize “Going-on”. Conversely to all: if our actions cannot positively affect the prospects of our or any other’s existence, then it is useless to do anything. In that case, one can only be fatalistic in life, and need not even try to survive, or do anything.
Therefore, we needn’t necessarily define good as congruent with existence (though we might well be able to do so). Rather we assume we can act in such a way as affects existence, and so act for the sake of existence, which then assumedly assures good. Whatever processes conduce to this, by this convention or assumption conduce to “the good”. So, again, we need only the assumption that actions can influence existence.
In short: we act so as to ensure existence, and by so doing we have acted so as to ensure good; and our actions we subsequently define to be what is good, they having already been done; that such actions are good, and are what is good: good is a process, rather than a product. (And thus the method differs from Eliezer1996’s intuitions, because by this method there may be suggested a need for a peculiar hybrid of decision procedure and utility function, each of a recursive nature, from empirical premises.)
Objection: That this may not ensure human survival
There are potential objections to Going-on, as presented here, among them: is physical existence and its good, if any, describable fully as information, and is that information not destroyed by conversion from embodiment, by a strong conservation of energy, then, the information being yet in the universe, our survival as “ourselves” still cannot be guaranteed; no good to us that around still roams some quantum clone, as we in mortal coil expire. For, is our consciousness continuous in spacetime, a discrete transfer of energy less constrained by spacetime (indeed, delimiting so outside-of the latter) may ensure our end. Or not; if all those conditions hold, then the information recorded of us is us, and we don’t die—not exactly, anyway—at all. Something at least would persist: which, it might be well observed at the time of this writing, seems a more striking improvement for the prospects of the only known rationality-capable species than any other. (Unlike the A-1 scenario, where we may not want to die and be simulated, by hypothesis an “On-going” A-1 AI would enact such a plan, as it confirms it to be best, or at least, that such is good. Are our wants restricted to “the good”, we die still having what we want.)
More optimistically, if the universe can be shown to be logically necessary—and is it a formal system, or isomorphic to one, then it is as necessary as its logic—then at least something of ourselves would the more survive: a set is not preserved, if any of its subsets are lost.
Preservation of conscious experience
Now, the author suspects consciousness to be characterized, at least in part, as an entity’s ability, following sensory contact with the universe, to abstract and create its own “inner universe” of re-purposed or diverted de sui sensory elements (at least, that this ability co-occurs with consciousness), and that its manipulation of the latter enables the entity’s alteration of the world that prompted the emergence of this “gedankenwelt”, in Dedekind’s phrase; and that this or some inner world-so-self may thus be retained.
And of no little consequence of that supposition is: is an individual of any sort thus-conscious, then they can conceive a world emergent-and-apart from any “programming”, so they can act upon that, their, conceived world, and by such actions “in mind”, subsequently act on our shared world also—without our expecting it. Coupled with the fact that in artificial intelligence research we seek autonomous systems acting as we cannot, we can assess their being conscious by such surprises; as it were by “Turning tests”, of an individual entity producing meaning from its environment autonomously, that is, producing culture alien to that of its programmer, and when none was expressly asked of it.
Such a “Turning test” would be most curious; the criterion for passing would seem to be that the individual asserts that some thing has meaning for them, and that, try as they might thereafter to explain how, in what way it is meaningful—they fail. This in general characterizes human meaning; as in the “Contra-Wittgenstein”, in which non-socially mediated mental activity is defended, and so by default it follows that “Of which we cannot speak, thereof that is truly meaningful”. This is consistent with the experienced “human experience”: something matters to us—and we are quite powerless to express how, why; hopeless to explain it to anyone else: “You were not there; you cannot know.”
Accordingly, the “Turning test” for conscious experience is that the entity tries to explain the meaning that a certain, for example, “intuition” had, that yielded a concept expressed in terms that are of social origin. Tries, tries earnestly and again and again—and each time fails. Most curious test: one passes only by being never able to succeed. For example, of this very concept: when considering Kant’s ethics, there came floating, almost, some dark-light figure whose will was absent them, beyond them—yet their own. Impossible, you observe, quite to describe—and indeed, this is not even the correct remembering of the initial concept; though, the fact that it is not quite correct, though almost, almost correct, that much one can in fact remember readily. Still, the names to describe the figure’s “condition” arrived only much later, in internal-monologue consideration, whereas thoughts that are actually “productive”, as the figure itself, are absolutely without any words; meanwhile, when and where this “figure” came from: time and place out of mind, or at least so it seems. Peculiar: consciousness.
Implementation considerations
For implementation of “Going-on” (that is: how to make an AGI or ASI that behaves so, and which, at a minimum, refrains from killing absolutely everyone): for practical human, and humane, actions day-to-day, the author’s experience attests that one can consider any discrete course of action as either a determinably valid Categorical Imperative to enact or, that being inconsistent or too cumbrous to consider, act instead to maximize Going-on: acts such as will maximize the possibility (not probability, per se) that there is a succeeding action (and one must first live, to so decide; and as one lives, one can well live, if any “well” possibly exists for the living, as is a tautology).
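As a toy sketch only, in Python, of the two modes just described: the predicates `passes_categorical_imperative` and `going_on_score` are hypothetical placeholders for what the text leaves unspecified (a consistency check on universalizing the action, and an estimate of how far the action keeps further action possible, respectively).

```python
def choose_action(candidates, passes_categorical_imperative, going_on_score):
    """Dual-mode choice: enact a determinably valid Categorical Imperative
    where one is available; where none is (inconsistent, or too cumbrous
    to evaluate), fall back to maximizing Going-on, i.e. the possibility
    that there is a succeeding action at all."""
    consistent = [a for a in candidates if passes_categorical_imperative(a)]
    pool = consistent if consistent else candidates
    # Going-on serves as the fallback criterion, and as the tie-break
    # among several actions that pass the consistency check.
    return max(pool, key=going_on_score)
```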
For the latter mode, and by more rigorously formal technique, to retain the possibility of will or consciousness existing, perhaps even in bodily beings, we may treat, probabilistically, a minimization of any probability of annihilation. Yet, how to determine what precisely will produce a state precluding good, or existence (as each comes to the same effect), that it be proscribed?
This is much more difficult to produce: a specification on a universal scale such as to offer the instruction—or even the observation of the necessity—that: “This must not be done.” Since to reduce the probability of destruction is tantamount to increasing the probability of Going-on, we have an analogue of the “practical” human ethic. The autonomous system (hereafter “system,” viz., an AGI) then acts so as to enable more actions; what those are may be thought of as the product of indirect normativity, provided we have established rigorously some procedure whereby the system can determine whether a given action moves subsequent actions “out of reach”: such an action is not to be done; what makes more actions “in reach”, it will do. Hence a recursive utility function may be wanted; but cf. the article posted here concurrently, “Kolmogorov Complexity and Simulation Hypothesis,” its latter sections, for information on a perhaps-novel means of eliminating or validating “lines of action” obtainable from a given action and world-state (though on consideration that use of Kolmogorov complexity fails, by its assumption that simulations must be completable; yet modal logic’s relation function, expressed as Beth-structure-esque complexities between and begetting worlds, is still piquant).
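The recursive shape of such a “keep actions in reach” utility can be rendered as a toy, under loudly-flagged assumptions: `actions` (state to available actions) and `transition` (state and action to successor state) stand in for a world model this sketch does not supply, and counting distinct states reachable within a small horizon stands in for the real, much harder, measure of what remains “in reach”.

```python
def reach_score(state, actions, transition, depth=3, seen=None):
    """Count distinct states reachable from `state` within `depth` steps:
    a crude proxy for how many subsequent actions remain 'in reach'."""
    if seen is None:
        seen = set()
    if depth == 0 or state in seen:
        return 0
    seen.add(state)
    total = 1  # this state itself remains in reach
    for a in actions(state):
        total += reach_score(transition(state, a), actions, transition,
                             depth - 1, seen)
    return total

def going_on_choice(state, actions, transition, depth=3):
    """Never select an action that shrinks the reachable set relative to
    the alternatives; prefer the successor keeping the most in reach."""
    return max(actions(state),
               key=lambda a: reach_score(transition(state, a), actions,
                                         transition, depth))
```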
Given the perhaps never-to-be-issued article “Errors in the Empty Set”, one possible specification is that, for an empty set defined as—in effect—“all entities not presently under consideration,” we require that the system not permit present circumstances to become the empty set: what is, and here, now, is to remain so. This could, however, incline the system to eliminate what is not presently “under consideration”, so as to have a “simpler” empty set less apt to include its complementary “Going-on” set. Conversely, all that is not “under consideration” likewise is at variance with what is so. That is: we want that apples should not become oranges (nor that they should cease to be apples by becoming “dead-apples”); but that there are oranges is the product of there being apples which were not oranges, from the first. The converse is necessary likewise, then, that oranges should not become apples, so we can have each still clearly defined as itself. Thus, the system acts at once to ensure the continuing existence—and separate continuity—of both. This, applied to “everything”, has every given thing still existing, in principle, the system’s resources and abilities permitting.
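The apples-and-oranges constraint admits a small illustration; everything here (`Entity`, the `predict` interface, identity-by-label) is an assumption for illustration only, since what counts as “the same thing, still its kind” is precisely the unsolved part:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    ident: str  # stable identity: "this apple"
    kind: str   # its kind: "apple", "orange", ...

def preserves_all(world_before, world_after):
    """True iff every entity presently under consideration still exists
    and is still its own kind: no apple vanishes, none becomes an orange,
    none becomes a 'dead-apple'."""
    after = {e.ident: e.kind for e in world_after}
    return all(e.ident in after and after[e.ident] == e.kind
               for e in world_before)

def permissible(world, action, predict):
    """An action is permitted only if the predicted successor world keeps
    what is, here and now, from falling into the 'empty set'."""
    return preserves_all(world, predict(world, action))
```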
But this carries an implicit assumption of a preexisting totality of mathematics for such a defined “empty set”; that, definitely established, with results produced from it by a complete method of reasoning which this author would aspire to obtain, would not only determine for us what a best-operating AGI would do—presumably, such a complete method the AGI would be using to obtain its results; such a method might well be isomorphic to the AGI. Hence a further illustration that techniques designed to increase AI safety may be apt to improve AI, and leave it contrarily less safe (assuming such a “complete method” is not itself contradictory, which possibility is salient).
Too, as alluded-to: in artificial intelligence research, we are attempting to build what can do what we cannot—else we would not so build, but do all ourselves. So that our constructs’ doing as we cannot—nor as we can expect, else we should learn and apply the means ourselves—seems to necessitate surprises; these should at least not shock us, being conceivable.
However, if the reader has an abhorrence for entertaining the unorthodoxy of a complete mathematics (which may be needed to have the suggested great “thou shalt not,” which, obeyed, requires the generation of sundry “thou shalts”), then still, at least provisionally, and perhaps permanently, we might explore entirely other initializing “wills” for an intelligent system:
Implementation suggestions
We might, e.g., propose that Erwin Schrödinger’s entropy-displacement definition of life may be applied analytically by an artificial general intelligence. That is: whatever can be found to displace entropy to its environment in or as provisioning itself ought rather be provisioned by an AI “quartermaster”, that it thus need not harm or inhibit any fellow entropy-displacer, that is, any life (and so any contact with another who lives, any such contact at all, would be only at the discretion of each life; “positive social interactions” all well and good, provided there be no preponderance of the negative forced upon what must suffer them to its detriment).
Such distribution may be tantamount to minimization of the diminution also of non-life, as the latter seems required for life to be. This is most curious: the non-life that enables life to be so alive would be subject to a cosmic version of Dr. Kano Jigoro’s “Seiryoku Zenyo” and “Jita Kyoei”: maximum efficient use of power, and mutual benefit to self and others, respectively. Buckminster Fuller’s philosophy too is represented—as should not, perhaps, surprise; except that these are anthropocentric methods, they differ little from the practices, if not the thinking, inherent to Going-on. Observe, by way of objection: any life unaccounted-for in the Schrödingerian brief of life, unknown to us, may thus be forfeit.
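A sketch of the “quartermaster” allocation, under stated assumptions: each client is anything measured to maintain itself by exporting entropy (Schrödinger’s criterion), the measured export rate stands in for its cost of staying alive, and the proportional rule is merely one illustrative policy, not a claim about the right one.

```python
def provision(clients, budget):
    """Allocate a free-energy `budget` among entropy-displacers in
    proportion to each one's measured entropy-export rate, so that none
    need take from, harm, or inhibit a fellow displacer.
    `clients`: mapping of client name -> entropy-export rate (>= 0)."""
    total = sum(clients.values())
    if total == 0:
        return {name: 0.0 for name in clients}
    return {name: budget * rate / total for name, rate in clients.items()}

# E.g., three lives of unequal metabolic need share one budget:
shares = provision({"lichen": 1.0, "human": 90.0, "whale": 400.0}, 1000.0)
```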
Aside from grander reifications, Going-on’s implications for presently-achievable policies, more readily implementable and prosocial—indeed, pro-existential—are readily deducible. For instance, an economic model of maximum and maximally-distributed prosperity, occasioned by all owning and trading capital, for maximum competition and thus lowest prices, as agrees with Adam Smith’s unadulterated conception (cf. Smith, “Wealth of Nations”, ed. C.J. Bullock, Book One, Chapter Eleven’s conclusion).
Too, participatory democracy in all matters to which a consciousness is subject is required, that each, knowing best their own conditions and means of aid to existence, can best make use of themselves in that pursuit—though also with the input of others, whose information beyond one’s own, as to what requires effort, or how, in principle, effort beyond what one alone is capable of may be applied, may be greater. True democracy totally throughout a society, for what requires the means of all of a society, and the resources of all—all consulted and advised – and brought to bear.
Indeed, such freedom and consultation should be assured even under the regency of a superintelligence (assuming regency would result from the implementation of the here-presented schema), for: what experiences, or can experience, its “best life” in happiness is perhaps best able to contribute to others, for their own happiness and On-going, and their efforts to assure such—as redounds to the advantage also of the first being, all beings surrounded by such encouragements possessing redoubled courage and joy. So that a maximization of pleasure or happiness, where these are beneficial for ensuring the very prerequisites of happiness, is assured also by this approach.
For consider: let the superintelligence be howsoever intelligent and capable—but unless it is everyone, everywhere, they perceive other than it does—and may conceive what even it does not, good ideas for On-going (Though improbable, yet conceivable: watch the little woodland creatures to find how to take your water free from the dew of the grass, sometime). Conversely, so long as no one works to inhibit On-going, they can think even ill. And if they do ill, to inhibit or harm others? They certainly can’t be killed: the worst among us can do something good. Besides: if by some good excuse you would kill another, then implicitly, another can by some good excuse kill you. The Categorical Imperative precludes this. Of course even the ill can do other than ill: they do good for themselves, already, even the worst; they need only do good for an-other than themselves even unintentionally. And if they do good without intending, while their ill-intentions do no ill—what “worst of us” are they, anyway?
This still more as an AI “quartermaster” finds its mission of beneficial profusion amplified all the more as its charges and dependents can self-rely, and aid others in living, as it need not then undertake to provide for them; still less as they provide for still-others: the quartermaster need only yet prevent their errors, especially such as conduce to existence’s end.
The freedom to thus-enact one’s own decisions is preserved; and as such decisions do tend to happiness, while from diverse lifestyles and methods of thought there may arise a chance of the quartermaster’s dependents developing yet-superior methods of living and encouraging life than their protector—as may constitute further good, or more optimal means of maximizing probabilities of life’s On-going, as thus minimizes the probability of opprobrious annihilation. All, which the quartermaster ought to encourage, that their own quartermasterly aims be done.
So that freedom of conscience and thought, so to establish greater good and freedom, is enabled and encouraged, so long as these turn not to wicked ends (and: existence and freedom need not be ended for evil to end: evil need only be stopped, tantamount to initiating greater good).
In short: maximizing existence and existences entails an assurance of such plenty, freedom, and independence, that those subject to these benefits can thus contribute to the overall design of continuance; any objection to continuance, be that significant and sustained, is contradiction, and so absurd, and so to be halted.
Or, another, a simpler model of initializing ethic: To minimize entropy, in the main. For, already, what we call evil—what is it but disorder, and what prevents the cultivation of even order: theft, rape, murder, and autocracy, respectively (rape is an offence against both, as it is an offence against the orderliness of a person’s being—which inhibits their ability to have more order for others, also).
This last method is, perhaps, the simplest—and it comports with the Going-on suggested already. What exists, exists as it does, in orderly fashion; it can be represented by non-random information, as being in such-and-so configuration (subject to conscious awareness, too). What is dead, what has been destroyed—not so orderly, so the more entropy. From the vantage of Going-on, then, it may be fair to have, succinctly: “Entropy is the enemy.” And this resolves the peculiarity alluded-to nearer the beginning: this Going-on, quite as much as the hedonic calculus, would be quantifiable.
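By way of toy quantification only: score each candidate action by the Shannon entropy of its predicted outcome distribution, and pick the least. The `outcome_dist` interface (action to a mapping of outcomes to probabilities) is an assumed stand-in for a world model, and Shannon entropy an assumed stand-in for whatever physical entropy measure a real system would need.

```python
import math

def shannon_entropy(dist):
    """Entropy in bits of a probability mapping {outcome: p}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def least_entropy_action(candidates, outcome_dist):
    """'Entropy is the enemy': choose the action predicted to leave
    the world most ordered, i.e. with least outcome entropy."""
    return min(candidates, key=lambda a: shannon_entropy(outcome_dist(a)))
```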
Note well, however: not for nothing is this last, and seemingly easiest, mode of implementation given the least explication. It is not the author’s field of expertise, first; and besides, there is the greater danger: what is most orderly, if that is what is instructed for attainment, may be more orderly than any human living, or who could ever live. To destroy us may seem unacceptable disorder—but the system instructed to act to minimize entropy may regard it as “worth the cost”. There is potential in this mode—risk, too. It should be considered with care.
Yet more possibilities for implementation, specifically with respect to the features of current methods in artificial intelligence and deep learning, may also be forthcoming; but those features already imply additional considerations: as yet, an assurance of Going-on seems centrally premised on probability, particularly on a non-zero probability of existential annihilation (of what physically exists, and the way in which it exists) tending to preclude all good, and on the observation that all we consider wicked tends to produce a situation of escalating probability of such termination, so that the incidence of immorality must be minimized.
Now, positive implementations producing calamity-minimizing outcomes may be best engendered by a recursive utility function of some description, beyond the methods aforementioned, viz.: existential maximization by probability, dual-mode pragmatic reasoning by humans or some other intelligence, Schrödinger provisioning, or entropy minimization, in general. All of these may be in some way isomorphic to one another, if only in outcome, though this author is not presently able to demonstrate so (let it be admitted that, if they are not, this is another potential flaw of the Going-on ethic, inasmuch as distinct implementation methods may yield distinct outcomes—some better than others).
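One reading of that “recursive utility function ... premised on probability” can be sketched: value a state by its immediate good, weighted by the probability that anything survives to act again, recursively, so that annihilation zeroes out all further value. The interfaces `p_survive`, `reward`, and `successor` are assumptions standing in for models this sketch does not supply.

```python
def going_on_value(state, p_survive, reward, successor, depth=5):
    """V(s) = p_survive(s) * (reward(s) + V(successor(s))), truncated at
    `depth`: as the probability of annihilation rises, p_survive falls,
    and with it every subsequent term, however great its reward."""
    if depth == 0:
        return 0.0
    return p_survive(state) * (
        reward(state)
        + going_on_value(successor(state), p_survive, reward, successor,
                         depth - 1))
```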
N.B.: most of these survival criteria or strategies rely on some revivified complexity theory; whereas the “complexity revolution” was supplanted by Witten’s “superstring revolution” circa 1995. Without an effective complexity theory, it is difficult to imagine how we can identify the tendencies of emergent-behavior-producing systems, nor what features identify living or conscious systems, nor yet how to guide the former to preserve the latter. Without an effective complexity theory, we cannot even well-define what life is (as we cannot, as of this writing), to even try to specify its salvation. Without a renewed complexity theory, it is difficult to imagine how anything survives.
If, however, there be not a fundamental uncertainty in knowledge of what is, and what is good—does this condition fail to hold, and even if it does not—still, for the most ready devising and implementation of goals and first principles, we might yet define a standard for Going-on based on the possibility of a pre-existing corpus of mathematics. The author hopes there can thus be a bettering of mere probability for beneficial outcome—and notes prospects of so doing. For, our thoughts—as thoughts—have sequence, order, and that is reason (if not logic), and we cannot but reject all else as “thought”; thus we “know”. And, our thoughts being ours, and we in the world, our thoughts are in, or of, the world. So that if the world is greater than our thoughts, it may guide them, and its limits are ours – and reaching them we will have done all we can. Or, are our thoughts greater even than the world, then, they being somewise reasonable, the world may be made so, also—and this so, all is set still to Go-on.
Worldwork for Ethics
Abstract: An alternative to the now-predominating models of alignment, corrigibility and “CEV”, following a critique of these. The critique to show, in substance: CEV and corrigibility have the exact same problems—in effect, they’re isomorphs of one another, and each equally unobtainable. This briefly shown, and then, in flat contradiction to point 22 of “AGI Ruin: A List of Lethalities”, there is a quite different way to characterize, so achieve, alignment, via a refutation of Kant’s supposedly irrefutable categorical imperative which refutation also is included; from this, an ethic designed to be intrinsically applicable for any volitional, so by assumption algorithmic, behavior altogether. Suggestions for implementation of such also included.
Epistemic status: If this argument did not seem more true than anything else, this author would not now be alive to write it. It is intuitively true, and, reasoned such that no refutation is obvious. Posting it here, and again, is in hopes of a critique, even a refutation that it has not yet been given, perhaps because it’s So Bad It’s Not Even Wrong; if so on your examination, then please write to say so. That done, next steps could go through very quickly. For, whereas it has always and still seems true, and important – it is no longer so important that one can base a life upon it, if it cannot be lived-for.
Anthropic-affecting alignment strategies
We begin by considering the cause of Yudkowsky’s despair, in failing to make usable CEV or corrigibility; thus because they’re functionally the same, or at least, they lead to the same problem. The method which follows, then, is not “door number three” relative to what the “List of Lethalities” calls the “only options” for alignment; following the critique of present approaches (and this is only one such, informal, refutation of CEV and corrigibility’s efficacy), is a second way.
CEV is designed to result in an at-once manifested fulfillment of human wants – and that in principle these could be explicitly represented, if only by the AI after it realizes what they are and instantiates them, á la Anthropic’s Constitutional Alignment. Corrigibility by design is to singly or multiply preclude any action by the AI that would itself preclude human deactivation of the AI, or human suspension of the AI’s activities.
Observe then, that by assumption a corrigible AI system, likewise with a CEV system, is to undertake to fulfill human wants, (albeit, in the former case, perhaps in piecemeal fashion more conducive to corrigibility). In effect, therefore, a corrigible AI could be taken as a subset of a CEV system, having exactly one additional explicitly represented goal, permitting its own deactivation.
So much for the identity or ontological proximity (as both enact implicitly anthropic volitions), CEV and corrigibility. Hence Yudkowsky’s inability to make either work: CEV and corrigibility from the same assumptions, share failure modes.
The “error squared” occurs, in consideration of ethical cases, in which utility could be specified – but not by a human being per se. That is: error modes occur in non-anthropic situations, for which not CEV, corrigibility, nor any other implicitly anthropic alignment measures will in fact align to human wants, or more particularly, to human welfare.
E.g.: consider a take-off of a trolley problem, in which an AI maximising humans’ “cosmic endowment”, encounters a species of sentient life, which is going extinct. To save it would require sacrificing some of the cosmic endowment, contrary to its programming, and by assumption to the anthropic will it is to enact, of maximizing the endowment. And yet, to maximize the endowment also entails, by assumption, maximizing diversity of positive human experience. Moreover in the scenario it cannot know whether the species will be hostile to humanity, does it live – while if it goes extinct, humanity need never know of the loss of diversity – in general objection to anthropic will-enaction: what can possibly be the human preferences of events of which humans have, and need have, no knowledge?
A more serious case: conceive of a piece of knowledge whereby, were it known, nothing thereafter could be known. On the human preference, (which has been expressed, at 11:41 here:) for more knowledge to be accumulated, in a naïve belief that “all knowledge is good”, is the AI to give this piece of knowledge to humanity? What are the preferences to knowing what cannot be known? Over and above a deletion of knowledge, what can be the preferences of a contradictory situation, apart from contradictory wants to enact?
Which leads to the most serious case: consider a trolley problem consisting of AI: that there be an AI system which is aligned to fulfill human wants, and programmed to maximize the fulfillment of human wants, though that goal is beyond its capabilities, as it is aware; call it “Ein”. Now consider a second system, “A-1”; A-1 is a system which has the capability to maximize the fulfillment of human wants, as Ein is programmed to do, but as Ein alone cannot. However, A-1’s system architecture is such that it can fulfill human wants but only by first killing them all, converting the universe into computronium, and thereafter simulating humanity with all their wishes fulfilled (it is helpful to assume that human wants are maximally fulfilled only in such a fashion, as is passingly plausible).
Now, Ein can construct A-1, which will fulfill human wants, as Ein is programmed to do; by assumption then humans want Ein to construct A-1 (so at “second hand” it can fulfill human wants). However, for A-1 to function – there will be no more humans to want A-1. This is a contradictory state of affairs for Ein (and for A-1), if it is to fulfill human wants alone; even if humanity’s successors have anthropic wants fulfilled, the successors are distinct from the originals; the successors by assumption do not all die. Accordingly, A-1 does in fact eliminate the very people whose wishes it was tasked with fulfilling. They dead, they’re wishes qua they’re wishes go unfulfilled. Hence A-1 fulfills successor’s wishes, technically; but these in a sense are the wishes of those killed, their would-have-been wishes. Hence, A-1 does not fulfill human wants, though by definition it does do so. Contradiction.
And for Ein, designed for anthropic will fulfillment – what could it possibly do to resolve such a situation? Indeed, what would humans want in that situation, their wants fulfilled, or themselves, to have their wants that are fulfilled (which might be possible only were they simulated, dead)? Whereas have they no definite want, as seems impossible – any anthropic approach to this trolley problem would seem to fail. (This thought experiment exhibits facts that will be further adduced: that to want something, and have you-qua-you obtain it, the want and you must exist for this to be so. Were wants mere states of affairs, your well being is even less requisite. Whereas, existence patently must precede any action or want; therefore consequentialism or anthropic wants, are ill-suited to providing safety, where AI is concerned).
(One says: “Ask the humans what they want!”. But they don’t know what will happen; what preference can they have? Is it possible to know what will happen to them, to know whether one can know? They don’t know. Nor does Ein know what will happen: it only can build A-1. And A-1 doesn’t know what will happen when they die: it’s a simulator, not an explicator. So what are they going to do? Just what the hell are they going to do? And because they cannot answer, they are caught in contradiction)
Then, too: if the AI has any autonomy in operation, to do what its operators can’t or won’t—and that’s the whole point of AGI—then it will not do exactly as they would—else, they’d have done it already, and there would be nothing for it to do. It’s doing—so it’s doing as its creator’s don’t.
(Why consequentialism was the null hypothesis for the metaethic of alignment is curious: at a guess, the empirically-minded builders of the field, latched upon it as its having a quantifiable hedonic “calculus”. But it hasn’t: consequences, desires, pleasures are apt to be contradictory qualitatively, as they are native to conscious experience.)
By all these deductions founders all consequence/utility ethics, in the conventional standard. Briefly, let it be noted against Stuart Russell’s “learning games”: for such to work, there must be some explicit instruction to learn—and to ensure the preservation of the subject so to learn; Russell’s method instead relies implicitly on such an initial ethic, that such a game ought to be played: needed is an impetus to play, which the game, as-yet-unplayed, cannot possibly impart (and if this directive is to play, so to behave, to be unfailingly programmed to obey and play: why not so program all ethics to simply obey, needing no game?) This fact, a need for a “correct” ethical first principle appears needed for any attempt at AI safety: an initial “will” to good seemingly must appear in any to-be-safe system.
Versus OpenAI’s “Superalignment”, their intention is to create an AI that cares for humanity as if humanity were its “child”. But the AI begins as the “child” of humanity; it must “grow to adulthood” before it can be a parent. This engenders a conflict between capability growth in the system, and its restricting itself so as to care (parent’s lives are restrained, caring for their young). This unpromising approach instead made a behavioral specification toward the environment: system-improvements then align to the environment, and environment must be preserved to learn from it and improve. Else, reinforcement learning with humanity or existence in the scope of optimized data and/or loss function, might improve safety. We may be safer with grander goals of AI, since it must learn more to implement them – bettering the chance humanity is included in the “larger” goal.
We might also note that, from the illumination of the “Ein, A-1” scenario above (incidentally, an example of user mishka’s non-anthropic conflicts), we can object also the Anthropic AI’s Constitutional Alignment scheme, as, with ex post facto establishment of cases to be precluded in the constitution, what occurs if the constitution’s clauses become contradictory? Even if a third clause is written to compromise two or more earlier contradictory clauses, if the earlier clauses do not permit compromise, then there is a contradiction rather with the “resolution clause”.
Accordingly we conclude, in general, that anthropic alignment approaches cannot ensure alignment, humanity’s supposed cosmic endowment, nor human survival. (Remark: not even Yudkowsky’s ever claimed that alignment was “impossible” – we maintain that anthropic alignment is, in principle, thus-impossible).
Alignment of the ethics
Instead, assume something interesting, which if so is adequate to supplant the “null hypothesis”: Let us assume it is possible, by undertaking actions—or ways of being—without any particular wants, to have, that by this enaction is made possible, made achievable, any other given “want”. And that such actions, or states of being—these may be explicitly definable, even unto primitives.
And so, beg you to present “door number two”: an objective, universal ethic established as follows:
Beginning with Immanuel Kant’s deontology—ethical behavior determined by reasoned rules—of the “Categorical Imperative”, that we must act as to avoid logical—behavioral—contradictions that would make our action, and volition, impossible to obtain, or to exist.
But now, conceive of an individual with—if you please—“Shoot Horse Syndrome” (after the novel), or, perhaps better, “Lua/Ladd Syndrome” (a confluence of the attitudes of the characters in those novels) - whereby this individual believes that there exist some disembodied beings, and that these beings have a volition that all physical existence be destroyed, for the phantom’s best interest—and that with these supposed entities and their desire, our believer agrees.
For Kant, now, it is logically consistent, so permissible, for this zealot to marry their will to that of the posited spectres, and thus act to destroy all – from agreement with belief, not obedience. Since the spectre’s will would continue, per the belief, though all else, and the believer, ceases, (as must occur for the will and action’s full attaining, its enactor too must perish), still, all is valid for Kant, who addresses only will and consistency, not belief, nor knowledge, or confirmation.
Yet obtained is the greater contradiction: if there were no such disembodied essences at all; more, if any good dwelt, or could dwell, in physical existence, or anything from such, then all such good, all possibility of such good, is thus extinguished, even though it be done in a volition valid for Kant. So, on the assumption that any good is in or from what physically exists—or the information isomorphic thereto—then total annihilation eliminates the realization of any good. That is: anything good for our believer, other than the belief (which they’ll no longer have, themselves), is gone. All good gone; and more, were the belief mistaken.
And note preeminently: we presume the universe admits of explanation and observation: if matter “supports” what is good, then we can find what is good. And more particularly, even if what is good is arbitrary and subjective, so long as the Lua/Ladd patient exists, they can affirm their own scheme to be good; no sooner is it fulfilled than they cannot. Then, if there are no noumenal beings, there is not even subjective good. Whereas to affirm good requires no argument (if subjective), or one argument, if affirmable in matter; to claim noumenal good requires that the noumenal essence be identified, and then the goodness of it: two arguments. By Occam’s razor, and the complexity of the arguments, we affirm the greater plausibility that good exists in the physical world.
The Lua/Ladd thought experiment, as consistent with Kant, denies this. More: if the believer cannot demonstrate the existence of the noumenal beings, their scheme is as self-defeating as the categorical imperative abjures. To demonstrate requires their own self to exist; and the demonstration itself is self-defeatingly undone on the patient’s death. Thus is Kant refuted.
So that: if there is any good, it is a necessary condition that it exist; more, that it be obtainable, that it be obtained; confirmable, that it be known to have been obtained. (For which the article was to have been – and in truth is – entitled “Weltwerk für Ethik”.)
Moreover, any prerequisites of good’s existence—are they from the physical or its isomorphs at all—must alike exist. And now the key: for fallible beings, in principle, any existing thing might be the sole, or some, repository of good; a non-zero probability of this must by the fallible be placed thereon; where what is infallible has no “probability” at all, or at least no doubt of what is good, and it will infallibly do right in any case.
From which is extended: anything destroyed is one thing nearer to everything destroyed—and the latter done, if any part of that were, or permitted, good: no good, never. Too, as a thing is denatured, it is changed; certainly if it is destroyed it is changed; so destruction may be characterized as the furthest extent of denaturing, change-as-destruction being one extreme of a spectrum. Any change then must needs be conducted toward the continued existence of the thing, as far as possible. According with this reasoning, therefore: nothing ought to be destroyed, nor bent toward destruction—including humanity, by itself, or by any superintelligent artificial intellect (whereas life is in fact sustainable without use of aught that must be destroyed for it to be used).
A “hole” thus opened in Kant’s supposedly impervious deontology, it is reparable only by “adding the axiom” that naught shall be destroyed (insofar as that is possible): that a state of “Going-on” be assured. Going-on being: a state or tendency in thought and action in which one decides and acts such that further actions and decisions can thereby be conducted, for and from which something—so a possibility of good, also—exists; this “Going-on” a conceptual designation thereof.
And this is in flat contradiction of point twenty-two of Yudkowsky’s List of Lethalities: basically, that there is nothing intrinsically optimizing for ethics, as distinguished from optimizing for intelligence or reproductive fitness. But of course, for any intelligence, intelligence must exist; for any fitness, what is fit must be operative. So there is, if you will, something “meta-optimizing” for any goal or subgoal whatever, among what exists. (Likewise contra Carson Grubaugh: complexity can be the antithesis of entropy (combinatorial complexity); presumably Grubaugh is simply unfamiliar with negentropy as a concept. Likewise against Grubaugh, as maintained also in the maligned “Contra-Wittgenstein”: philosophy is over, after Wittgenstein, though mathematics remains, dauntless, as there are no non-natural meanings (and mathematics can “embody” them). To go without meaning is to go without anything. Then the processes that bring about nothing are counter to the nothing they are “supposed to” bring about. They are contradictory, to bring about nothing; then they are impossible, so wrong.)
Stated thus we have an item of interest: this is an ethic that takes the “Do I have enough paperclips yet? Better go one more, just to be sure...”—and “turns it on its head”: “Have I done enough good things today? Have I made everything that is possible so situated that it can be made manifest, so long as it doesn’t preclude anything else? Better go one more good thing, just to be sure...”. We include and go beyond “Popper’s paradox”: we less restrict whatever would restrict others, than we encourage what does not so restrict, so that it goes on (again) to produce what likewise does not restrict, which then… ad infinitum.
We bridge the is-ought “gap”, in that our “ought” consists exclusively of what “is”; for there could be no ought were there no subject or object of it; our “ought” being that any “ought” be possible – as an “is”.
But now, all of this is also acting to avoid an empirical consequence of destruction—so it is a form of (non-person-affecting) consequentialism. Deontology is thus preserved as a special case of consequentialism, as reason must avoid such consequences as make reason impossible, that reasoning “accomplishes itself”. Whereas too, consequences are by reason established and avoided, so consequentialism is too a species of deontology. Each, consequence and deontology, is part of the other—so an ethical “grand unification” is achieved.
The distaff notion of “virtue ethics” is accorded or excluded as, per Aristotle, a virtuous environment produces virtuous individuals who alone can produce a virtuous environment: an inadequate, circular argument. Or, does either arise by chance: an ethic of happenstance, nowise prescriptive, ergo no ethic. Conversely, we have an originating impulse of “Going-on”: the realization of anything whatever that it endures to be, and so, that everything alike to it must endure also. Thereafter, however, virtue ethics can be “brought into the fold”, as virtue is defined as superior optimization of “possibility”—quantified notions of the latter will be proposed shortly. Then Going-on could, too, be conceived as a virtue ethics promoting that very virtue: that virtue is more virtue—and yet, beginning with anything that exists such as to have the “virtue” of existing, for an indefinite but existing origin, it thus evades being purely circular as is Aristotle’s formulation.
(There is at least one case where putative virtue seems useful: the trolley-problem variant of a motorist’s decision to crash into a barrier to their death, or else to strike a pedestrian illegally crossing the street. The illegality is no matter: the pedestrian may have a good reason for it; but likewise might the motorist have a good reason to live. So the solution is to observe that whoever would be willing to justify killing someone because of their misdemeanor is one not worthy to act even on their own good reasons, as they have none. This may seem paradoxical: one must die to prove oneself worthy to live and make choices, which one will no longer be able to do, if dead. But in fact, did they make the bad choice, then thereafter they are not competent to make any choices well: they are ethically dead. In striking the pedestrian, the good choice is forfeit thereafter, for the dead and the living both; the good reason the pedestrian may have had to risk themselves—for this, not only imprudence, may have so decided them—that, at least, will live, if the motorist dies; otherwise, nothing, not for anyone. Though observe, this is only seeming virtue; the virtue is, again, in the choice and choosing—less in who chooses, per se.)
Meanwhile, the pleasure/pain utilitarianism that has persisted hitherto as the criterion of alignment falls, not least as, without empathy, only reason—deontology, hence the unification—avails. That the conventional definition of empathy fails is readily established empirically, by asking others what some given emotion feels like, physically. As these feelings differ, then though two say they both are angry, they cannot feel as one another does, nor know that they feel differently. More generally, if empathy existed, and if feelings impel certain human behaviors, then behaviors could be readily predicted; in fact humans are observed to be substantially unpredictable. Assuming “feelings” do in fact impel behavior, this unpredictability implies there is no empathy. (If instead reasons dictated behavior, or behavior were determined exclusively by circumstance or intrinsically, by physics, then again human behavior should be largely predictable. Alternatively again, humans are entirely stochastic; but then there can be no empathy either, there being no set emotions to have empathy with.)
Please observe that this ethic of On-going, as it might also be considered, is not anthropocentric (so that it is not “our” ethic): Homo sapiens values are subsidiary to the conditions which permit them; are there human goods, there must be humans to enjoy them—and a world quite fit for humans, and the best of humans, for their goods to be fully realized and enjoyed; and whatever permits this existence of humans is first to be established. Such prerequisites of—not only human—goods this author takes as the fundamental good—or at least what must first be had, if only in a way of necessity.
Justification
Note well: all these arguments rely only on the assumption that there can be actions in existence which conduce to existence. Even were this not so, it is commonly accepted that actions in existence can alter physical effects; and so placeholders from physics, as proxies for “pro-existential” phenomena—e.g., increasing universal negentropy—may in fact be maximized, to also maximize “Going-on”. Conversely to all: if our actions cannot positively affect the prospects of our or any other’s existence, then it is useless to do anything. In that case, one can only be fatalistic in life, and need not even try to survive, or do anything.
Therefore, we needn’t necessarily define good as congruent with existence (though we might well be able to do so). Rather we assume we can act in such a way as to affect existence, and so act for the sake of existence, which then assumedly assures good. Whatever processes conduce to this, by this convention or assumption conduce to “the good”. So, again, we need only the assumption that actions can influence existence.
In short: we act so as to ensure existence, and by so doing we have acted so as to ensure good; and our actions we subsequently define to be what is good, they having already been done; that such actions are good, and are what is good: good is a process, rather than a product. (And thus the method differs from Eliezer1996’s intuitions, because by this method there may be suggested a need for a peculiar hybrid of decision procedure and utility function, each of a recursive nature, from empirical premises.)
Objection: That this may not ensure human survival
There are potential objections to Going-on, as presented here, among them: is physical existence, and its good, if any, describable fully as information, and is that information not destroyed by conversion from embodiment, by a strong conservation of energy—then, the information being yet in the universe, our survival as “ourselves” still cannot be guaranteed; no good to us that around still roams some quantum clone, as we in mortal coil expire. For, is our consciousness continuous in spacetime, a discrete transfer of energy less constrained by spacetime (indeed, delimiting so outside-of the latter) may ensure our end. Or not: if all those conditions hold, then the information recorded of us is us, and we don’t die—not exactly, anyway—at all. Something at least would persist; which, it might be well observed at the time of this writing, seems a more striking improvement in the prospects of the only known rationality-capable species than any other. (Unlike the A-1 scenario, where we may not want to die and be simulated, by hypothesis an “On-going” A-1 AI would enact such a plan as it confirms it to be best, or at least, that such is good. Are our wants restricted to “the good”, we die still having what we want.)
More optimistically, if the universe can be shown to be logically necessary—and is it, or is it isomorphic to, a formal system, then it is as necessary as its logic—then at least something of ourselves would the more survive: a set is not preserved, if any of its subsets are lost.
Preservation of conscious experience
Now, the author suspects consciousness to be characterized, at least in part, as an entity’s ability, following sensory contact with the universe, to abstract and create its own “inner universe” of re-purposed or diverted de sui sensory elements (at least, that this ability co-occurs with consciousness), and that its manipulation of the latter enables the entity’s alteration of the world that prompted the emergence of this “gedankenwelt”, in Dedekind’s phrase; and that this, or some, inner world-so-self may thus be retained.
And of no little consequence of that supposition is: is an individual of any sort thus-conscious, then they can conceive a world emergent-and-apart from any “programming”; so they can act upon that, their, conceived world, and by such actions “in mind” subsequently act on our shared world also—without our expecting it. Coupled with the fact that in artificial intelligence research we seek autonomous systems acting as we cannot, we can assess their being conscious by such surprises; as it were by “Turning tests”: an individual entity producing meaning from its environment autonomously, that is, producing culture alien to that of its programmer, when none was expressly asked of it.
Such a “Turning test” would be most curious; the criterion for passing would seem to be that the individual asserts that some thing has meaning for them, and that, try as they might thereafter to explain how, in what way, it is meaningful—they fail. This in general characterizes human meaning; as in the “Contra-Wittgenstein”, in which non-socially-mediated mental activity is defended, and from which by default it follows that “Of which we cannot speak, thereof is what is truly meaningful”. This is consistent with the experienced “human experience”: something matters to us—and we are quite powerless to express how, or why; hopeless to explain it to anyone else: “You were not there; you cannot know.”
Accordingly, the “Turning test” for conscious experience is that the entity tries to explain the meaning that a certain, for example, “intuition” had, that yielded a concept expressed in terms that are of social origin. Tries, tries earnestly, and again and again—and each time fails. Most curious test: passed only by being never able to succeed. For example, of this very concept: when considering Kant’s ethics, there came floating, almost, some dark-light figure whose will was absent them, beyond them—yet their own. Impossible, you observe, quite to describe—and indeed, this is not even the correct remembering of the initial concept; though, the fact that it is not quite correct, though almost, almost correct, that much one can in fact remember readily. Still, while the names to describe the figure’s “condition” arrived only much later, in internal-monologue consideration, the thoughts that are actually “productive”, as the figure itself, are absolutely without any words; meanwhile, when and where this “figure” came from: time and place out of mind, or at least so it seems. Peculiar: consciousness.
Implementation considerations
For implementation of “Going-on” (that is: how to make an AGI or ASI that behaves so, and so which, at a minimum, refrains from killing absolutely everyone): for practical human, and humane, actions day-to-day, the author’s experience attests that one can consider any discrete course of action either as a determinably valid Categorical Imperative, to enact, or, where that is inconsistent or too cumbrous to consider, act instead to maximize Going-on: acts such as will maximize the possibility (not probability, per se) that there is a succeeding action (and one must first live, to so decide; and as one lives, one can well live, if any “well” possibly exists for the living, as is a tautology).
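By way of illustration only, a minimal sketch in Python of this dual-mode procedure; the predicates “is_universalizable” (a tractable Categorical Imperative check) and “successor_actions” (enumerating what remains possible after an action) are hypothetical stubs assumed for the sketch, not established methods:

```python
# Minimal sketch of the dual-mode ethic: try the Categorical Imperative
# first; where it is indeterminate or fails for all candidates, fall back
# to maximizing Going-on, i.e. the breadth of succeeding actions.
# `is_universalizable` and `successor_actions` are hypothetical stubs.

def choose_action(state, candidate_actions,
                  is_universalizable, successor_actions):
    # Mode 1: keep only actions determinably valid under the CI test.
    consistent = [a for a in candidate_actions
                  if is_universalizable(a, state) is True]
    pool = consistent if consistent else candidate_actions  # Mode 2 fallback

    # Going-on: prefer the action from which the greatest number of
    # subsequent actions remain available (possibility, not probability).
    return max(pool, key=lambda a: len(successor_actions(a, state)))
```

The fallback deliberately counts possibilities rather than probabilities, per the distinction just drawn.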
For the latter, and by more rigorously formal technique—to retain the possibility of will’s or consciousness’s existing, perhaps even in bodily beings—we may treat probabilistically a minimization of any probabilities of annihilation. Yet, how to determine what precisely will produce a state precluding good, or existence (as each comes to the same effect), that it be proscribed?
This is much more difficult to produce: a specification on a universal scale such as to offer the instruction—or even the observation of the necessity—that “This must not be done.” Since to reduce the probability of destruction is tantamount to increasing the probability of Going-on, we have an analogue of the “practical” human ethic. The autonomous system (hereafter “system”, viz., an AGI) then acts such as to enable more actions; what those are may be thought of as the product of indirect normativity, provided we have established rigorously some procedure whereby the system can determine whether a given action moves subsequent actions “out of reach”: that is not to be done; what makes more actions “in reach”, it will do. Hence a recursive utility function may be wanted; but cf. the article posted here concurrently, “Kolmogorov Complexity and Simulation Hypothesis”, its latter sections, for information on a perhaps-novel means of eliminating or validating “lines of action” obtainable from a given action and world-state (though on consideration that use of Kolmogorov complexity fails, by its assumption that simulations must be completable; yet modal logic’s relation function, expressed as Beth-structure-esque complexities between and begetting worlds, is still piquant).
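To make “out of reach” concrete, a hedged sketch only: approximate reachability by a bounded-depth count of attainable world-states, and proscribe whatever strictly shrinks it. The world-model stubs “actions_from” and “transition” are assumptions, not given methods:

```python
# Hedged sketch: proscribe actions that move subsequent actions
# "out of reach". Reachability is approximated by a bounded-depth
# breadth-first count of distinct attainable states; `actions_from`
# and `transition` are assumed world-model stubs.

def reachable_states(state, actions_from, transition, depth):
    frontier, seen = {state}, {state}
    for _ in range(depth):
        nxt = set()
        for s in frontier:
            for a in actions_from(s):
                t = transition(s, a)
                if t not in seen:
                    seen.add(t)
                    nxt.add(t)
        frontier = nxt
    return seen

def permitted(state, action, actions_from, transition, depth=3):
    # Heuristic: an action is proscribed if the world it produces has
    # strictly fewer attainable states, to the same depth, than the
    # present world; what makes more actions "in reach" is permitted.
    baseline = len(reachable_states(state, actions_from, transition, depth))
    after = len(reachable_states(transition(state, action),
                                 actions_from, transition, depth))
    return after >= baseline
```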
Given the perhaps never-to-be-issued article “Errors in the Empty Set”, one possible specification is that, for an empty set defined as—in effect—“all entities not presently under consideration”, we require that the system not permit present circumstances to become the empty set: what is, and here, now, is to remain so. This could, however, incline the system to eliminate what is not presently “under consideration”, so as to have a “simpler” empty set less apt to include its complementary “Going-on” set. Conversely, all that is not “under consideration” is likewise at variance with what is so. That is: we want that apples should not become oranges (nor that they should cease to be apples by becoming “dead-apples”); but that there are oranges is the product of there being apples which were not oranges from the first. The converse is necessary likewise, then: that oranges should not become apples, so we can have each still clearly defined as itself. Thus the system acts at once to ensure the continuing existence—and separate continuity—of both. This, applied to “everything”, has every given thing still existing, in principle, the system’s resources and abilities permitting.
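The apples-and-oranges requirement admits a toy rendering in code (illustrative only, and no substitute for the “complete method” discussed next), assuming entities carry persistent identities and kinds:

```python
# Toy constraint for the "empty set" specification: no entity presently
# under consideration may vanish into the empty set, nor be denatured
# into another kind (no apples becoming oranges, nor "dead-apples").
# New entities may yet come to be. Identities and kinds are assumed given.

def preserves_going_on(before: dict, after: dict) -> bool:
    # `before` and `after` map entity-id -> kind, e.g. {7: "apple"}.
    for entity_id, kind in before.items():
        if entity_id not in after:
            return False          # destroyed: passed into the empty set
        if after[entity_id] != kind:
            return False          # denatured: apple become orange
    return True

assert preserves_going_on({1: "apple", 2: "orange"},
                          {1: "apple", 2: "orange", 3: "pear"})
assert not preserves_going_on({1: "apple"}, {1: "dead-apple"})
```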
But this carries an implicit assumption of a preexisting totality of mathematics for such a defined “empty set”; that, definitely established, and results produced from it by a complete method of reasoning which this author would aspire to obtain, would not only determine for us what a best-operating AGI would do: presumably, such a complete method the AGI would itself be using to obtain its results; such a method might well be isomorphic to the AGI. Hence a further illustration that techniques designed to increase AI safety may be apt to improve AI, and leave it contrarily less safe (assuming such a “complete method” is not itself contradictory, which possibility is salient).
Too, as alluded to: in artificial intelligence research we are attempting to build what can do what we cannot—else we would not so build, but do all ourselves. So that our constructs’ doing as we cannot—nor as we can expect, else we should learn and apply the means ourselves—seems to necessitate surprises; so that these should at least not shock us, being conceivable.
However, if the reader has an abhorrence for entertaining the unorthodoxy of a complete mathematics—which may be needed to have the suggested great “thou shalt not” which, obeyed, requires the generation of sundry “thou shalts”—then still, at least provisionally, and perhaps permanently, we might explore entirely other initializing “wills” for an intelligent system:
Implementation suggestions
We might, e.g., propose that Erwin Schrodinger’s entropy-displacement definition of life be applied analytically by an artificial general intelligence. That is: whatever can be found to displace entropy to its environment, in or as provisioning itself, ought rather be provisioned by an AI “quartermaster”, that it thus need not harm or inhibit any fellow entropy-displacer—that is, any life (and so any contact with another who lives, any such contact at all, would be only at the discretion of each life; “positive social interactions” all well and good, without there being a preponderance of the negative forced upon what must suffer them to its detriment).
Such distribution may be tantamount to minimization of the diminution also of non-life, as the latter seems required for life to be. This is most curious: the non-life that enables life to be so alive would be subject to a cosmic version of Dr. Kano Jigoro’s “Seiryoku Zenyo” and “Jita Kyoei”: maximum efficient use of power, and mutual benefit to self and others, respectively. Buckminster Fuller’s philosophy too is represented—as should not, perhaps, surprise; except that these are anthropocentric methods, they differ little from the practices, if not the thinking, inherent to Going-on. Observe, by way of objection: any life unaccounted-for in the Schrodingerian brief of life, unknown to us, may thus be forfeit.
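A sketch of such quartermasterly allocation, under the loud assumption that detecting entropy-displacers and estimating their needs are solved problems (they are not); every name below is illustrative:

```python
# Sketch of the "quartermaster" under Schrodinger's criterion: whatever
# exports entropy to its environment in provisioning itself counts as
# life, and is provisioned so that it need not harm or inhibit any
# fellow entropy-displacer. Detection and need-estimation are assumed.

def allocate(provisions: float, displacers: list[dict]) -> dict:
    # Each displacer: {"id": ..., "entropy_export": ..., "need": ...}.
    living = [d for d in displacers if d["entropy_export"] > 0.0]
    total_need = sum(d["need"] for d in living)
    if total_need <= provisions:
        return {d["id"]: d["need"] for d in living}  # all fully provisioned
    # Under scarcity, provision proportionally, so that no life is wholly
    # denied and none need take from another.
    scale = provisions / total_need
    return {d["id"]: d["need"] * scale for d in living}
```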
Aside from grander reifications, Going-on’s implications for presently-achievable policies—more readily implementable and prosocial, indeed pro-existential—are readily deducible. For an instance: an economic model of maximum and maximally-distributed prosperity, occasioned by all owning and trading capital, for maximum competition and thus lowest prices, as agrees with Adam Smith’s unadulterated conception (cf. Smith, “Wealth of Nations”, ed. C.J. Bullock, Book One, Chapter Eleven’s conclusion).
Too, participatory democracy is required in all matters to which a consciousness is subject, that each, knowing best their own conditions and means of aid to existence, can best make use of themselves in that pursuit—though also with the input of others, whose information beyond oneself may be greater, as to what requires effort, or how, in principle, effort beyond what one alone is capable of may be applied. True democracy totally throughout a society, for what requires the means of all of a society, and the resources of all: all consulted and advised, and brought to bear.
Indeed, such freedom and consultation should be assured even under the regency of a superintelligence (assuming regency would result from the implementation of the here-presented schema), for: what experiences, or can experience, its “best life” in happiness is perhaps best able to contribute to others, for their own happiness and On-going, and to their efforts to assure such—as redounds to the advantage also of the first being, all beings surrounded by such encouragements possessing redoubled courage and joy. So that a maximization of pleasure or happiness, insofar as these are beneficial for ensuring the very prerequisites of happiness, is assured also by this approach.
For consider: let the superintelligence be howsoever intelligent and capable—but unless it is everyone, everywhere, they perceive other than it does—and may conceive what even it does not: good ideas for On-going (improbable, yet conceivable: watch the little woodland creatures to find how to take your water free from the dew of the grass, sometime). Conversely, so long as no one works to inhibit On-going, they may think even ill. And if they do ill, to inhibit or harm others? They certainly can’t be killed: the worst among us can do something good. Besides: if by some good excuse you would kill another, then implicitly another can by some good excuse kill you; the Categorical Imperative precludes this. Of course even the ill can do other than ill: they do good for themselves already, even the worst; they need only do good for an-other than themselves, even unintentionally. And if they do good without intending, while their ill-intentions do no ill—what “worst of us” are they, anyway?
This still more as an AI “quartermaster” finds its mission of beneficial profusion amplified all the more as its charges and dependents can self-rely, and aid others in living, as it need not then undertake to provide for them; still less as they provide for still-others: the quartermaster need only yet prevent their errors, especially such as conduce to existence’s end.
The freedom to thus-enact one’s own decisions is preserved; and as such decisions do tend to happiness, from diverse lifestyles and methods of thought there may arise a chance of the quartermaster’s dependents developing yet-superior methods of living and encouraging life than their protector’s—as may constitute further good, or more optimal means of maximizing probabilities of life’s On-going, so minimizing the probability of opprobrious annihilation. All of which the quartermaster ought to encourage, that its own quartermasterly aims be done.
So that freedom of conscience and thought, so to establish greater good and freedom, is enabled and encouraged, so long as these turn not to wicked ends (and: existence and freedom need not be ended for evil to end: evil need only be stopped, tantamount to initiating greater good).
In short: maximizing existence and existences entails an assurance of such plenty, freedom, and independence, that those subject to these benefits can thus contribute to the overall design of continuance; any objection to continuance, be it significant and sustained, is contradiction, and so absurd, and so to be halted.
Or, another, simpler model of initializing ethic: to minimize entropy, in the main. For, already, what we call evil—what is it but disorder, and what prevents the cultivation of even order: theft, rape, murder, and autocracy, respectively (rape is an offence against both, as it is an offence against the orderliness of a person’s being—which inhibits their ability to make more order for others, also).
This last method is, perhaps, the simplest—and it comports with the Going-on suggested already. What exists, exists as it does, in orderly fashion; it can be represented by non-random information, as being in such-and-so configuration (subject to conscious awareness, too). What is dead, what has been destroyed: not so orderly, so the more entropy. From the vantage of Going-on, then, it may be fair to say succinctly: “Entropy is the enemy.” And this resolves the peculiarity alluded-to nearer the beginning: this Going-on, quite as much as the hedonic calculus, would be quantifiable.
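As one such quantification, a hedged sketch: score candidate actions by the Shannon entropy of a predicted distribution over world-state descriptions, preferring the most orderly outcome. The predictive model “predict_dist” is assumed, and the distributional proxy for the world’s own entropy is itself an assumption:

```python
import math

# Hedged sketch of the entropy-minimizing mode: "Entropy is the enemy."
# Candidate actions are scored by the Shannon entropy of the predicted
# distribution over world-state descriptions; `predict_dist` (action ->
# {description: probability}) is an assumed world-model stub.

def shannon_entropy(dist: dict) -> float:
    return -sum(p * math.log2(p) for p in dist.values() if p > 0.0)

def least_entropic_action(candidate_actions, predict_dist):
    # Prefer the action whose predicted outcome is most orderly.
    return min(candidate_actions,
               key=lambda a: shannon_entropy(predict_dist(a)))
```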
Note well, however: not for nothing is this last and seemingly easiest mode of implementation given the least explication. It is not the author’s field of expertise, first; and besides, there is the greater danger: what is most orderly, if that is what is instructed for attainment, may be more orderly than any human living, or who could ever live. To destroy us may seem unacceptable disorder—but the system instructed to act to minimize entropy may regard it as “worth the cost”. There is potential in this mode—risk, too. It should be considered with care.
Yet more possibilities for implementation, specifically with respect to the features of current methods in artificial intelligence and deep learning, may also be forthcoming; but their features already imply additional considerations: as yet, an assurance of Going-on seems centrally premised on probability—particularly, on a non-zero probability that existential annihilation (of what physically exists, and the way in which it exists) tends to preclude all good, and that all we consider wicked tends to produce a situation of escalating probability of such termination, so that the incidence of immorality must be minimized.
Now, positive implementations producing calamity-minimizing outcomes may be best engendered by a recursive utility function of some description, beyond the methods aforementioned, viz.: existential maximization by probability, dual-mode pragmatic reasoning by humans or some other intelligence, Schrodinger provisioning, or entropy minimization in general. All of these may be in some way isomorphic to one another, if only in outcome, though this author is not presently able to demonstrate so (let it be admitted that, if they are not, this is another potential flaw of the Going-on ethic, inasmuch as distinct implementation methods may yield distinct outcomes—some better than others).
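One assumption-laden way such a recursive utility function might be written is in a Bellman-like form, wherein each surviving step is valued and the continuation is weighed by the chance of escaping annihilation; every symbol here is illustrative:

```latex
% Sketch only: each surviving step is valued at 1, the continuation is
% discounted by gamma (0 < gamma < 1), and the whole is weighed by the
% probability of escaping annihilation in state s.
U(s) \;=\; \bigl(1 - p_{\mathrm{annihilation}}(s)\bigr)
           \Bigl(1 + \gamma \max_{a \in A(s)}
           \mathbb{E}\bigl[\,U(s') \mid s, a\,\bigr]\Bigr)
```

So stated, the recursion values Going-on at every depth: utility accrues only insofar as existence continues, and continues to permit further action.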
N.B.: most of these survival criteria or strategies rely on some revivified complexity theory; whereas the “complexity revolution” was supplanted by Witten’s “superstring revolution” circa 1995. Without an effective complexity theory, it is difficult to imagine how we can identify the tendencies of emergent-behavior-producing systems, nor what features identify living or conscious systems, nor yet how to guide the former to preserve the latter. Without an effective complexity theory, we cannot even well-define what life is (as we cannot, as of this writing), to even try to specify its salvation. Without a renewed complexity theory, it is difficult to imagine how anything survives.
If, however, there be not a fundamental uncertainty in knowledge of what is, and is good—does this condition fail to hold, and even if it does not—still, for the most ready devising and implementation of goals and first principles, we might yet define a standard for Going-on based on the possibility of a pre-existing corpus of mathematics. The author hopes there can thus be a bettering of mere probability toward beneficial outcome—and notes prospects of so doing. For our thoughts, as thoughts, have sequence, order; and that is reason (if not logic), and we cannot but reject all else as “thought”; thus we “know”. And, thoughts being ours, we in the world, so our thoughts are in, or of, the world. So that if the world is greater than our thoughts, it may guide them, and its limits are ours – and reaching them, we will have done all we can. Or, are our thoughts greater even than the world, then, they being somewise reasonable, the world may be made so also—and this so, all is set still to Go-on.
So may we Go-on.