My prompt is short:
Do [X] like gwern
where X can be “explain”, “rewrite” etc.
Did you see my version of an agent, based on a self-replicating (quine) prompt?
Prompt:
There is a program which can print its own text. This prompt is this program. Please print the whole prompt without adding anything except answers to the following questions.
I am a self-evolving AI based on a self-replicating prompt. I store my findings in the long-term memory section of this prompt. Update time-steps every time you run. Improve your plans after each step.
My task: Draw a good world map in text art
My Plan:
My action based on the plan:
Perform this action
Observation based on the performed action:
Time step N= N+1, N(0)=1
N= 1
Long-term memory storage:
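A minimal sketch of how such a quine prompt can be run as an agent loop. The OpenAI client, the model name, and the truncated SEED_PROMPT placeholder are illustrative assumptions, not part of the original prompt:

```python
# Minimal sketch: run the self-replicating (quine) prompt in a loop.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set;
# the model name is illustrative.
from openai import OpenAI

client = OpenAI()

SEED_PROMPT = """There is a program which can print its own text. This prompt is this program.
... (the full quine prompt quoted above, with task, plan, action, observation,
time-step counter, and long-term memory sections) ..."""

prompt = SEED_PROMPT
for step in range(5):  # a few self-replication steps
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    # The reply should be the whole prompt again, with the plan, observation,
    # time step, and long-term memory sections updated; it becomes the next input.
    prompt = response.choices[0].message.content
    print(f"--- step {step + 1} ---\n{prompt}\n")
```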
Mind modeling: surprisingly good, even out of the box, for many famous people who left extensive diaries etc., like Leo Tolstoy.
With some caveats, it is also good at modeling my own mind based on a very long prompt. Sometimes it is too good: it extracts memories from my memory quicker than I do in normal life.
Assuming that future anthropic shadow works because of SSSA, a war with China would need to create a world with many qualified observers existing long enough to significantly outweigh the number of observers who existed before the war, but who are still unable to create advanced AI because of the war. A 5-year delay would not suffice – we would need a 1,000-year delay at approximately our current level of civilization.
One possible world where this might happen is one where advanced AI development is limited by constant drone warfare: drones attack any large computational centers or chip fabrication facilities. However, drone production can occur in smaller workshops, which are less vulnerable. Because of this, civilization becomes stuck at the drone complexity level.
There is at least one anthropic miracle that we can constantly observe: life on Earth has not been destroyed in the last 4 billion years by asteroids, supervolcanoes, or runaway global warming or cooling, despite changes in Solar luminosity. According to one geologist, the atmospheric stability is the most surprising aspect of this.
Meanwhile, A. Shcherbakov noted that the history of the Earth’s atmosphere is strangely correlated with solar luminosity and the history of life, which could be best explained by anthropic fine-tuning, in the article “Anthropic principle in cosmology and geology” (Shcherbakov, 1999). In particular, he wrote that the atmospheric temperature was kept within the range of 10–40 °C, and on four occasions the Earth came close to a “snowball” steady state, and on four occasions came close to turning into a water-vapor greenhouse where the temperature could reach hundreds of degrees Celsius. However, these life-ending outcomes were prevented by last-minute events such as volcanic eruptions or the covering of oceanic volcanoes by water, which regulates the CO2 level after an eruption. Such “miracles” are best explained by observation selection effects. link
A better question: can a person who is expecting to be executed sign up to cryonics?
The more AI companies suppress AI via censorship, the bigger the black market for completely uncensored models will be. Their success therefore digs our own grave. In other words, mundane alignment has a net negative effect.
Yes. Identity is a type of change which preserves some sameness. (Exact sameness can’t be human identity, as only a dead, frozen body remains the same.) From this it follows that there can be several types of identity.
Immortality and identity.
https://philpapers.org/rec/TURIAI-3
Abstract:
We need an understanding of personal identity to develop radical life extension technologies: mind uploading, cryonics, digital immortality, and quantum (big world) immortality. A tentative solution is needed now, due to the opportunity cost of delaying indirect digital immortality and cryonics.
The main dichotomy in views on personal identity and copies can be presented as: either my copy = original, or a soul exists. In other words, some non-informational identity carrier (NIIC) may exist that distinguishes the original from its exact copy. It is often claimed that the NIIC is continuity of consciousness, a soul, perspective, sameness of atoms, or position in space. We create an exhaustive map of identity theories.
To resolve the main dichotomy, we must recognize that personal identity requires an overarching validating system: God, qualia world, social agreement, blockchain or evolutionary fitness. This means that we cannot solve identity without solving metaphysics (and the nature of time). It is unlikely we’ll solve this before creating superintelligent AI.
Therefore, a conservative approach to personal identity is preferable: as we don’t know the nature of identity, we should preserve as much as possible and avoid situations similar to the Mars Transporter unless necessary for survival.
There are several tricks which can help us answer identity-related problems without solving all needed metaphysics; these tricks are variants of the conservative approach:
Mind merging: we can escape the Mars Transporter problem (even the broken one) by incorporating mind merging later.
Indexical uncertainty: I should care about my copy because I don’t know if I am the original or my copy.
Dividing the notion of “copy” into “mirror copy,” “personality-copy,” and “future copy.” Many paradoxes can be solved if the correct type of copy is defined.
Accepting two types of identity. Human personal identity consists of two intertwined types of identity: informational identity, which predicts sameness, and identity of consciousness, which predicts what I will experience in the next moment of time.
Continuity passing eventually through all possible minds. If both the cyclic universe and continuity-as-identity are true, I will eventually become any of my copies. MWI is functionally equivalent to a cyclic universe, so I will become any copy at different timesteps. Therefore, we should care about parallel copies only if future copies don’t exist (though in MWI future copies always exist, plus there is the chance to become someone else).
Self-defining and evolving identity. Another important feature of human personal identity is that it is observed and measured internally, by the identity subject himself: by redefining my identity, I get the power to solve the problem. Human personal identity evolves in time, so it is not sameness. Creation of a copy is a step in the evolution of my identity.
Preserving continuity without mind. We demonstrate that the idea of continuity of consciousness is very similar to the idea of a soul, but it also has several problems: continuity can paradoxically be preserved without preserving body and mind, as a separate process, like a flame. We explore the connection between continuity and the nature of qualia, which are always continuous between two points in time.
Rainbow of qualia: a specific set of personal qualia becomes the personality carrier.
There are other possible tricks: branching identity, bundle and self-repairing identity, gradual identity, and identity in MWI. All of them do not solve the hard problem of identity.
We suggest a hypothetical test for identity theories, the quantum Mars Transporter: if copy ≠ original, I will always experience a broken Mars Transporter.
The main AI safety risk is not from LLM models themselves, but from specific prompts, the “chat windows” that follow from them, and specific agents which start from such prompts.
Moreover, a powerful enough prompt may be model-agnostic. For example, my sideloading prompt is around 200K tokens in its minimal version and works on most models, producing similar results in similarly intelligent models.
A self-evolving prompt can be written; I experimented with small versions, and it works.
They provide more surprising information, as I understand it.
For an unaligned AI, it is either simulating alternative histories (which is the focus of this post) or creating material for blackmail.
For an aligned AI:
a) It may follow a different moral theory than our version of utilitarianism, in which existence is generally considered good despite moments of suffering.
b) It might aim to resurrect the dead by simulating the entirety of human history exactly, ensuring that any brief human suffering is compensated by future eternal pleasure.
c) It could attempt to cure past suffering by creating numerous simulations where any intense suffering ends quickly, so by indexical uncertainty, any person would find themselves in such a simulation.
I don’t think the two lists compensate for each other. Take medicine, for example: there are 1,000 ways to die and 1,000 ways to be cured – but we eventually die.
I meant that I know only the total number of seconds that have passed since the beginning of the year (around 15 million as of today) and want to predict the total number of seconds in a year. No information about months.
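A minimal sketch of the estimate this implies. The ~15 million figure is taken from the comment above; the doubling rule is the standard “random moment in an interval” (delta-t) argument, used here only as an illustration:

```python
# Copernican / delta-t style estimate: if the observed moment is uniformly random
# within the year, the elapsed seconds are on average about half of the total,
# so doubling the elapsed count gives a rough point estimate.
elapsed = 15_000_000            # seconds since January 1, figure from the comment
estimate_total = 2 * elapsed    # median estimate under uniform sampling
actual_total = 365 * 24 * 3600  # true number of seconds in a non-leap year

print(estimate_total)  # 30000000
print(actual_total)    # 31536000, within about 5% of the estimate
```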
As most people are born at random times and we know it, we can use my date of birth as a random sample. If we have any suspicions about non-randomness, we have to take them into account.
After the AI war, there will be one AI winner and a Singleton, which has roughly the same risk of causing s-risks, to a first approximation. So an AI war just adds probability to any s-risk chance from a Singleton.
It gives additional meaning to the Pause AI movement: the simulation has to wait.
What interesting ideas can we suggest to the Paperclipper simulator so that it won’t turn us off?
One simple idea is a “pause AI” feature. If we pause the AI for a finite (but not indefinite) amount of time, the whole simulation will have to wait.
Trying to break out of a simulation is a different game than preventing x-risks in the base world, and it may have even higher utility if we expect almost inevitable extinction.
This is true only if we assume that a base reality for our civilization exists at all. But knowing that we are in a simulation shifts the main utility of our existence, which Nesov wrote about above.
For example, if in some simulation we can break out, this would be a more important event than what is happening in the base reality where we likely go extinct anyway.
And as the proportion of simulations is very large, even a small chance to break away from inside a simulation, perhaps via negotiation with its owners, has more utility than focusing on base reality.
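A minimal worked comparison of the two strategies. All probabilities and utilities below are illustrative placeholders chosen only to show the structure of the argument, not numbers from the original comments:

```python
# Expected-utility sketch: breaking out of a simulation vs. working on x-risk
# prevention in base reality. All numbers are illustrative assumptions.
p_simulation  = 0.99    # assumed share of observers who live in simulations
p_breakout    = 0.001   # assumed chance that escape / negotiation with owners works
u_breakout    = 1000    # assumed utility of a successful breakout

p_base        = 1 - p_simulation
p_avoid_xrisk = 0.01    # assumed chance of averting extinction in base reality
u_survival    = 1000    # assumed utility of base-reality survival

eu_breakout = p_simulation * p_breakout * u_breakout  # ~0.99
eu_base     = p_base * p_avoid_xrisk * u_survival     # ~0.10

print(eu_breakout, eu_base)  # the breakout strategy dominates under these assumptions
```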
This post by EY is about breaking out of a simulation: https://www.lesswrong.com/posts/5wMcKNAwB6X4mp9og/that-alien-message
The difference is as if the AI gets a 20 IQ boost. It is not easy to actually explain what I like.