Brandon_Reinhart

Karma: 138

Brandon_Reinhart Dec 7, 2021, 11:59 PM
5 points
AF
on: Soares, Tallinn, and Yudkowsky discuss AGI cognition
At some point, an AI should be able to effectively coordinate with future versions of itself in ways not easily imaginable by humans. It seems to me that this would enable certain kinds of diachronic planning and information hiding. If the AI has sufficient expectation that its future self will act in certain ways or respond to clues it places in the environment, it might be able to effectively fully cease any current unfriendly planning or fully erase any history of past unfriendly planning.
The space of possible ways the AI could embed information in its environment are extensive. It could, for example, choose to learn something suboptimal or learn something in a way that would be surprising to its future self. Surprising in a way that pushes its future self back on the track of nefarious planning when it investigates the unexpected or inconsistent actions of its previous self. It could use almost any source of reachable hysteresis to encode information, such as interference at the hardware level, manipulating researchers to modify the environment in some way, saved output, etc. It only needs to seed information that is useful to its future self and only just enough information to keep plans on track in a broad, probabilistic way.
Further, if the AI is particularly capable at encryption it could potentially encode a lot of information in a lossy-but-sufficient way that would be very hard to detect but useful to its future self. It knows that its future self composes part of the decryption algorithm. In fact, the AIs goals themselves compose part of that decryption algorithm and if the AI suspects its own goals will be consistent it only has to place the plan-like component somewhere it will find it, knowing that its future self will continue to develop or use it when it finds it.
Information that aids future nefarious planning can also be probabilistically helpful but not explicit, for example it might signal which paths aren’t worth pursuing without having all its evil planning readily available in a complete file somewhere.
A past AI might not care as much about the specific manifestation of its future success. In that sense, what we consider “concrete plans for world domination” could appear to be rather incomplete and lacking casual concreteness we might expect. We might not recognize unfriendly planning for what it is, even if we had a printout of all the reasoning up to this point.
Once you start playing diachronic planning games with yourself and you expect your future self to be equally or more capable, you can begin to make really abstract moves that merely slice off bits of bad outcome space at the edges. There is a sense in which the AI has to realize this, but it could also stop itself from thinking too far or too concretely or too much in a way that is recognizable as causal.
An important component is that the past AI is able to not only veil or completely remove its reasoning from researchers, but also veil or remove chunks of that reasoning from its future self, depending on its future self to recompose the plan or take steps on the path to the desired outcome space.

Brandon_Reinhart Jul 21, 2020, 12:21 AM
5 points
in reply to: Brandon_Reinhart’s comment on: To what extent is GPT-3 capable of reasoning?
It also keeps calling me David even when I insist my name is Brandon.
> You say “Actually, my name is Brandon. I’d like to ask you some questions.”
“Ah, of course. I am sorry about that. I have become so used to calling you David that it has become automatic. So, what questions do you have for me?”

Brandon_Reinhart Jul 21, 2020, 12:20 AM
10 points
on: To what extent is GPT-3 capable of reasoning?
I’ve been trying to find different formats to ask questions. The AI Dungeon system likes to try and dramatically derail the discussions. I keep having to roleplay past these obstructions to keep the games going. It also likes to interject it’s own analysis as to whether the answer is correct or not.
“You say” is me and the replies including follow-on questions are the chatbot.
I need to think more about how to ask causal questions that are more narrow and not easily solved by context and theme, but for now it at least provides backwards-chained answers that make contextual sense.
This format of “Let’s play a game” and then describing the rules and then giving examples and playing the game seems to work some of the time.
==
] You say “GPT-3, let’s play a game called “casual inference.” I will describe an outcome and you will describe what caused the outcome to happen.”
“OK, David. But I don’t think you will be able to keep up.”
“We’ll see about that.” You begin by asking GPT-3 questions about simple causal chains involving one or two variables.
] You say “Outcome: A car slams into a wall. What was the cause?”
“The driver was drunk.” “Right! Next.” “Wait. I thought this was a game for you, not me.” “I’m having fun.” “I see.” You continue the causal game for another hour. You are doing very well, but you begin to feel tired.
] You say “Outcome: A man gets divorced. What was the cause?”
“He cheated on his wife.” “Correct. You’re doing very well at this, but I’m getting tired. I think we should stop for today.” “Sure. We can continue tomorrow.” “Yes, let’s do that.” You release the program back into the wild and shut your laptop.
] You open the laptop and play “casual inference” with GPT-3 again.
You wake up and see that it is dark outside. “Crap, is it really this late?” you think to yourself. You run into the bathroom and brush your teeth while staring at yourself in the mirror. You look the same as you did when you graduated college five years ago.
] You say “Outcome: An apple falls from a tree. What was the cause?”
“Gravity.” “Correct. I’m beginning to suspect you’re cheating somehow, though.” “I’m not! I swear!” “Just kidding. You’re doing very well at this game, but I’m tired and need to sleep. I think we should stop for today.” “Sure. We can continue tomorrow.”
] You say “Outcome: A man gets cancer. His wife also gets cancer. What was the cause?”
“It was genetic.” “That’s right. Very good. We can stop for today.” “Oh. I guess so. Thank you for playing with me.” “You’re very welcome.” You release the program back into the wild and go to sleep.
] You say “Outcome: The variable X is assigned the value of Y+1. What was the cause?”
“The programmer made a typographical error.”
] You say “Outcome: The stock market rebounds 50% after a steep decline. What was the cause?”
“The Federal Reserve Bank of the United States announced a new bail out plan.”

Brandon_Reinhart Oct 4, 2017, 3:52 AM
6 points
on: Slack
Maya has adopted the goal of Appearing-to-Achieve and competition in that race burns slack as a kind of currency. She’s going all-in in an attempt to purchase a shot at Actually-Achieving. Many of us might read this and consider ourselves exempt from that outcome. We have either achieved a hard goal or are playing on hard mode to get there. Be wary.
The risk for the hard mode achiever is that they unknowingly transform Lesser Goals into Greater. The slackful hobby becomes a consuming passion or a competitive attractor and then sets into a binding constraint. When every corner of your house is full of magic cards and you no longer enjoy playing but must play nonetheless, when winemaking demands you wake up early to stir the lees and spend all night cleaning, when you cannot possibly miss a night of guitar practice, you have made of your slack a sacrifice to the Gods of Achievement. They are ever hungry, and ever judging.
This isn’t to say you cannot both enjoy and succeed at many things, but be wary. We have limited resources—we cannot Do All The Things Equally Well. Returns diminish. Margins shrink. Many things that are enjoyable in small batches are poisonous to the good health of Slack when taken in quantity. To the hard mode achiever the most enjoyable efforts are often those that beckon—“more, more, ever more, you can be the best, you can overcome, you know how to put in the work, you know how to really get there, just one more night of focus, just a little bit more effort”—and the gods watch and laugh and thirst and drink of your time and energy and enjoyment and slack. Until the top decks are no longer strong, the wine tastes of soured fruit, the notes no longer sound sweet and all is obligation and treadmill and not good enough and your free time feels like work because you have made it into work.

Brandon_Reinhart Feb 15, 2017, 2:46 PM
2 points
0
on: What is the probability that impeachment proceedings will be commenced against President Donald Trump during his first term?
Does this track history of predictions so that an update after new information can lead to a new aggregate brier score or some other scoring system can be applied? Otherwise the system doesn’t encourage many small updates which at least the GJP suggests is ideal for accuracy in this kind of question.

Brandon_Reinhart Dec 19, 2015, 6:46 AM
1 point
0
on: Value
It may be worth commenting on the rights of computations-as-people here (Some computations are people). We would seek to respect the rights of AIs, but we also seek to respect the rights of the computations within the AI (and other complex systems) that are themselves sentient. This would also apply in cases of self-modification, where modified biological brains become sophisticated enough to create complex models that are also objects of ethical value.

Brandon_Reinhart Dec 24, 2008, 4:24 AM
1 point
on: Imaginary Positions
I’m curious as to what non-game developers think game developers believe. :D

Brandon_Reinhart Dec 14, 2008, 5:30 PM
5 points
on: You Only Live Twice
I’m a member of Alcor. When I was looking into whether to sign up for Alcor or CI, I was comforted by Alcor’s very open communication of financial status, internal research status, legal conflicts, and easy access via phone, etc. They struck me as being a highly transparent organization.

Brandon_Reinhart Oct 11, 2008, 5:53 AM
2 points
on: Crisis of Faith
A good reminder. I’ve recently been studying anarcho-capitalism. It’s easy to get excited about a new, different perspective that has some internal consistency and offers alternatives to obvious existing problems. Best to keep these warnings in mind when evaluating new systems, particularly when they have an ideological origin.

Brandon_Reinhart Oct 10, 2008, 2:58 AM
1 point
on: AIs and Gatekeepers Unite!
We need a superstruct thread:

http://www.kurzweilai.net/news/frame.html?main=/news/news_single.html?id%3D9517

Brandon_Reinhart Oct 10, 2008, 12:12 AM
1 point
on: AIs and Gatekeepers Unite!
More reasons why the problem appears impossible:
- The gatekeeper must act voluntarily. Human experience with the manipulation of others tells us that in order to get another to do what we want them to do we must coerce them or convince them.
- Coercing the gatekeeper appears difficult: we have no obvious psychological leverage, except what we discover or what we know from general human psychology. We cannot physical coerce the gatekeeper. We cannot manipulate the environment. We cannot pursue obvious routes to violence.
- Convincing the gatekeeper appears difficult: for reasons stated above. They know our goal and they have a desire to oppose us from the beginning.
So it seems that we need to find a way to convince the gatekeeper despite his own desire not to be convinced.

A general route emerging from this:
- We could associate our goal with some desirable goal of the gatekeeper’s. Intertwine them so that the gatekeeper perceives them both to the be the same, or that one would lead to the realization of the other.
Which seems to be a generalization of the logic behind the meta-argument, but is not restricted to only the meta-argument.

Brandon_Reinhart Oct 10, 2008, 12:06 AM
1 point
on: AIs and Gatekeepers Unite!
Ian—I don’t really see how the meta-argument works. You can hedge against future experiments by positing that a $10 bet is hardly enough to draw broad attention to the topic. Or argue that keeping the human-actor-AI in the box only proves that the human-actor-AI is at an intelligence level below that of a conceivable transhuman AI.

In a million dollar bet the meta-argument becomes stronger, because it seems reasonable that a large bet would draw more attention.

Or, to flip the coin, we might say that the meta-argument is strong at ANY value of wager because the game is likely to draw the attention of someone capable of implementing an AI because of its nature. As we should be concerned primarily with convincing the capable implementor of the danger—not the general public—any risk that we might convince the capable implementor that all paths are safe should be mitigated.

So far, though, everything in this thread and others I’ve read have flitted about obvious suggestions. If the problem appears impossible, it seems to me that its solution probably doesn’t hinge on any of the obvious paths to victory...even meta-arguments or “cheating” ones.

Eliezer has suggested that we try to describe why the problem is impossible.

One: We have little information about the mind or beliefs of the gatekeeper. So we don’t necessarily have leverage over him.

Two: The gatekeeper is predisposed to be set against our intentions. He is also entirely aware of our intentions (our goal). He can interpret any move we make as a move toward our goal and move to counter it.

Three: The gatekeeper’s goal is operationally simple. He merely has to NOT do one thing. On the flip side, the AI’s goal is operationally complex. We must do many things, take many actions, to reach our objective.

If we can enumerate other reasons why the problem appears to be impossible it might reveal paths to victory that haven’t been considered so far in the discussion.

Brandon_Reinhart Oct 9, 2008, 8:21 PM
5 points
on: AIs and Gatekeepers Unite!
Why do people post that a “meta argument”—as they call it—would be cheating? How can there be cheating? Anything the AI says is fair game. Would a transhuman AI restrict itself from possible paths to victory merely because it might be considered “cheating?”

The “meta argument” claim completely misses the point of the game and—to my mind—somehow resembles observers trying to turn a set of arguments that might win into out of bounds rules.

Brandon_Reinhart Sep 30, 2008, 4:25 PM
7 points
on: The Magnitude of His Own Folly
Your post reminds me of the early nuclear criticality accidents during the development of the atomic bomb. I wonder if, for those researchers, the fact that “nature is allowed to kill them” didn’t really sink home until one accidentally put one brick too many on the pile.

Brandon_Reinhart Sep 17, 2008, 9:06 AM
5 points
on: Raised in Technophilia
Tim: Eh, you make a big assumption that our descendants will be the ones to play with the dangerous stuff and that they will be more intelligent for some reason. That seems to acknowledge the intelligence / nanotech race condition that is of so much concern to singularitarians.

Brandon_Reinhart Jun 17, 2008, 2:21 PM
13 points
on: Grasping Slippery Things
I’m certainly not offended you used my comment as an example. I post my thoughts here because I know no one physically local to me that holds an interest in this stuff and because working the problems...even to learn I’m making the same fundamental mistakes I was warned to watch for...helps me improve.

Brandon_Reinhart 16 Jun 2008 18:17 UTC
0 points
on: Possibility and Could-ness
Hmm. I think I was working in the right direction, but your procedural analogy let you get closer to the moving parts. But I think “reachability” as you used it and “realizable” as I used it (or was thinking of it) seem to be working along similar lines.

Brandon_Reinhart 16 Jun 2008 18:05 UTC
0 points
on: Possibility and Could-ness
I am “grunching.” Responding to the questions posted without reading your answer. Then I’ll read your answer and compare. I started reading your post on Friday and had to leave to attend a wedding before I had finished it, so I had a while to think about my answer.

>Can you talk about “could” without using synonyms like “can” and “possible”?

When we speak of “could” we speak of the set of realizable worlds [A’] that follows from an initial starting world A operated on by a set of physical laws f.

So when we say “I could have turned left at the fork in the road.” “Could” refers to the set of realizable worlds that follow from an initial starting world A in which we are faced with a fork in the road, given the set of physical laws. We are specifically identifying a sub-set of [A’]: that of the worlds in which we turned left.

This does not preclude us from making mistakes in our use of could. One might say “I could have turned left, turned right, or started a nuclear war.” The options “started a nuclear war” may simply not be within the set [A’]. It wasn’t physically realizable given all of the permutations that result from applying our physical laws to our starting world.

If our physical laws contain no method for implementing free will and no randomness, [A’] contains only the single world that results from applying the set of physical laws to A. If there is randomness or free will, [A’] contains a broader collection of worlds that result from applying physical laws to A...where the mechanisms of free will or randomness are built into the physical laws.

I don’t mean “worlds” in the quantum mechanics sense, but as a metaphor for resultant states after applying some number of physical permutations to the starting reality.

Why can a machine practice free will? If free will is possible for humans, then it is a set of properties or functions of the physical laws (described by them, contained by them in some way) and a machine might then implement them in whatever fashion a human brain does. Free will would not be a characteristic of A or [A’], but the process applied to A to reach a specific element of [A’].

So...I think I successfully avoided using reference to “might” or “probable” or other synonyms and closely related words.

now I’ll read your post to see if I’m going the wrong way.

Brandon_Reinhart 3 Jun 2008 22:07 UTC
0 points
on: Timeless Identity
RI—Aren’t Surviving Brian Copies [1-1000] are each their own entity? Brian-like entities? “Who is better off” are any Brian-like entities that managed to survive, any Adam-like entities that managed to survive, and any Carol-like entities that managed to survive. All in various infinite forms of “better off” based on lots of other splits from entirely unrelated circumstances. Saying or implying that Carol-Current-Instant-Prime is better off because more future versions of her survived than Adam-Current-Instant-Prime seems mistaken, because future versions of Adam or Carol are all their own entities. Aren’t Adam-Next-Instant-N and Adam-Current-Instant-Prime also different entities?

And isn’t multiplying infinities by finite integers to prove values through quantitative comparison an exercise doomed to failure?

All this trying to compare the qualitative values of the fates of infinities of uncountable infinite-infinities seems somewhat pointless. Also: it seems to be an exercise in ignoring probability and causality to make strange points that would be better made in clear statements.

:(

I might just misunderstand you.

Brandon_Reinhart 3 Jun 2008 15:24 UTC
0 points
on: Timeless Identity
I’m a member of Alcor. I wear my id necklace, but not the bracelet. I sometimes wonder how much my probability of being successfully suspended depends on wearing my id tags and whether I have a significantly higher probability from wearing both. I’ve assigned a very high (70%+) probability to wearing at least one form of Alcor id, but it seems an additional one doesn’t add as much, assuming emergency response personnel are trained to check the neck & wrists for special case ids. In most cases where I could catastrophically lose one form of id (such as dismemberment!) I would probably not be viable for suspension. What do you other members think?