I’m not quite convinced. The topics look OK, but the language is too corporate. Maybe it can be fixed with some prompt engineering.
And yet, AlphaZero is corrigible. Its goal is not even to win; its goal is to play in a way that maximises the chance of winning if the game is played to completion. It does not actually care whether the game is completed or not. For example, it does not trick the player into playing the game to the end by pretending they have a chance of winning.
Though, if it were trained on games against real people, and got a better reward for winning than for games abandoned by players, its value function would probably change to aim for the actual “official” win.
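A minimal sketch of that distinction (illustrative Python; the function names and reward values are my assumptions, not AlphaZero’s actual training code):

```python
# Illustrative sketch: how the value-head training target changes when
# abandoned games are scored differently from finished ones.

def selfplay_value_target(final_outcome: float) -> float:
    # Self-play: every game runs to completion, so the value head simply
    # regresses toward the final result (+1 win, 0 draw, -1 loss).
    # "Winning" just means "the position where playing to the end wins".
    return final_outcome

def human_play_value_target(final_outcome: float, abandoned: bool) -> float:
    # Hypothetical reward for games against people: an abandoned game
    # scores worse than a win, so the learned value function starts to
    # favour states where the opponent keeps playing until the "official"
    # end, e.g. positions that still look winnable to them.
    if abandoned:
        return -0.5
    return final_outcome
```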
This scenario requires pretty specific (but likely) circumstances:
No time limit on the task
No other AIs that would prevent it from grabbing power, or to whose goals it would otherwise be an obstacle
The AI assuming that the goal will not be reached if it is shut down (by other AIs, by the same AI after being turned back on, by people, by chance, as an eventual result of the AI’s actions before shutdown, etc.)
An extremely specific value function that ignores everything except one specific goal
That goal being a core goal, not an instrumental one. For example, the final goal could be “be aligned”, and the instrumental goal “do what people ask, because that’s what aligned AIs do”. Then an order to stop would not be a change of the core goal, but new data about the world that updates the best strategy for reaching the core goal.
Can GPT convincingly emulate them talking to each other/you?
Yes, if you only learn the basics of the language, you will learn only the basics of the language users’ values (if any).
But a deep understanding of a language requires knowing the semantics of its words and constructions (including the meaning of the words “human” and “values”, by the way). To understand texts you have to understand the contexts in which they are used, etc.
Also, pretty much every human-written text carries some information about human values, because people only talk about things they see as at least somewhat important or valuable to them.
And a lot of texts are related to values much more directly. For example, every text about human relationships is directly about conflicts or alignment between particular people’s values.
So, if you learn a language by reading text (like LLMs do), you will pick up a lot about people’s values along the way (like LLMs did).
I think an AI should treat its value function as probabilistic. I.e., instead of thinking “this world has a value of exactly N”, it could think something like “I’m 90% sure this world has value N±M, but there is a 10% possibility that it could actually have value -ALOT”, and it would avoid that world, because it gives a very low expected value on average.
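A minimal sketch of that decision rule (illustrative Python; the numbers and names are made up for this example):

```python
# Illustrative sketch: comparing worlds by expected value when the value
# estimate itself is uncertain.

def expected_value(outcomes):
    """outcomes: list of (probability, value) pairs summing to 1."""
    return sum(p * v for p, v in outcomes)

N = 10.0            # best-guess value of the risky world
CATASTROPHE = -1e9  # the "-ALOT" case

risky_world = [(0.9, N), (0.1, CATASTROPHE)]  # 90% roughly N, 10% catastrophic
safe_world = [(1.0, N - 1.0)]                 # slightly worse, but certain

print(expected_value(risky_world))  # about -1e8: avoided despite the likely upside
print(expected_value(safe_world))   # 9.0: chosen instead
```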
To me, aligning AI with humanity seems much EASIER than aligning it with a specific person, because common human values are much better “documented” and much more stable than the wishes of one person.
Also, a single person in control of a powerful AI is an obvious weak point, who could be manipulated by a third party or by the AI itself, which would gain control of the AI through that person.
Is it possible to learn a language without learning the values of those who speak it?
I agree.
I usually use “aligned” to mean “aligned with humanity”, as there is not much difference between outcomes for AGIs that are not aligned with humanity, even if they are aligned with something else. If they are agentic, they will have killing everyone as an instrumental goal, because humanity will likely be an obstacle to whatever future plans they have. If an AGI is not agentic but is an oracle, it will provide some world-ending information to some unaligned agent, with mostly the same result.
AI is developed by misaligned people, or by people who consider it the only way to stop misaligned people from developing AI.
Even a moderately intelligent humanity-aligned AI would identify actions with an obvious risk of catastrophic consequences and would refuse to do them, except to prevent something even more catastrophic.
Human: Does gradient descent on the AGI, trains the refusal response out of it.
That would make the AGI misaligned.
Nope, that’s the wrong solution. The second player wins by mirroring moves. The answer to removing one pebble is removing the pebble diagonal to it, leaving two disconnected pebbles.
Human: Aligned AGI, make me a more powerful AGI!
AGI: What? Are you nuts? Do you realise how dangerous those things are? No!
Hmm… by analogy, would a high-status AI agent sabotage the creation and use of more capable AI agents?
“Making a decision oneself” will also become a very vague concept when super-convincing AIs are running around.
The problem I see is that our values are defined in a stable way only inside the distribution, i.e. for situations similar to those we have already experienced.
Outside of it there may be many radically different extrapolations which are consistent with themselves and with our values inside the distribution. And that is a problem not with AI, but with the values themselves.
For example, there is no correct answer to what a human is, i.e. how much we can “improve” a human before they stop being human. We can choose different answers, and they will all be consistent with our pre-singularity concept of a human and will not contradict already established values.
Maybe it is an attempt at vaccination? I.e. exposing the “organism” to a weakened form of the deadly “virus”, so the organism can produce “antibodies”.
I doubt that training LLMs can lead to AGI. Fundamental research on alternative architectures seems more dangerous.