Send me anonymous feedback: https://docs.google.com/forms/d/e/1FAIpQLScLKiFJbQiuRYBhrBbVYUo_c6Xf0f8DN_blbfpJ-2Ml39g1zA/viewform
Any type of feedback is welcome, including arguments that a post/comment I wrote is net negative.
Some quick info about me:
I have a background in computer science (BSc+MSc; my MSc thesis was in NLP and ML, though not in deep learning).
You can also find me on the EA Forum.
Feel free to reach out by sending me a PM. (Update: I’ve turned off email notifications for private messages. If you send me a time-sensitive PM, consider also pinging me about it via the anonymous feedback link above.)
Maybe the question here is whether including certain texts in relevant training datasets can cause [language models that pose an x-risk] to be created X months sooner than otherwise.
The relevant texts I’m thinking about here are:
Descriptions of certain tricks to evade our safety measures.
Texts that might cause the ML model to (better) model AIS researchers, potential AIS interventions, or other AI systems that the model might cooperate with (or that might “hijack” the model’s logic). (A toy sketch of what filtering such texts out of a corpus could look like appears below.)
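To make the implied intervention concrete: one operationalization of “not including” such texts is to filter the training corpus before training. The following is a minimal, purely illustrative Python sketch, not anything used in practice; the pattern list and the names `BLOCKLIST_PATTERNS` / `filter_corpus` are my own hypothetical choices, and a real pipeline would need much more careful curation (and probably trained classifiers rather than regexes).

```python
import re

# Hypothetical patterns standing in for "texts we might want to exclude";
# a real blocklist would be far broader and more carefully curated.
BLOCKLIST_PATTERNS = [
    re.compile(r"jailbreak", re.IGNORECASE),
    re.compile(r"evade.{0,40}safety", re.IGNORECASE),
    re.compile(r"alignment researcher", re.IGNORECASE),
]

def is_flagged(document: str) -> bool:
    """Return True if the document matches any blocklist pattern."""
    return any(p.search(document) for p in BLOCKLIST_PATTERNS)

def filter_corpus(documents):
    """Yield only the documents that do not match the blocklist."""
    for doc in documents:
        if not is_flagged(doc):
            yield doc

if __name__ == "__main__":
    corpus = [
        "A recipe for sourdough bread.",
        "How to evade the safety filters of a chatbot.",
        "Notes from an alignment researcher's workshop.",
    ]
    kept = list(filter_corpus(corpus))
    print(f"Kept {len(kept)} of {len(corpus)} documents.")
```

Of course, keyword filtering like this would miss paraphrases and catch benign texts; the point is only that “excluding certain texts” cashes out as a concrete, cheap operation on the dataset, so the real question is the counterfactual impact on when risky models get created, not feasibility.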