Ofer

Karma: 1,069

Send me anonymous feedback: https://docs.google.com/forms/d/e/1FAIpQLScLKiFJbQiuRYBhrBbVYUo_c6Xf0f8DN_blbfpJ-2Ml39g1zA/viewform

Any type of feedback is welcome, including arguments that a post/comment I wrote is net negative.

Some quick info about me:

I have a background in computer science (BSc+MSc; my MSc thesis was in NLP and ML, though not in deep learning).

You can also find me on the EA Forum.

Feel free to reach out by sending me a PM. (Update: I’ve turned off email notifications for private messages. If you send me a time sensitive PM, consider also pinging me about it via the anonymous feedback link above.)

Ofer Apr 15, 2023, 8:50 PM
4 points
0
on: On Caring about our AI Progeny
I think the important factors w.r.t. risks re [morally relevant disvalue that occurs during inference in ML models] are probably more like:
1. The training algorithm. Unsupervised learning seems less risky than model-free RL (e.g. the RLHF approach currently used by OpenAI maybe?); the latter seems much more similar, in a relevant sense, to the natural evolution process that created us.
2. The architecture of the model.
Being polite to GPT-n is probably not directly helpful (though it can be helpful by causing humans to care more about this topic). A user can be super polite to a text generating model, and the model (yielded by model-free RL) can still experience disvalue, particularly during an ‘impossible inference’, one in which the input text (the “environment”) is bad in the sense that there is obviously no way to complete the text in a “good” way.

See also: this paper by Brian Tomasik.

Ofer Mar 19, 2023, 3:27 PM
14 points
5
in reply to: Lone Pine’s comment on: More information about the dangerous capability evaluations we did with GPT-4 and Claude.
My question was about whether ARC gets to evaluate [the most advanced model that the AI company created so far] before the company creates a slightly more advanced model (by scaling up the architecture, or by continuing the training process of the evaluated model).

Ofer Mar 19, 2023, 8:29 AM
LW: 11 AF: 3
2
AF
on: More information about the dangerous capability evaluations we did with GPT-4 and Claude.
Did OpenAI/Anthropic allow you to evaluate smaller scale versions* of GPT4/Claude before training the full-scale model?

* [EDIT: and full-scale models in earlier stages of the training process]

Ofer Mar 10, 2023, 9:57 PM
4 points
−4
on: Speed running everyone through the bad alignement bingo. $5k bounty for a LW conversational agent

Will this actually make things worse? No, you’re overthinking this.

This does not seem like a reasonable attitude (both in general, and in this case specifically).

Ofer Mar 5, 2023, 12:57 AM
LW: 5 AF: 1
−3
AF
on: Acausal normalcy

Having thought a bunch about acausal trade — and proven some theorems relevant to its feasibility — I believe there do not exist powerful information hazards about it that stand up to clear and circumspect reasoning about the topic.

Have you discussed this point with other relevant researchers before deciding to publish this post? Is there a wide agreement among relevant researchers that a public, unrestricted discussion about this topic is net-positive? Have you considered the unilateralist’s curse and biases that you may have (in terms of you gaining status/prestige from publishing this)?

Ofer Jan 15, 2023, 11:11 AM
2 points
0
on: Predictable Outcome Payments
Re impact markets: there’s a problem regarding potentially incentivizing people to do risky, net-negative things (that can end up being beneficial). I co-authored this post about the topic.

Ofer Jan 8, 2023, 10:23 PM
LW: 2 AF: 1
0
AF
in reply to: Rohin Shah’s comment on: Categorizing failures as “outer” or “inner” misalignment is often confused
(Though even in that case it’s not necessarily a generalization problem. Suppose every single “test” input happens to be identical to one that appeared in “training”, and the feedback is always good.)

Ofer Jan 7, 2023, 5:35 PM
LW: 2 AF: 1
0
AF
on: Categorizing failures as “outer” or “inner” misalignment is often confused
Generalization-based. This categorization is based on the common distinction in machine learning between failures on the training distribution, and out of distribution failures. Specifically, we use the following process to categorize misalignment failures:
1. Was the feedback provided on the actual training data bad? If so, this is an instance of outer misalignment.
2. Did the learned program generalize poorly, leading to bad behavior, even though the feedback on the training data is good? If so, this is an instance of inner misalignment.
This categorization is non-exhaustive. Suppose we create a superintelligence via a training process with good feedback signal and no distribution shift. Should we expect that no existential catastrophe will occur during this training process?

Ofer Jan 6, 2023, 12:04 PM
2 points
0
on: Collapse Might Not Be Desirable
Relevant & important: The unilateralist’s curse.

Ofer Sep 8, 2022, 9:40 AM
0 points
0
in reply to: Elizabeth’s comment on: Impact Shares For Speculative Projects

I’m interested in hearing what you think the counterfactuals to impact shares/retroactive funding in general are, and why they are better.

The alternative to launching an impact market is to not launch an impact market. Consider the set of interventions that get funded if and only if an impact market it launched. Those are interventions that no classical EA funder decides to fund in a world without impact markets, so they seem unusually likely to be net-negative. Should we move EA funding towards those interventions, just because there’s a chance that they’ll end up being extremely beneficial? (Which is the expected result of launching a naive impact market.)

Ofer Sep 7, 2022, 9:28 AM
−2 points
0
in reply to: Elizabeth’s comment on: Impact Shares For Speculative Projects

I expect prosocial projects to still be launched primarily for prosocial reasons, and funding to be a way of enabling them to happen and publicly allocating credit. People who are only optimizing for money and don’t care about externalities have better ways available to pursue their goals, and I don’t expect that to change.

It seems that according to your model, it’s useful to classify (some) humans as either:

(1) humans who are only optimizing for money, power and status; and don’t care about externalities.

(2) humans who are working on prosocial projects primarily for prosocial reasons.

~~If your model is true, how come the genes that cause humans to be type (1) did not completely displace the genes that cause humans to be type (2) throughout human evolution?~~

According to my model (without claiming originality): Humans generally tend to have prosocial motivations, and people who work on projects that appear prosocial tend to believe they are doing it for prosocial reasons. But usually, their decisions are aligned with maximizing money/power/status (while believing that their decisions are purely due to prosocial motives).

Also, according to my model, it is often very hard to judge whether a given intervention for mitigating x-risks is net-positive or net-negative (due to an abundance of crucial considerations). So subconscious optimizations for money/power/status can easily end up being extremely harmful.

If you describe the problem as “this encourages swinging for the fences and ignoring negative impact”, impact shares suffer from it much less than many parts of effective altruism. Probably below average. Impact shares at least have some quantification and feedback loop, which is more than I can say for the constant discussion of long tails, hits based giving, and scalability.

But a feedback signal can be net-negative if it creates bad incentives (e.g. an incentive to regard an extremely harmful outcome that a project can end up causing as if that potential outcome was neutral).

Ofer Sep 6, 2022, 7:26 PM
1 point
0
in reply to: Elizabeth’s comment on: Impact Shares For Speculative Projects
(To be clear, my comment was not about the funding of your specific project but rather about the general funding approach that is referred to in the title of the OP.)

Ofer Sep 6, 2022, 12:13 PM
0 points
0
on: Impact Shares For Speculative Projects
How do you avoid the problem of incentivizing risky, net-negative projects (that have a chance of ending up being beneficial)?

You wrote:

Ultimately we decided that impact shares are no worse than the current startup equity model, and that works pretty well. “No worse than startup equity” was a theme in much of our decision-making around this system.

If the idea is to use EA funding and fund things related to anthropogenic x-risks, then we probably shouldn’t use a mechanism that yields similar incentives as “the current startup equity model”.

Ofer Sep 4, 2022, 11:34 AM
LW: 4 AF: 2
0
AF
in reply to: paulfchristiano’s comment on: We may be able to see sharp left turns coming

The smooth graphs seem like good evidence that there are much smoother underlying changes in the model, and that the abruptness of the change is about behavior or evaluation rather than what gradient descent is learning.

If we’re trying to predict abrupt changes in the accuracy of output token sequences, the per-token log-likelihood can be a useful signal. What’s the analogous signal when we’re talking about abrupt changes in a model’s ability to deceptively conceal capabilities, hack GPU firmware, etc.? What log-likelihood plots can we use to predict those types of abrupt changes in behavior?

Ofer Aug 30, 2022, 5:45 PM
30 points
6
in reply to: Vaniver’s comment on: Common misconceptions about OpenAI
Does everyone who work at OpenAI sign a non-disparagement agreement? (Including those who work on governance/policy?)

Ofer Aug 30, 2022, 4:54 AM
15 points
−2
in reply to: Vaniver’s comment on: Common misconceptions about OpenAI
Yes. To be clear, the point here is that OpenAI’s behavior in that situation seems similar to how, seemingly, for-profit companies sometimes try to capture regulators by paying their family members. (See 30 seconds from this John Oliver monologue as evidence that such tactics are not rare in the for-profit world.)

Ofer Aug 28, 2022, 10:02 AM
10 points
2
in reply to: Adam Scholl’s comment on: Common misconceptions about OpenAI
Another bit of evidence about OpenAI that I think is worth mentioning in this context: OPP recommended a grant of $30M to OpenAI in a deal that involved OPP’s then-CEO becoming a board member of OpenAI. OPP hoped that this will allow them to make OpenAI improve their approach to safety and governance. Later, OpenAI appointed both the CEO’s fiancée and the fiancée’s sibling to VP positions.
What links here?
- Wei Dai's comment on My takes on the FTX situation will (mostly) be cold, not hot by Holden Karnofsky (EA Forum; Nov 19, 2022, 2:48 PM; 105 points)
- Evan R. Murphy's comment on My takes on the FTX situation will (mostly) be cold, not hot by Holden Karnofsky (EA Forum; Nov 19, 2022, 11:36 PM; 17 points)

Ofer Aug 25, 2022, 4:56 PM
LW: 1 AF: 1
0
AF
in reply to: Ofer’s comment on: Common misconceptions about OpenAI
Sorry, that text does appear in the linked page (in an image).

Ofer Aug 25, 2022, 4:51 PM
LW: 1 AF: 1
0
AF
on: Common misconceptions about OpenAI

The Partnership may never make a profit

I couldn’t find this quote in the page that you were supposedly quoting from. The only google result for it is this post. Am I missing something?

Ofer Jul 28, 2022, 10:12 PM
LW: 2 AF: 1
0
AF
in reply to: evhub’s comment on: Principles of Privacy for Alignment Research

That being said, I think that, most of the time, alignment work ending up in training data is good, since it can help our AI systems be differentially better at AI alignment research (e.g. relative to how good they are at AI capabilities research), which is something that I think is pretty important.

That consideration seems relevant only for language models that will be doing/supporting alignment work.