Yeah, that’s also a good point, though I don’t want to read too much into it, since it might be a historical accident.
yup, added a sentence about it
Basically agree—I think that a model trained by maximum likelihood on offline data is less goal-directed than one that’s trained by an iterative process where you reinforce its own samples (aka online RL), but it’s still somewhat goal-directed. It needs to simulate a goal-directed agent to do a good job at maximum likelihood. OTOH, it’s mostly concerned with covering all possibilities, so the goal-directed reasoning isn’t emphasized. But with multiple iterations, the model can improve quality (→ more goal-directedness) at the expense of coverage/diversity.
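To caricature what I mean by “reinforce its own samples” (totally schematic; every name here is a placeholder, not a real API):

```python
# Schematic only: model.sample, model.finetune_on, and reward_fn are
# placeholders, not any real library's API.
def reinforce_own_samples(model, reward_fn, prompts, n_iters=3, n_samples=4):
    for _ in range(n_iters):
        batch = []
        for prompt in prompts:
            samples = [model.sample(prompt) for _ in range(n_samples)]
            # Keep the highest-reward sample: quality (goal-directedness) goes up,
            # coverage/diversity of the original distribution goes down.
            batch.append((prompt, max(samples, key=reward_fn)))
        # Maximum likelihood on the model's own filtered samples.
        model.finetune_on(batch)
    return model
```

Plain maximum likelihood on offline data is just the last step applied once to human-generated data, with no filtering by reward.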
Super clear and actionable—my new favorite post on AF.
I also agree with it, and it’s similar to what we’re doing at OpenAI (largely thanks to Paul’s influence).
D’oh, re: the optimum of the objective, I now see that the solution is nontrivial. Here’s my current understanding.
Intuitively, the MAP version of the objective says: find me a simple model θ₁ such that there’s a more complex θ₂ with high likelihood under p(θ₂|θ₁) (which corresponds to sampling θ₂ near θ₁ until θ₂ satisfies the head-agreement condition) and high data-likelihood p(data|θ₂).
And this connects to the previous argument about world models and language as follows: we want θ₁ to contain half a world model, and we want θ₂ to contain the full world model and have high data-likelihood (for one of the heads), with the two heads agreeing. Based on Step 1, the problem is still pretty underconstrained, but maybe that’s resolved in Step 2.
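Writing out the Step 1 objective as I’m reading it (this is my reconstruction, so take the exact form with a grain of salt):

$$\max_{\theta_1,\,\theta_2}\;\; \log p(\theta_1) + \log p(\theta_2 \mid \theta_1) + \log p(\text{data} \mid \theta_2)$$

i.e., a simplicity prior on θ₁, a term pushing θ₂ to be likely given θ₁ (and to satisfy the head-agreement condition), and the data likelihood under θ₂.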
Isn’t the Step 1 objective (the unnormalized posterior log probability of (θ₁, θ₂)) maximized at θ₁ = θ₂ = argmax(log-likelihood + log-prior)? Also, I don’t see what this objective has to do with learning a world model.
I think this is a good idea. If you go ahead with it, here’s a suggestion.
Reviewers often procrastinate for weeks or months. This is partly because doing a review takes an unbounded amount of time, especially for articles that are long or confusing. So instead of sending the reviewers a manuscript with a due date, book a 2-hour calendar event with the reviewers. The reviewers join a call or group chat, read the paper, and discuss it. They can also help clear up each other’s confusions. They aim to complete the review by the end of the time window.
There’s a decent amount of literature on using multiple rewards, though often it’s framed as learning about multiple goals. Here are some off the top of my head:
The Horde (classic): http://www.ifaamas.org/Proceedings/aamas2011/papers/A6_R70.pdf
Universal Value Function Approximators: http://proceedings.mlr.press/v37/schaul15.html
Learning to Act by Predicting the Future: https://arxiv.org/abs/1611.01779
Temporal Difference Models: https://arxiv.org/abs/1802.09081
Successor Features: https://papers.nips.cc/paper/2017/hash/350db081a661525235354dd3e19b8c05-Abstract.html
Also see the discussion in Appendix D of the OpenAI Five paper about prediction heads, which were used mostly for interpretability/diagnostics: https://cdn.openai.com/dota-2.pdf
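For concreteness, here’s a minimal sketch (mine, not taken from any of the papers above) of the shared-trunk, one-head-per-reward setup they have in common:

```python
# Hypothetical sketch: a shared trunk with one value/prediction head per
# reward signal or goal, in the spirit of Horde/UVFA-style setups.
import torch
import torch.nn as nn

class MultiHeadValueNet(nn.Module):
    def __init__(self, obs_dim: int, num_rewards: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One scalar prediction per reward channel.
        self.heads = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(num_rewards))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        z = self.trunk(obs)
        # Shape (batch, num_rewards): one estimate per head.
        return torch.cat([head(z) for head in self.heads], dim=-1)
```

Each head gets trained against its own return/TD target; as in the OpenAI Five appendix, extra heads can also be purely auxiliary predictions kept around for interpretability.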
The results in Neural Networks Are Fundamentally Bayesian are pretty cool—it’s clever how they were able to estimate the densities.
A couple thoughts on the limitations:
There are various priors over functions for which we can calculate the exact posterior (e.g., Gaussian processes). However, doing Bayesian inference with these priors doesn’t perform as well as neural networks on most datasets. So knowing that SGD is Bayesian is only interesting if we also know that the prior is interesting. I think the ideal theoretical result would be to show that SGD on neural nets is an approximation of Solomonoff induction (or something like SI), and that the approximation gets better as the NNs get bigger and deeper. But I have yet to see any theory that connects neural nets/SGD to something like short programs.
If SGD works because it’s Bayesian, then making it more Bayesian should make it work better. But according to https://arxiv.org/abs/2002.02405, that’s not the case. Lowering the temperature, or taking the MAP (= temperature 0), generalizes better than using the full Bayesian posterior, as computed by an expensive MCMC procedure.
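For reference, the temperature knob in that paper is (roughly) the tempered posterior

$$p_T(\theta \mid D) \propto \exp\big(-U(\theta)/T\big), \qquad U(\theta) = -\sum_i \log p(y_i \mid x_i, \theta) - \log p(\theta),$$

where T = 1 is the ordinary Bayes posterior and T → 0 concentrates on the MAP solution; their “cold posterior” finding is that T < 1 tends to generalize better.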
I might be missing some context here, but I didn’t understand the section “No Indescribable Hellworlds Hypothesis” or what hellworlds have to do with debate.
OK, I guess I’m a bit unclear on the problem setup and how it involves a training phase and a deployment phase.
Wonderful writeup!
I’m sure you’ve thought about this, but I’m curious why the following approach fails. Suppose we require the debaters to each write up a detailed initial argument in judge-understandable language and read each other’s argument. Then, during the debate, each debater is allowed to quote short passages from their opponent’s writeup. Honest will be able to find either a contradiction or an unsupported statement in Dishonest’s initial writeup. If Honest quotes a passage and says it’s unsupported, then Dishonest has to respond with the supporting sentences.
In my experience, you need separate teams doing safety research because specialization is useful—it’s easiest to make progress when both individuals and teams specialize a bit and develop taste and mastery of a narrow range of topics.