PhD student in computational neuroscience @ MPI for Brain Research, Frankfurt. https://twitter.com/janhkirchner and https://universalprior.substack.com/
Jan
Great points, thanks for the comment! :) I agree that there is potentially some very low-hanging fruit. I could even imagine that some of these methods work better in artificial networks than in biological networks (less noise, more controlled environment).
But I believe one of the major bottlenecks might be that the weights and activations of an artificial neural network are just so difficult to access. Putting the weights and activations of a large model like GPT-3 under the microscope requires impressive hardware (running forward passes, storing the activations, transforming everything into a useful form, …), and then there are so many parameters to look at.
Giving researchers structured access to the model via a research API could solve a lot of those difficulties and seems like something that should totally exist (although there is, of course, also the danger of accelerating progress on the capabilities side).
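For concreteness, this is roughly what "putting the activations under the microscope" looks like for a small open model (a sketch using Hugging Face transformers with GPT-2 small as a stand-in; for something GPT-3-sized you'd need exactly the kind of structured access I mean):

```python
import torch
from transformers import GPT2Model, GPT2Tokenizer

# GPT-2 small as a stand-in: same idea, roughly 1000x fewer parameters than GPT-3.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

activations = {}

def make_hook(name):
    # Record the hidden states a transformer block outputs during the forward pass.
    def hook(module, inputs, output):
        activations[name] = output[0].detach()
    return hook

# Attach a hook to every transformer block.
for i, block in enumerate(model.h):
    block.register_forward_hook(make_hook(f"block_{i}"))

inputs = tokenizer("Putting the weights under the microscope", return_tensors="pt")
with torch.no_grad():
    model(**inputs)

# Each entry has shape (batch, sequence_length, 768) for GPT-2 small.
print({name: act.shape for name, act in activations.items()})
```

Even this toy version quickly produces gigabytes of activations once you run it over a larger corpus, which is exactly the hardware problem above.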
Great point! And thanks for the references :)
I’ll change your background to Computational Cognitive Science in the table! (unless you object or think a different field is even more appropriate)
“Brain enthusiasts” in AI Safety
Thank you for the comment and the questions! :)
This is not clear from how we wrote the paper, but we actually do the clustering in the full 768-dimensional space! If you look closely at the clustering plot, you can see that the clusters are slightly overlapping; that would be impossible with k-means in 2D, since in that setting membership is determined by distance from the 2D centroid.
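In code, the pipeline is roughly this (a sketch with scikit-learn; `embeddings` is a random placeholder for the real 768-dimensional embeddings, and PCA stands in for whatever 2D projection the plot uses):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Random placeholder for the real 768-dimensional document embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 768))

# Step 1: cluster in the full 768-dimensional space.
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(embeddings)

# Step 2: project down to 2D only for plotting.
coords_2d = PCA(n_components=2).fit_transform(embeddings)

# Clusters that are well separated in 768 dimensions can overlap in the
# 2D scatter plot, because the projection throws away most of the geometry.
```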
A descriptive, not prescriptive, overview of current AI Alignment Research
Oh true, I completely overlooked that! (if I keep collecting mistakes like this I’ll soon have enough for a “My mistakes” page)
Yes, good point! I had that in an earlier draft and then removed it for simplicity and for the other argument you’re making!
The Brain That Builds Itself
This sounds right to me! In particular, I just (re-)discovered this old post by Yudkowsky and this newer post by Alex Flint that both go a lot deeper on the topic. I think the optimal control perspective is a nice complement to those posts, and if I find the time to look more into this, then that work is probably the right direction.
As part of the AI Safety Camp our team is preparing a research report on the state of AI safety! Should be online within a week or two :)
Interesting, I added a note to the text highlighting this! I was not aware of that part of the story at all. That makes it more of a Moloch-example than a “mistaking adversarial for random”-example.
Yes, that’s a pretty fair interpretation! The macroscopic/folk psychology notion of “surprise” of course doesn’t map super cleanly onto the information-theoretic notion. But I tend to think of it as: there is a certain “expected surprise” about what future possible states might look like if everything evolves “as usual”, $\mathbb{E}_{x \sim p}[-\log p(x)]$. And then there is the (usually larger) “additional surprise” about the states that the AI might steer us into, $-\log p(x_{\text{AI}})$. The delta between those two is the “excess surprise” that the AI needs to be able to bring about.
It’s tricky to come up with a straightforward setting where the actions of the AI can be measured in nats, but perhaps the following works as an intuition pump: “If we give the AI full, unrestricted access to a control panel that controls the universe, how many operations does it have to perform to bring about the catastrophic event?” That’s clearly still not well defined (there is no obvious/privileged way the panel should look), but it shows 1) that the “excess surprise” is a lower bound (we wouldn’t usually give the AI unrestricted access to that panel) and 2) that the minimum number of operations required to bring about a catastrophic event is probably still larger than 1.
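To put some (entirely made-up) numbers on it: if the “business as usual” distribution $p$ assigns the catastrophic outcome $x_{\text{cat}}$ a probability of $10^{-6}$, then the surprise the AI has to bring about is

$$-\ln p(x_{\text{cat}}) = -\ln\left(10^{-6}\right) \approx 13.8 \text{ nats},$$

and if the “expected surprise” of business-as-usual futures is, say, 5 nats, the AI only needs to supply the remaining ~8.8 nats of excess surprise.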
Adversarial attacks and optimal control
Thank you for your comment! You are right, these things are not clear from this post at all, and I did not do a good job of clarifying that. I’m a bit low on time atm, but hopefully I’ll be able to make some edits to the post to set expectations for the reader more carefully.
The short answer to your question is: Yep, X is the space of events. In Vanessa’s post it has to be compact and metric; I’m simplifying this to an interval in $\mathbb{R}$. And the expression here can be derived from the one in her post by plugging in $g=0$ and replacing the measure with the Lebesgue integral. I have scattered notes where I derive the equations in this post. But it was clear to me that if I wanted to do this rigorously in the post, I’d have to introduce an annoying amount of measure theory and the post would turn into a slog. So I decided to do things hand-wavy, but went a bit too hard in that direction.
Elementary Infra-Bayesianism
Cool paper, great to see the project worked out! (:
One question: How do you know the contractors weren’t just answering randomly (or were confused about the task) in your “quality after filtering” experiments (Table 4)? Was there agreement across contractors about the quality of completions (in cases where they saw the same completions)?
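For what it’s worth, the kind of sanity check I have in mind is something like this (a sketch with scikit-learn; the ratings are hypothetical, and it assumes pairs of contractors rated the same completions):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical quality labels (0 = bad, 1 = good) from two contractors
# who rated the same set of completions.
ratings_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
ratings_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

# Agreement corrected for chance; a value near 0 would suggest
# random or confused answering.
print(cohen_kappa_score(ratings_a, ratings_b))
```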
Continental Philosophy as Undergraduate Mathematics
Fascinating! Thanks for sharing!
Cool experiment! I could imagine that the tokenizer handicaps GPT’s performance here (reversing the characters leads to completely different tokens). With a character-level tokenizer, GPT might be able to handle that task better!
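Quick illustration of what I mean by “completely different tokens” (using the Hugging Face GPT-2 tokenizer as a stand-in for whatever tokenizer the actual model uses):

```python
from transformers import GPT2Tokenizer

# GPT-2's BPE tokenizer as a stand-in.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

word = "palindrome"
print(tokenizer.tokenize(word))        # sub-word tokens for the forward word
print(tokenizer.tokenize(word[::-1]))  # completely different tokens for the reversed word
```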
There’s an important caveat here:
I’d be willing to bet that if you give the macaque more than 100 ms, they’ll get it right; that’s at least how it is for humans!
(Not trying to shift the goalpost, it’s a cool result! Just pointing at the next step.)