Adrià Garriga-alonso

Karma: 1,064

Adrià Garriga-alonso 1 Aug 2024 18:12 UTC
LW: 2 AF: 1
0
AF
in reply to: Nathan Helm-Burger’s comment on: Pacing Outside the Box: RNNs Learn to Plan in Sokoban
I’m curious what you mean, but I don’t entirely understand. If you give me a text representation of the level I’ll run it! :) Or you can do so yourself.
Here’s the text representation for level 53:
```
##########
##########
##########
#######  #
######## #
#   ###.@#
#   $ $$ #
#. #.$   #
#     . ##
##########
```
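For anyone who wants to work with the level programmatically, here is a minimal sketch of a parser for the text above. The symbol meanings are an assumption based on the common Sokoban text convention (`#` wall, `$` box, `.` target, `@` player, space for floor); the comment does not spell them out. The names `LEVEL_53` and `parse_level` are hypothetical helpers, not part of any released code.

```python
# Level 53, copied verbatim from the text representation above.
LEVEL_53 = """\
##########
##########
##########
#######  #
######## #
#   ###.@#
#   $ $$ #
#. #.$   #
#     . ##
##########"""


def parse_level(text):
    """Parse a Sokoban level into sets of (row, col) coordinates.

    Assumed symbols (common convention, not confirmed by the source):
    '#' wall, '$' box, '.' target, '@' player, ' ' floor.
    """
    walls, boxes, targets = set(), set(), set()
    player = None
    for r, line in enumerate(text.splitlines()):
        for c, ch in enumerate(line):
            if ch == "#":
                walls.add((r, c))
            elif ch == "$":
                boxes.add((r, c))
            elif ch == ".":
                targets.add((r, c))
            elif ch == "@":
                player = (r, c)
            elif ch != " ":
                raise ValueError(f"unexpected symbol {ch!r} at {(r, c)}")
    return {"walls": walls, "boxes": boxes, "targets": targets, "player": player}


level = parse_level(LEVEL_53)
print(level["player"], len(level["boxes"]), len(level["targets"]))
```

Under that reading of the symbols, the level has four boxes and four targets, with the player starting next to the target in the top-right corridor.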

Adrià Garriga-alonso 26 Jul 2024 21:07 UTC
LW: 1 AF: 1
0
AF
in reply to: Chris_Leong’s comment on: Pacing Outside the Box: RNNs Learn to Plan in Sokoban
Maybe in this case it’s a “confusion” shard? While it seems to be planning and producing optimizing behavior, it’s not clear that it will behave as a utility maximizer.

Adrià Garriga-alonso 26 Jul 2024 21:06 UTC
LW: 2 AF: 1
0
AF
in reply to: Lee Sharkey’s comment on: Pacing Outside the Box: RNNs Learn to Plan in Sokoban
Thank you!! I agree it’s a really good mesa-optimizer candidate; it now remains to be seen exactly how good. It’s a shame that I only found out about it about a year ago :)

Adrià Garriga-alonso 9 Jul 2024 14:57 UTC
3 points
0
on: AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0
Asking for an acquaintance. Suppose I know some graduate-level machine learning, have read ~most of the recent mechanistic interpretability literature, and have made good progress understanding a small-ish neural network in the last few months.

Is ARENA for me, or will it teach things I mostly already know?

(I advised this person that they are already at ARENA-graduate level, but I want to check in case I’m wrong.)

Adrià Garriga-alonso 17 May 2024 22:44 UTC
4 points
0
on: Language Models Model Us
How did you feed the data into the model and get predictions? Was there a prompt and then you got the model’s answer? Then you got the logits from the API? What was the prompt?

Adrià Garriga-alonso 9 May 2024 1:49 UTC
4 points
0
on: Why I’m doing PauseAI
Thank you for working on this, Joseph!

Adrià Garriga-alonso 19 Apr 2024 2:19 UTC
1 point
0
in reply to: Chakshu Mira’s comment on: Ophiology (or, how the Mamba architecture works)
Thank you! Could you please provide more context? I don’t know what ‘E’ you’re referring to.

Adrià Garriga-alonso 28 Feb 2024 18:18 UTC
24 points
20
on: Timaeus’s First Four Months
That’s a lot of things done, congratulations!

Adrià Garriga-alonso 6 Feb 2024 22:55 UTC
1 point
0
in reply to: meedstrom’s comment on: Does literacy remove your ability to be a bard as good as Homer?
That’s very cool, maybe I should try to do that for important talks. Though I suppose you almost always have slides as an aid, so it may not be worth the time investment.

Adrià Garriga-alonso 18 Jan 2024 22:20 UTC
1 point
0
in reply to: Bezzi’s comment on: Does literacy remove your ability to be a bard as good as Homer?

> Maybe being a guslar is not so different from telling a joke 2294 lines long

That’s a very good point! I think the level of ability required is different but it seems right.

The guslars’ songs are also printed (and of course already were in the 1930s to 1950s), so the analogy may be closer than you thought.

Adrià Garriga-alonso 18 Jan 2024 21:15 UTC
8 points
3
in reply to: AnthonyC’s comment on: Does literacy remove your ability to be a bard as good as Homer?

> Is there a reason I should want to?

I don’t know, I can’t tell you that. If I had to choose, I would also strongly prefer literacy.

But I didn’t know there was a tradeoff there! I thought literacy was basically unambiguously positive, whereas now I think it is net highly positive.

Also I strongly agree with frontier64 that the skill that is lost is rough memorization + live composition, which is a little different.

Adrià Garriga-alonso 18 Jan 2024 21:14 UTC
1 point
1
in reply to: Adrià Garriga-alonso’s comment on: Does literacy remove your ability to be a bard as good as Homer?
It’s definitely not exact memorization, but it’s almost more impressive than that: it’s rough memorization + composition to fit the format.

Adrià Garriga-alonso 18 Jan 2024 21:13 UTC
1 point
0
in reply to: Gurkenglas’s comment on: Does literacy remove your ability to be a bard as good as Homer?
They memorize the story, with particular names, and then sing it with consistent decasyllabic metre and rhyme. Here’s an example song transcribed with its recording: Ropstvo Janković Stojana (The Captivity of Janković Stojan)

The collection: https://mpc.chs.harvard.edu/lord-collection-1950-51/

Adrià Garriga-alonso 14 Dec 2023 6:17 UTC
4 points
1
in reply to: Jacob Falkovich’s comment on: Is being sexy for your homies?
Folks generally don’t need polyamory to enjoy this benefit, but I’m glad you get it from that!

Adrià Garriga-alonso 10 Dec 2023 23:22 UTC
3 points
0
in reply to: Sonia Joseph’s comment on: Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]
If you’re still interested in this, we have now added Appendix N to the paper, which explains our final take.

Adrià Garriga-alonso 7 Dec 2023 0:23 UTC
1 point
0
in reply to: ryan_greenblatt’s comment on: How useful is mechanistic interpretability?

> Sure, but then why not just train a probe? If we don’t care about much precision what goes wrong with the probe approach?

Here’s a reasonable example where naively training a probe fails. The model lies if any of N features is “true”. One of the features is almost always activated at the same time as some others, such that in the training set it never solely determines whether the model lies.

Then, a probe trained on the activations may not pick up on that feature. Whereas if we can look at model weights, we can see that this feature also matters, and include it in our lying classifier.

This particular case can also be solved by adversarially attacking the probe though.
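To make the failure mode concrete, here is a toy sketch. It does not train a probe; it only shows that a probe which ignores the co-occurring feature is perfectly consistent with the training data, so nothing in that data forces a trained probe to use the feature. The choice of `N = 4`, the index `RARE`, and all function names are illustrative assumptions.

```python
from itertools import product

N = 4          # number of binary features (illustrative choice)
RARE = N - 1   # the feature that never fires on its own in training


def model_lies(x):
    """Ground truth from the example: the model lies if any feature is on."""
    return any(x)


def probe_ignoring_rare(x):
    """A hypothetical probe that never learned to use the RARE feature."""
    return any(v for i, v in enumerate(x) if i != RARE)


# Training set: every input except the one where RARE is the only active feature,
# so RARE never solely determines whether the model lies.
all_inputs = list(product([0, 1], repeat=N))
train = [x for x in all_inputs if not (x[RARE] and sum(x) == 1)]

# Fraction of training inputs on which the probe agrees with the ground truth.
train_acc = sum(probe_ignoring_rare(x) == model_lies(x) for x in train) / len(train)

# Held-out case: RARE is the only active feature.
held_out = tuple(1 if i == RARE else 0 for i in range(N))

print("train accuracy:", train_acc)
print("truth on held-out:", model_lies(held_out))
print("probe on held-out:", probe_ignoring_rare(held_out))
```

The probe matches the ground truth on every training input yet misses the held-out case. Whether an actual trained probe ends up ignoring the feature depends on the optimizer and regularization, which this sketch does not model.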

Adrià Garriga-alonso 19 Oct 2023 15:38 UTC
3 points
2
in reply to: Yudhister Kumar’s comment on: Hyperreals in a Nutshell
Thank you, that makes sense!

> Indefinite integrals would make a lot more sense this way, IMO

Why so? I thought they already made sense: they’re “antiderivatives”, i.e. a function such that taking its derivative gives you the original function. Do you need anything further to define them?

(I know about the Riemann and Lebesgue definitions of the definite integral, but I thought indefinite integrals were much easier in comparison.)
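For reference, the antiderivative definition I have in mind can be written out as follows (standard calculus; the symbols $f$, $F$, $C$ and the interval $I$ are introduced here only for illustration):

```latex
F'(x) = f(x) \ \text{ for all } x \in I
\quad\Longrightarrow\quad
\int f(x)\,dx = F(x) + C, \qquad C \in \mathbb{R}.
% Example: f(x) = 2x has antiderivative F(x) = x^2, so \int 2x\,dx = x^2 + C.
```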

Adrià Garriga-alonso 18 Oct 2023 5:27 UTC
3 points
0
in reply to: Garrett Baker’s comment on: On Frequentism and Bayesian Dogma

> In such a case, I claim this is just sneaking in bayes rule without calling it by name, and this is not a very smart thing to do, because the bayesian frame gives you a bunch more leverage on analyzing the system

I disagree. An inductive bias is not necessarily a prior distribution. What’s the prior?

Adrià Garriga-alonso 18 Oct 2023 5:24 UTC
3 points
0
in reply to: Garrett Baker’s comment on: On Frequentism and Bayesian Dogma

> I don’t think I understand your model of why neural networks are so effective. It sounds like you say that on the one hand neural networks have lots of parameters, so you should expect them to be terrible, but they are actually very good because SGD is such a shitty optimizer on the other hand that it acts as an implicit regularizer.

Yeah, that’s basically my model. How it regularizes I don’t know. Perhaps the volume of “simple” functions is the main driver of this, rather than gradient descent dynamics. I think the randomness of it is important; full-gradient descent (no stochasticity) would not work nearly as well.

Adrià Garriga-alonso 18 Oct 2023 5:23 UTC
1 point
0
in reply to: Garrett Baker’s comment on: On Frequentism and Bayesian Dogma

> This seems false if you’re interacting with a computable universe, and don’t need to model yourself or copies of yourself

Reasonable people disagree. Why should I care about the “limit of large data” instead of finite-data performance?