Sandi

Karma: 54

Sandi Apr 18, 2024, 6:20 PM
1 point
0
in reply to: Adam Shai’s comment on: Transformers Represent Belief State Geometry in their Residual Stream
Yep, that’s what I was trying to describe as well. Thanks!

Sandi Apr 17, 2024, 8:41 PM
5 points
0
on: Transformers Represent Belief State Geometry in their Residual Stream
We do this by performing standard linear regression from the residual stream activations (64 dimensional vectors) to the belief distributions (3 dimensional vectors) which associated with them in the MSP.

I don’t understand how we go from this to the fractal. The linear probe gives us a single 2D point for every forward pass of the transformer, correct? How do we get the picture with many points in it? Is it by sampling from the transformer while reading the probe after every token and then putting all the points from that on one graph?

Is this result equivalent to saying “a transformer trained on an HMM’s output learns a linear representation of the probability distribution over the HMM’s states”?
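
To make the question concrete, here is a minimal sketch of the setup as I understand it, using synthetic data rather than a real transformer. Everything in it is an assumption for illustration: the activations are faked as a noisy linear image of randomly drawn belief vectors, and the names (`hidden_map`, `corners`, `points_2d`) are hypothetical. It shows that the probe yields one point per (sequence, position) pair, so collecting the points over many positions is what would fill in a picture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: 64-dim residual stream, 3-state belief distribution.
n_points, d_model, n_states = 2000, 64, 3

# Stand-in for ground-truth belief states (each row sums to 1).
beliefs = rng.dirichlet(np.ones(n_states), size=n_points)

# Stand-in for residual stream activations: a noisy linear image of the
# beliefs. A real experiment would read these from the trained model.
hidden_map = rng.normal(size=(n_states, d_model))
activations = beliefs @ hidden_map + 0.01 * rng.normal(size=(n_points, d_model))

# Ordinary least squares with an intercept column (the linear probe).
X = np.hstack([activations, np.ones((n_points, 1))])
coef, *_ = np.linalg.lstsq(X, beliefs, rcond=None)
predicted = X @ coef

# Map each predicted 3-vector to 2D coordinates inside a triangle (the simplex).
corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])
points_2d = predicted @ corners

mse = float(np.mean((predicted - beliefs) ** 2))
print(points_2d.shape, mse)
```

Each row of `points_2d` is one dot; scattering all rows on one graph gives the many-point picture. With Dirichlet draws the dots just fill the triangle, so any fractal structure would have to come from the real belief states, not from the probe.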

Sandi May 3, 2022, 11:05 PM
1 point
in reply to: nostalgebraist’s comment on: larger language models may disappoint you [or, an eternally unfinished draft]
Very comprehensive, thank you!

Sandi May 3, 2022, 9:47 PM
1 point
on: larger language models may disappoint you [or, an eternally unfinished draft]
Epistemic status: I’m not familiar with the technical details of how LMs work, so this is more word association.
You can glide along almost thinking “a human wrote this,” but soon enough, you’ll hit a point where the model gives away the whole game. Not just something weird (humans can be weird) but something alien, inherently unfitted to the context, something no one ever would write, even to be weird on purpose.
What if the missing ingredient is a better sampling method, as in this paper? To my eye, the completions they show don’t seem hugely better. But I do buy their point that sampling for high probability means you get low information completions.

Sandi May 3, 2022, 9:29 PM
3 points
on: Quick Thoughts on A.I. Governance
How many of the decision makers in the companies mentioned care about or even understand the control problem? My impression was: not many.
Coordination is hard even when you share the same goals, but we don’t have that luxury here.
An OpenAI team is getting ready to train a new model, but they’re worried about it’s self improvement capabilities getting out of hand. Luckily, they can consult MIRI’s 2025 Reflexivity Standards when reviewing their codebase, and get 3rd-party auditing done by The Actually Pretty Good Auditing Group (founded 2023).
Current OpenAI wants to build AGI.^[1] Current MIRI could confidently tell them that this is a very bad idea. Sure, they could be advised that step 25 of their AGI building plan is dangerous, but so were steps 1 through 24.
MIRI’s advice to them won’t be “oh implement this safety measure and you’re golden” because there’s no such safety measure because we won’t have solved alignment by then. The advice will be “don’t do that”, as it is currently, and OpenAI will ignore it, as they do currently.
1. ^ Sure, they could actually mean “build AGI in a few decades when alignment is solved and we’re gonna freeze all our current AGI building efforts long before then”, but no they don’t.

Sandi Mar 29, 2022, 6:40 PM
11 points
on: Humans pretending to be robots pretending to be human
TL;DR: Thought this post was grossly misleading. Then I saw that the GPT3 playground/API changed quite a lot recently in notable and perhaps worrying ways. This post is closer to the truth than I thought but I still consider it misleading.
Initially strongly downvoted since the LW post implies (to me) that humans provide some of the GPT3 completions in order to fool users into thinking it’s smarter than it is. Was that interpretation of your post more in the eye of the beholder?
Nested three layers deep is one of two pieces of actual evidence:
InstructGPT is then further fine-tuned on a dataset labeled by human labelers. The labelers comprise a team of about 40 contractors whom we hired through Upwork and ScaleAI.
My impression was that InstructGPT was a new/separate model, available as an option in the API along with the base GPT3, that is openly finetuned with human feedback as a way of aligning the base model. That was the whole point of this paper: https://arxiv.org/abs/2203.02155
This is very different from what I saw this post as implying, because OpenAI are open about it, it’s different from the main GPT3 and it’s not humans providing completions but humans aligning a language model. Hence strong downvote.
(The examples about the completions for select “gotcha” prompts improving over time aren’t very compelling evidence for what this post implies. The ones changing in a day are pretty compelling, though—how weird!)
Then I opened the GPT3 API playground for the first time in a few months and realized that my understanding was outdated. Looks like InstructGPT and the old non-finetuned davinci have been merged into text-davinci-002, which is now the default model.
Trying the “Does grape juice taste better if you add sour milk?” prompt many times over keeps giving me the exact same answer that Gary got, even with max temperature. To test where on the spectrum between “aligned using human review” and “repeating human answers” this lies, I tried some variations:
Does grape juice taste better if you add sour cream?
No, grape juice does not taste better if you add sour cream.
Does wine taste better if you add sour cream?
There is no definitive answer to this question as it depends on personal preferences. Some people might find that wine tastes better when paired with sour cream, while others may not notice a difference. Ultimately, it is up to the individual to decide whether or not they think wine tastes better with sour cream.
Should I add sour milk to grape juice in order to improve taste?
It is not necessary to add sour milk to grape juice in order to improve its taste. Grape juice is already a tart and flavorful beverage. However, if you enjoy the taste of sour milk, you may add it to grape juice to create a tart and refreshing drink.
While GPT3 might not literally outsource a portion of the requests to MTurk, I don’t think it’s unfair to say that some of the completions are straight-up human provided. If the corrected completions were added in a way that generalized (e.g. aligning using human feedback like in the paper), it would have been a different story. But they clearly don’t generalize.
So to recap:
- the curation of InstructGPT is now in the default model
- human completions are substituted within a day in response to publicized embarrassing completions (I’m alleging this)
- human completions aren’t added such that the model is aligned to give more helpful answers, because very similar prompts still give bad completions
In addition, and more intangibly, I’m noticing that GPT3 is not the model I used to know. The completions vary a lot less between runs. More strikingly, they have this distinct tone. It reads like a NYT expert fact checker or first page Google results for a medical query.
I tried one of my old saved prompts for a specific kind of fiction prompt and the completion was very dry and boring. The old models are still available and it works better there. But I won’t speculate further since I don’t have enough experience with the new (or the old) GPT3.

Sandi Feb 11, 2021, 3:25 AM
1 point
in reply to: ChristianKl’s comment on: We got what’s needed for COVID-19 vaccination completely wrong
The Kefauver-Harris Drug Amendments of 1962 coincide with a drop in the rate of life-span increase.
I believe that, but I couldn’t find a source. Do you remember where you got it from?

Sandi Oct 8, 2020, 8:34 AM
2 points
in reply to: Adele Lopez’s comment on: Inaccessible finely tuned RNG in humans?
I wonder if, in that case, your brain picks the stopping time, stopping point or “flick” strength using the same RNG source that is used when people just do it by feeling.
What if you tried a 50-50 slider on Aaronson’s oracle, if it’s not too exhausting to do it many times in a row? Or write down a sequence here and we can do randomness tests on it. Though I did see some tiny studies indicating that people can improve at generating random sequences.
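
A minimal sketch of two simple randomness checks one could run on a written-down 0/1 sequence: the fraction of ones, and a Wald-Wolfowitz runs test. The function names are my own, and the normal approximation behind the z-score is only reasonable for longer sequences. People tend to alternate too often, which shows up as a large positive z.

```python
import math

def ones_fraction(bits):
    """Fraction of 1s; should be near 0.5 for a fair 50-50 sequence."""
    return sum(bits) / len(bits)

def runs_test_z(bits):
    """Wald-Wolfowitz runs test z-score for a 0/1 sequence.

    Positive z: more runs than expected (too much alternation).
    Negative z: fewer runs than expected (too much clumping).
    """
    n1 = sum(bits)
    n0 = len(bits) - n1
    if n0 == 0 or n1 == 0:
        raise ValueError("sequence must contain both 0s and 1s")
    n = n0 + n1
    # A new run starts at every position where the symbol changes.
    runs = 1 + sum(a != b for a, b in zip(bits, bits[1:]))
    mean = 2 * n0 * n1 / n + 1
    var = 2 * n0 * n1 * (2 * n0 * n1 - n) / (n ** 2 * (n - 1))
    return (runs - mean) / math.sqrt(var)

alternating = [0, 1] * 20        # far too regular
clumped = [0] * 20 + [1] * 20    # far too clumpy
print(runs_test_z(alternating), runs_test_z(clumped))
```

A |z| above roughly 2 would be suspicious at the usual 5% level, though passing these two checks says little about subtler patterns.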

Sandi Oct 7, 2020, 6:52 PM
1 point
in reply to: abramdemski’s comment on: Inaccessible finely tuned RNG in humans?
Hm, could we tell apart yours and Zack’s theories by asking a fixed group of people for a sequence of random numbers over a long period of time, with enough delay between each query for them to forget?

Sandi Oct 7, 2020, 6:44 PM
2 points
in reply to: ike’s comment on: Inaccessible finely tuned RNG in humans?
I seriously doubt the majority of the participants in these casual polls are doing anything like that.

Sandi Oct 7, 2020, 6:42 PM
1 point
in reply to: abramdemski’s comment on: Inaccessible finely tuned RNG in humans?
This occurred to me, but I didn’t see how it could work with different ratios. I guess if you have a sample from a variable with a big support (> 100 events) that’s uniformly distributed, that would work (e.g. if x is your birthday as a day of the year, then x/365 < 0.2 would give you a 20% event).
It would be interesting to test this with a very large sample where you know a lot of information about the respondents and then trying to predict their choice.
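
A minimal sketch of the birth date idea, assuming birthdays are uniform over a 365-day year (they are not exactly) and that the threshold is a fraction such as 0.2. The function name is hypothetical.

```python
def choose_by_birth_day(day_of_year, p):
    """Return True with probability about p, using birth day (1..365) as the random source."""
    # Shift to 0..364 so that exactly floor-ish p*365 days fall under the threshold.
    return (day_of_year - 1) / 365 < p

# Count how many of the 365 possible birthdays would answer "yes" for a 20% request.
yes_days = sum(choose_by_birth_day(d, 0.2) for d in range(1, 366))
print(yes_days, yes_days / 365)
```

The resolution is limited to steps of 1/365, which is why a variable with a large support is needed for arbitrary ratios.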

Sandi Oct 7, 2020, 6:31 PM
1 point
in reply to: Zack_M_Davis’s comment on: Inaccessible finely tuned RNG in humans?
Well, I’m quite satisfied with that. Thank you!

[Question] Inaccessible finely tuned RNG in humans?

Sandi Oct 7, 2020, 5:04 PM

23 points

13 comments · 1 min read

Sandi Sep 17, 2020, 11:31 AM
1 point
in reply to: ChristianKl’s comment on: Rationality for Kids?
Here’s an Android game that works like Zendo but has colorful caterpillars, might be great for kids: https://play.google.com/store/apps/details?id=org.gromozeka1980.caterpillar_logic

Sandi Jul 31, 2017, 8:25 PM
2 points
on: Open thread, July 31 - August 6, 2017
What would be the physical/neurological mechanism powering ego depletion, assuming it existed? What stops us from doing hard mental work all the time? Is it even imaginable to, say, study every waking hour for a long period of time, without ever having an evening of youtube videos to relax? I’m not asking what the psychology of willpower is, but rather if there’s a neurology of willpower?

And beyond ego depletion, there’s a very popular model of willpower where the brain is seen as a battery, used up when hard work is being done and charged when relaxing. I see this as a deceptive intuition pump since it’s easy to imagine and yet it doesn’t explain much. What is this energy being used up, physically?

Surely it isn’t actual physical energy (in terms of calories) since I recall that the energy consumption of the brain isn’t significantly increased while studying. In addition, physical energy is abundant nowadays because food is plentiful. If the lack of physical energy was the issue, we could just keep going by eating more sugar.

The reason we can’t work out for 12 hours straight is understood, physiologically. Admittedly, I don’t understand it very well myself, but I’m sure an expert could provide reasons related to muscles being strained, energy being depleted, and so on. (Perhaps I would understand the mental analogue better if I understood this.) I’m looking for a similar mechanism in the brain.

To better explain what kind of answer would be satisfying, I’ll give you a couple of fake explanations.
- Hard mental work sees higher electrical activity in the brain. If this is kept up for too long, neurons would get physically damaged due to their sensitivity. To prevent damage, brains evolved a feeling of tiredness when the brain is overused.
- There is a resource (e.g. dopamine) that is literally depleted during tasking brain operation and regenerated when resting.
- There could also be a higher level explanation. The inspiration for this came from an old text by Yudkowsky. (I didn’t seriously look at those explanations as an answer to my problem because of reasons). I won’t quote the source since I think that post was supposed to be deleted. This excerpt gives a good intuitive picture:
My energy deficit is the result of a false negative-reinforcement signal, not actual damage to the hardware for willpower; I do have the neurological ability to overcome procrastination by expending mental energy. I don’t dare. If you’ve read the history of my life, you know how badly I’ve been hurt by my parents asking me to push myself. I’m afraid to push myself. It’s a lesson that has been etched into me with acid. And yes, I’m good enough at self-alteration to rip out that part of my personality, disable the fear, but I don’t dare do that either. The fear exists for a reason. It’s the result of a great deal of extremely unpleasant experience. Would you disable your fear of heights so that you could walk off a cliff? I can alter my behavior patterns by expending willpower—once. Put a gun to my head, and tell me to do or die, and I can do. Once.

Let me speculate on the answer.

1) There is no neurological limitation. The hardware could, theoretically, run demanding operations indefinitely. But theories like ego depletion are deceptive memes that spread throughout culture, and so we came to accept a nonexistent limitation. Our belief in the myth is so strong, it might as well be true. The same mechanism as learned helplessness. Needless to say, this could potentially be overcome.

2) There is no neurological limitation, but otherwise useful heuristics stop us from kicking it into higher gear. All of the psychological explanations for akrasia, the kind that are discussed all the time here, come into play. For example, youtube videos provide a tiny, but steady and plentiful stimulus to the reward system, unlike programming, which can have a much higher payout, but one that’s inconsistent, unreliable and coupled with frustration. And so, due to a faulty decision making procedure, the brain never gets to the point where it works to its fullest potential. The decision making procedure is otherwise fast and correct enough, thus mostly useful, so simply removing it isn’t possible. The same mechanism as cognitive biases. It might be similar to how we cannot do arithmetic effortlessly even though the hardware is probably there.

3) There is an in-built neurological limitation because of an evolutionary advantage. Now, defining this evolutionary advantage can lead to the original problem. For example, it cannot be due to minimizing energy consumption, as discussed above. But other explanations don’t run into this problem. Laziness can often lead to more efficient solutions, which is beneficial, so we evolved ego depletion to promote it, and now we’re stuck with it. Of course, all the pitfalls customary to evolutionary psychology apply, so I won’t go in depth about this.

4) There is a neurological limitation deeply related to the way the brain works. Kind of like cars can only go so fast, and it’s not good for them if you push them to maximum speed all the time. At first glance, the brain is propagating charge through neurons all the same, regardless of how tiring an action it’s accomplishing. But one could imagine non-trivial complexities to how the brain functions which account for this particular limitation. I dare not speculate further since I know so little about neurology.

Sandi May 10, 2017, 8:46 PM
0 points
in reply to: HungryHippo’s comment on: Open thread, May 8 - May 14, 2017
What does TapLog lack, besides a reminder feature? It seems pretty nifty from the few screenshots I just saw.

Sandi May 9, 2017, 5:28 PM
0 points
in reply to: Lumifer’s comment on: Open thread, May 8 - May 14, 2017
Yeah, that’s why I kept comparing it to a spreadsheet. Ease of use is a big point. I don’t want to write SQL queries on my phone.

Sandi May 9, 2017, 5:27 PM
0 points
in reply to: ChristianKl’s comment on: Open thread, May 8 - May 14, 2017
Thanks! I didn’t know this was such a developed concept already and that there are so many people trying to measure stuff about themselves. Pretty cool. I’ll check out Quantified Self and what’s linked.

Sandi May 8, 2017, 10:06 PM
1 point
in reply to: lifelonglearner’s comment on: Introducing the Instrumental Rationality Sequence
That is indeed very low weight. My prior is pretty shaky as-is, but that evidence shouldn’t move it much.

I thought about priming a lot while reading. Many of the results he lists are similar to priming, but priming being false doesn’t mean all results similar to it are false. One could consider a broader hypothesis encompassing all that, namely “humans can be influenced by subtle cues to their subconscious to a significant degree”. That’s the similarity I see with priming: both it and many of Cialdini’s hypotheses follow from this premise. The priming failure would suggest it’s false, but those experiments used extremely subtle subliminal cues, as if they were designed not to work. Much of Cialdini’s work affirms this broader thesis. It’s no meta-study, but the guy lists a lot of studies, all affirming it. A lot of Kahneman’s work does, too. Surely it is acceptable that humans often act on instinct (unconsciously) and that they are subconsciously influenced by their surroundings. This follows from System 1 being so prevalent in our thought.

SSC has a new open thread right now, I should ask there. Maybe Scott can clear it up.

Sandi May 8, 2017, 7:44 PM
9 points
on: Open thread, May 8 - May 14, 2017
I have a neat idea for a smartphone app, but I would like to know if something similar exists before trying to create it.

It would be used to measure various things in one’s life without having to fiddle with spreadsheets. You could create documents of different types, each type measuring something different. Data would be added via simple interfaces that fill in most of the necessary information. Reminders based on time, location and other factors could be set up to prompt for data entry. The gathered data would then be displayed using various graphs and could be exported.

The cool thing is that it would be super simple to reliably measure most things on a phone in a way that’s much simpler than keeping a spreadsheet. For example: you want to measure how often you see a seagull. You’d create a frequency-measuring document, entitle it “Seagull sightings”, and each time you open it, there’d be a big button for you to press indicating that you just saw a seagull. Pressing the button would automatically record the time and date, perhaps the location, when this happened. Additional fields could be added, like the size of the seagull, which would be prompted and logged with each press. With a spreadsheet, you’d have to enter the date yourself, and the interface isn’t nearly as convenient.
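
A rough sketch of how the frequency-measuring document from the seagull example could be modelled. This is only one possible data model, and every name in it (`FrequencyDocument`, `press`, `extra_fields`) is made up for illustration. Location capture is left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class FrequencyDocument:
    """One 'document' that counts occurrences of something."""
    title: str
    extra_fields: list = field(default_factory=list)  # names of prompted fields
    entries: list = field(default_factory=list)       # one dict per button press

    def press(self, when=None, **values):
        """Record one occurrence; the timestamp is filled in automatically."""
        missing = [name for name in self.extra_fields if name not in values]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        entry = {"timestamp": when or datetime.now(timezone.utc), **values}
        self.entries.append(entry)
        return entry

seagulls = FrequencyDocument("Seagull sightings", extra_fields=["size"])
seagulls.press(size="large")
seagulls.press(size="small")
print(len(seagulls.entries), "sightings logged")
```

The point of the design is that the user only supplies the extra fields, while the time and date come for free with each press.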

Another example: you’re curious as to how long you sleep and how you feel in the morning. You’d set up an interval-measuring document with a 1-10 integer field for sleep quality and reminders tied into your alarm app or the time you usually wake up. Each morning you’d enter hours slept and rate how good you feel. After a while you could look at pretty graphs and mine for correlations.

A third example: you can emulate the experience sampling method for yourself. You would have your phone remind you to take the survey at specific times in the day, whereupon you’d be presented with sliders, checkboxes, text fields and other fields of your choosing.

This could be taken further in a useful way by adding a crowd sourcing aspect. Document-templates could be shared in a sort of template marketplace. The data of everyone using a certain template would accumulate in one place, making for a much larger sample size.
