Interested in math, Game Theory, etc.
Pattern
We rarely recognize how extraordinary it is that so many people can go entire lifetimes neither realizing how often, all that is necessary to steal famous artworks, is to find a way to use a window as a door, or grab a drywall knife and make a door, nor having to perform surgery on themselves.
The article is short enough (one page!) that you should read it instead of the description that follows. One thing I appreciate about it is that it covers just one subject, briefly, and does so well.
I’m not sure if I have the right to copy the article over, so I didn’t. I came across a screenshot of it online, and looked up the source above.
This article is about how feeling stupid is a sign of ignorance, but also something that happens when you’re learning (e.g. grad+), especially when you’re working on projects to find out things that no one else has yet (e.g. a PhD).
At first I thought that on LessWrong, if someone was writing something like this, they’d probably make up some new words, or title it something like “The Feeling of Ignorance”. I looked up the definition of stupidity to see what I could find, and have pasted some results below. “Following” denotes which line I chose to follow (a part of) by looking up the definition (of a word from that line).
Before getting to that, I am first pasting what I wrote after that:
I would also note that ‘lacking good judgement’ might be how someone characterizes their past self in hindsight, once they are no longer ignorant. This seems unavoidable when no one has the necessary knowledge.
I think the title as is, is a part of the piece addressing an issue, and have not altered it for that reason.
-
stupidity
stoo͞-pĭd′ĭ-tē, styoo͞-
noun
The quality or condition of being stupid.
A stupid act, remark, or idea.
A state of stupor or stupefaction; torpidity of feeling or of mind.
Following 1:
stupid
stoo͞′pĭd, styoo͞′-
adjective
Slow to learn or understand; obtuse.
Tending to make poor decisions or careless mistakes.
Marked by a lack of intelligence or care; foolish or careless.
Following 3:
foolish
foo͞′lĭsh
adjective
Lacking or exhibiting a lack of good sense or judgment; silly.
Capable of arousing laughter; absurd or ridiculous.
Embarrassed; abashed.
Following 3:
embarrassed
adjective
feeling uneasily or unpleasantly self-conscious due to some event or circumstance.
feeling inferior or unworthy and hence unpleasantly self-conscious.
Having a feeling of shameful discomfort.
ETA:
What is now the second sentence.
“” around Following. Search “Following” to find that.
[Linkpost] The importance of stupidity in scientific research
So, again, you end up needing alignment to generalize way out of the training distribution
I assume this means ‘you need alignment if you are going to try to “generalize way out of the training distribution” and give it a lot of power’ (or you will die).
And not something else like ‘it must stay “aligned” (and not wirehead itself) to pull something like this off, even though it’s never done that before’. (And thus ‘you need alignment to do X’, not because you will die if you do, but because alignment means something like ‘the ability to generalize way out of the training distribution’, and not ‘it’s “safe”* even though it’s doing that’.)
*Safety being hard to define in a technical way, such that the definition can provide safety. (Sort of.)
… This happens in practice in real life, it is what happened in the only case we know about, and it seems to me that there are deep theoretical reasons to expect it to happen again: the first semi-outer-aligned solutions found, in the search ordering of a real-world bounded optimization process, are not inner-aligned solutions. This is sufficient on its own, even ignoring many other items on this list, to trash entire categories of naive alignment proposals which assume that if you optimize a bunch on a loss function calculated using some simple concept, you get perfect inner alignment on that concept.
Are there examples of inner-aligned solutions? (It seems I’m not up to date on this.)
‘This problem seems hard. Perhaps making AI that’s generally good, and then having the AI do it would be easier.’
lack of charity is a flaw of curiosity,
What?
What theorem?
How technical is the use of the word ‘distributed’ here?
While arranging my evening, I may perform some Bayesian updates. Maybe I learn that the movie is not available on Netflix, so I ask a friend if they have a copy, then check Amazon when they don’t. This process is reasonably well-characterized as me having a centralized model of the places I might find the movie, and then Bayes-updating that model each time I learn another place where I can/can’t find it.
It seems more like going through a list of places and checking off ‘not there’ than Bayesian updating. Sure, that’s a special case,
My friends and I, as a system, are not well-modeled as Bayesian updates to a single central knowledge-state; otherwise we wouldn’t check Netflix twice.
but it seems like ‘centrality’ is less likely to be the thing here than something else. Coordination is mentioned, but it seems more like you both check Netflix because you’re not asking ‘what if _ checks Netflix’. In other words, maybe you’re not acting in a ‘Bayesian manner’. Rather than evaluate the probability, you take the action. I would also guess you didn’t say Netflix because ‘the probability points that way’.
If you watch Netflix a lot (or have used it recently) then it might come to mind quickly. If your friend watches something else a lot, maybe they check there first.
There’s not much benefit to more elaborate protocols here (beyond texting your friend that it’s not on Netflix) if there aren’t a lot of services to search. (Otherwise you could come up with a list together (or independently) and handle your parts (or pick some off the list at random, figuring that if both of you do that, you’re more likely to find it, even if you don’t coordinate more).) So I won’t go into a lot more detail here, other than mentioning:
There are other considerations at play here than probability: cost. You have Netflix so you check there.
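A rough sketch of the ‘checking off a list is a special case of Bayesian updating’ point, plus the cost consideration. Everything here is an assumption for illustration: the probabilities, the costs, the function names, and especially the idea that a search is perfect (if you look in a place and the movie is there, you find it).

```python
from fractions import Fraction

def bayes_update_not_found(beliefs, checked):
    """Condition on 'the movie is not at `checked`'.

    Assumes a perfect search: the likelihood of 'not found' is 0 if the
    movie is there and 1 if it is anywhere else.
    """
    unnormalized = {
        place: (Fraction(0) if place == checked else p)
        for place, p in beliefs.items()
    }
    total = sum(unnormalized.values())
    return {place: p / total for place, p in unnormalized.items()}

def cross_off(beliefs, checked):
    """Checklist version: drop `checked` and rescale whatever is left."""
    remaining = {place: p for place, p in beliefs.items() if place != checked}
    total = sum(remaining.values())
    return {place: p / total for place, p in remaining.items()}

def next_place(beliefs, cost):
    """Pick the place with the best probability-per-unit-cost (a heuristic)."""
    candidates = [place for place, p in beliefs.items() if p > 0]
    return max(candidates, key=lambda place: beliefs[place] / cost[place])

# Made-up numbers, purely for illustration.
prior = {"Netflix": Fraction(1, 2), "friend": Fraction(3, 10), "Amazon": Fraction(1, 5)}
cost = {"Netflix": 1, "friend": 5, "Amazon": 4}

posterior = bayes_update_not_found(prior, "Netflix")
checklist = cross_off(prior, "Netflix")
```

Under these assumptions the Bayesian posterior and the crossed-off list agree on every place that is still possible, so the ‘update’ adds nothing beyond the checklist. The `next_place` helper is where cost enters: a cheap place can be worth checking first even when it isn’t the most probable one.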
1. Yeah, this is tricky. I didn’t like the terminology, but I didn’t have a replacement. It’s hard to come up with a term for this (for reasons discussed at length in the post). I was looking more at ‘both are “boundaries”’ and disambiguating that it is your boundary (versus the social one) that you are sort of opting in/asking others to work with you to define. (Opting-in (by self) to boundary exploration (of self by others).) ‘Boundary exploration’ still doesn’t sound good, though ‘boundary violation’ sounds worse. Emphasizing the opt-in part in the terminology seems helpful, given that what you want is a surprise, hence it not being ‘someone asks for permission to push you in the pool’.
1⁄2. It seems clear that what you want would involve people asking someone other than the person being surprised. (Like planning a surprise party, or ‘Friend A throws Friend B into the pool in order to splash Friend C during a water fight/similar game’.)
2. Yeah, aside from the overall issue (surprising seems hard to scale)... You were mostly talking about other things, but it kind of sounded like you wanted a surprise party. (Or to be surprised by, not it, but what would happen there.) That seems like it could be:
hard to do with a party.
Very dependent on stuff like where you are (versus talking about an abstract topic on LW). (Like, is the weather good enough that your friends don’t tell you where the party will be, and the day of, they surprise you by*...going to the beach? Or some other place that’s fun for a group, and it’s a surprise.)
*associated details might include, your eyes are covered or closed until you get there etc.
This is a narrower topic than ‘how to handle/negotiate fitting the personal bounds rather than the other one, which is being treated in this post as serving a different purpose’, so I didn’t focus on it more.
3. That makes sense.
(Prompt:)
The important part would be:
1. The post communicates its point but the terminology could be better. (Which is probably why there are so many “hedges”.)
Less important:
2. In order to scale up, some things do require opt in/advance notice. Some possibilities are (largely) exclusive of each other. (A costume party and a surprise water balloon fight.)
3. The post mentions different subcultures have different rules, but talks about society boundaries like they are one thing only.
(Purpose:)
Overall, I made notes as I read the post. (This post is fairly straightforward and didn’t need lots of re-reads to understand, but it is kind of long. More complex and long occasionally go together, so I made notes as I went. It’s also useful for more formed thoughts and has a few quotes or points I could go back and re-read, instead of having to skim the whole thing to get back to.)
There are all sorts of different domains in which we have those different boundaries. If the above were a representation of people’s feelings about personal space, then the person on the left would probably be big into hugs and slaps-on-the-shoulder, while the one on the right might not be comfortable sharing an elevator with more than one other person (if that).
If the above were a representation of, say, people’s openness to criticism, then the person on the left probably wouldn’t mind if you told them their presentation sucked, in front of an audience of their friends, colleagues, and potential romantic partners. Meanwhile, the person on the right would probably prefer that you send a private message checking to see whether they were even interested in critical feedback at this time.
Can we abandon one-dimensional continuums as models for everything? (They work well for one thing! They’re awful everywhere else. As a ‘representation of people’s feelings about personal space’ it didn’t need an explanation; it was simple to extrapolate. Then you added more dimensions that don’t collapse well.)
Some people like hugs.
Some people don’t.
Some people are fine with ‘hugs and slaps’.
Some people are not.
The above also doesn’t do a great job of showing uncertainty in one’s boundaries, which is often substantial. The “grey area” between okay and not okay might be quite small, in some cases (you have a clear, unambiguous “line” that you do not want crossed) and quite wide in others where you’re not sure how you feel, and you might not know exactly where that gradient begins and ends.
While we’re here we might as well review set theory, namely the difference between:
[0, 1] and (0, 1), and maybe cover fuzzy sets as well (whatever those are).
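A minimal sketch of that difference, with a fuzzy set standing in for the “grey area” between okay and not okay. The function names and the 0.3/0.7 thresholds are made up for illustration, and a straight-line ramp is only one of many possible membership functions.

```python
def in_closed(x, lo=0.0, hi=1.0):
    """[lo, hi]: the endpoints themselves are inside."""
    return lo <= x <= hi

def in_open(x, lo=0.0, hi=1.0):
    """(lo, hi): the endpoints themselves are outside."""
    return lo < x < hi

def fuzzy_okay(x, clearly_okay=0.3, clearly_not_okay=0.7):
    """Degree (from 0.0 to 1.0) to which intensity `x` counts as okay.

    Fully okay up to `clearly_okay`, fully not okay from `clearly_not_okay`,
    and a linear ramp across the grey area in between.
    """
    if x <= clearly_okay:
        return 1.0
    if x >= clearly_not_okay:
        return 0.0
    return (clearly_not_okay - x) / (clearly_not_okay - clearly_okay)

# A wide grey area versus a nearly sharp line (hypothetical thresholds).
unsure = fuzzy_okay(0.5)
sharp = fuzzy_okay(0.5, clearly_okay=0.59, clearly_not_okay=0.6)
```

An ordinary set only answers yes or no, and the closed/open distinction is just about which side the line itself falls on. The fuzzy version answers ‘how much’, and moving the two thresholds closer together turns the wide grey area into something close to a clear, unambiguous line.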
But for any given subculture, it seems to me that society tries to set the boundaries at something like “ninety percent of the present/relevant/participating people will not have their personal boundaries violated.”
might want to emphasize subculture there.
depending on all sorts of factors
It seems like different groups clearly have this set to different thresholds. (4chan might be unusual in this regard, even if it is broadly accurate.)
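One way to make the ‘different thresholds’ point concrete, as a sketch only: treat each person’s personal boundary as a single tolerance number (itself a one-dimensional simplification) and read the social boundary off as a percentile. The function name, the scores, and the 90/50 figures are all hypothetical.

```python
def social_boundary(tolerances, protected_percent=90):
    """Highest intensity that at least `protected_percent`% of people tolerate.

    `tolerances` holds one number per person: the most intense interaction
    that still sits inside that person's personal boundary.
    """
    ordered = sorted(tolerances, reverse=True)
    # Ceiling division: how many people must be left un-violated.
    needed = -(-protected_percent * len(ordered) // 100)
    return ordered[needed - 1]

group = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # made-up tolerance scores
cautious = social_boundary(group, protected_percent=90)
rowdy = social_boundary(group, protected_percent=50)
```

The same group of people ends up with a very different social boundary depending on which percentage the subculture picks, which is all the ‘different thresholds’ comment is pointing at.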
I suspect some people’s minds will have leapt straight to the (true!) point that
If you wanted to write an essay without so many caveats, you could have talked about ‘boundaries between you and “the world”’ and gone on about how sometimes you like ‘going on adventures’ where you are fine with that boundary between ‘you and “the world”’ being different.
Would that have served the purpose of this essay? Perhaps not.
The only way to tell that a given social boundary violation is benign is to find out, from the individual, whether it in fact failed to violate their personal boundary.
Fair enough. ‘failed’ isn’t an ideal metaphor. The pie thing might be fine as part of a game, once. (You lose the set of matches, pie to the face. The winner gets to...eat a pie. Normally.)
Yet another way is to say that if it did, in fact, cross your personal boundary, then it was by definition not benign in the sense intended here.
This is a different place from where this essay seemed like it was going at the start:
I said (among other things) that I’d really enjoy some benign boundary violations.
Perhaps the terminology could be refined further, to make that more clear.
I predict that nonzero readers will be something-like offended, or perhaps alarmed, that I’m trying to crystallize a concept like “benign boundary violation” at all, since it could e.g. be abused to give cover to those other, worse things.
I object to the terminology—it’s not clear.
Examples like ‘me and [friend name] like giving each other really hard high fives for fun. (but not too hard)’ are easy to get.
(remember, the fact that they are benign for me does not imply they are generally so):
I’d have added a ‘sometimes’ before the benign.
So they are indeed past the social boundary. But they didn’t violate my boundaries.
This is why the terminology is unclear. ‘boundary violation’ - which ‘boundary’?
but they all seem to dismiss a set of costs as not-being-costs, rather than properly weighing and accounting for them.
You could make this specific—costs to you.
For example: “Why don’t people just ask you if you’re chill with being hit with water balloons, and then ever after they can hit you with water balloons?”
‘What do you want for your party?’
‘Let’s have a water balloon fight, etc.’
it’s still an update in the direction of diminished intimacy.
Do you want a party with fights with water balloons? Or a costume party? (With nerf darts?)
These are somewhat exclusive. Oh no! Choices!
Doing both arguably requires more planning: costumes that are good with water (and to run in)...
We haven’t banned Reese’s from all public spaces, even though this is a hardship for people with peanut allergies, because it saves too few at too high a cost.
What public spaces have Reese’s?
(e.g. our society offers martial arts classes, which you can pay for and put into your weekly schedule. It does not offer friendly surprise attacks.)
Seems somewhat doable with pranks?
This is the part where I would like to have suggestions or recommendations or next actions, but I largely don’t. I didn’t anticipate this essay being nearly as fraught as it felt, when I first set out to write it. I thought that I would just say “sometimes it’s nice to be pushed into the pool,” and explain my three reasons why, and that would be that.
Obviously, we need to find a way to combine nerf darts with water in a way that isn’t terrible.
(And meanwhile the ten percent of people
This didn’t seem accurate? Did seem excessive.
Who (plural) like driving at 100 miles an hour?
And as a result, people are a little less creeping and terrified.
creeping?
bailey-and-motte’d
Enter stage right, the court will not accept X as evidence, but they will accept Y. (If a society is cool with a lot of things that aren’t cool, then there may be a sharp discontinuity where once a line is crossed that people know others will have their back, yeah, ‘the pitchforks come out’.)
In my culture. Not in this one. In this one, we don’t seem to have very many medium-sized responses left. We have some responses which average out to medium-sized, in that they’re sometimes huge and usually nothing, but that’s not the same thing.
Ah, is that a rejection I spy, of those utility axioms about which I have heard so much? (Or just ‘small consistent responses are more effective deterrence.’)
(And there are also, I suspect, a lot of people with legitimate medium-sized grievances who are going without justice because the only tools they have at their disposal are frowny-face stickers and hand grenades, and the former doesn’t suffice and the latter feels like overkill.)
And then there’s the problem of what happens next.
Which means that something like 4% of people will declare them to be okay; we round that to zero.
What if it’s going a little fast on a bike (with pedals) instead of a car?
Because the logical knots this ties various people into is hilarious. Mexico, you see, does not restrict imports of baby formula, so they are fully stocked and could easily handle our orders, and their babies seem to do fine.
And they don’t restrict exports?
What do you mean by narrative coherence?
There are other differences between the two, but I would say that depression is stronger than pessimism.
(Content warning: depressed/depressing sentiments.)
‘Everything seems to go wrong’
‘Why do anything?’
‘Nothing is worth doing.’
‘Life isn’t worth living.’
Only the first of these sounds like pessimism.
I don’t have a lot to say about the difference. There was a time when I thought things could be better if they were given a critical look. The flip side of that is that things can be better if improved from an ‘optimistic perspective’.
If that benefit is actually realized, maybe the pessimist (often) avoids food poisoning by not eating at fast food restaurants (often). The optimist may gain from realizing/seizing opportunity, or trying things.
(‘Maybe squaring the circle is impossible. But I want to know why.’
’Then just read _’s proof that it’s impossible.′
‘I don’t see any reason it can’t be done, it seems like I just have to find a way. So I’m going to give it a go.’
(According to some proofs, squaring the circle is impossible ‘using only a [particular set of tools]’.))
There’s also something else there: ‘What’s the point in doing that? I want to.’ I think some stuff like doing less has an association with depression.
How is “Depression is just contentment with a bad attitude” false exactly?
I don’t know where this is from. (It sounds like it’s responding to something.)
I didn’t find it confusing.
Ah. It looked like a useful technical term. (And here are a set of scenarios to watch out for epidemically, and what we call them, so they’re easy to remember.)
dry tinder
Where does this terminology come from?
https://www.readthesequences.com
https://www.readthesequences.com/#preface
I recommend the preface because it tells you [some] issues [that the author noticed] with the work. (It doesn’t mention ‘this research is no longer up to date/etc.’, but it does mention some other things:)
It was a mistake that I didn’t write my two years of blog posts with the intention of helping people do better in their everyday lives. I wrote it with the intention of helping people solve big, difficult, important problems, and I chose impressive-sounding, abstract problems as my examples.
In retrospect, this was the second-largest mistake in my approach. It ties in to the first-largest mistake in my writing, which was that I didn’t realize that the big problem in learning this valuable way of thinking was figuring out how to practice it, not knowing the theory. I didn’t realize that part was the priority; and regarding this I can only say “Oops” and “Duh.”
...
A third huge mistake I made was to focus too much on rational belief, too little on rational action.
The fourth-largest mistake I made was that I should have better organized the content I was presenting in the sequences. In particular, I should have created a wiki much earlier, and made it easier to read the posts in sequence.
That mistake at least is correctable. In the present work Rob Bensinger has reordered the posts and reorganized them as much as he can without trying to rewrite all the actual material (though he’s rewritten a bit of it).
...
My fifth huge mistake was that I—as I saw it—tried to speak plainly about the stupidity of what appeared to me to be stupid ideas.
...
To be able to look backwards and say that you’ve “failed” implies that you had goals. So what was it that I was trying to do?
...
In spite of how large my mistakes were, those two years of blog posting appeared to help a surprising number of people a surprising amount. It didn’t work reliably, but it worked sometimes. In modern society so little is taught of the skills of rational belief and decision-making, so little of the mathematics and sciences underlying them… that it turns out that just reading through a massive brain-dump full of problems in philosophy and science can, yes, be surprisingly good for you. Walking through all of that, from a dozen different angles, can sometimes convey a glimpse of the central rhythm.
Because it is all, in the end, one thing. I talked about big important distant problems and neglected immediate life, but the laws governing them aren’t actually different. There are huge gaps in which parts I focused on, and I picked all the wrong examples; but it is all in the end one thing. I am proud to look back and say that, even after all the mistakes I made, and all the other times I said “Oops”…
Even five years later, it still appears to me that this is better than nothing.
...-2015
There have been some comments (or at least a post, I think) on what that ‘book’/he had to say about Neural Networks. Understandability has been mentioned as an issue today, and I think that’s more of a problem where it’s less clear how to evaluate ‘ability’ or ‘performance’.
ETA:
1)
Added the [bracketed text] on the line duplicated below:
‘I recommend the preface because it tells you [some] issues [that the author noticed] with the work. ’
What if human preferences aren’t representable by a utility function
I’m responding to this specifically, rather than the question of RLHF and ‘human irrationality’.
I’m not saying this is the case, but what if ‘human preferences’ are representable by something more complicated. Perhaps an array or vector? Can it learn something like that?
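To illustrate what ‘more complicated than a utility function’ could look like, here is a small sketch with vector-valued preferences compared by Pareto dominance. The options, the two dimensions, and the scores are hypothetical, and this is only one of several ways a vector could be turned into a preference relation.

```python
def prefers(a, b):
    """True if `a` Pareto-dominates `b`.

    `a` must be at least as good on every dimension and strictly better
    on at least one.
    """
    pairs = list(zip(a, b))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)

# Hypothetical (comfort, novelty) scores.
quiet_evening = (3, 1)
adventure = (1, 3)
both = (3, 3)

clear_win = prefers(both, quiet_evening)
incomparable = not prefers(quiet_evening, adventure) and not prefers(adventure, quiet_evening)
```

With a single real-valued utility, any two options can be ranked (one number is always at least as large as the other). Here `quiet_evening` and `adventure` are neither ranked nor equal, so a learner that outputs one scalar per option would have to invent a ranking that the vector version never committed to.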
What does this mean? Improve on what you’ve (the OP has) already written that’s here (LW) tagged corrigibility?
The overall point makes sense: see how far you can go on
‘principles for corrigibility’.
The phrasing at the end of the post was a little weird though.