I try to practice independent reasoning and critical thinking, and to challenge current solutions so they become more considerate and complete. I do not reply to DMs for non-personal discussions (with respect to the user who reached out directly); instead, I will post here with a reference to the user and my reply.
ZY
Yeah, that makes sense; the knowledge should still be there, we just need to shift the distribution “back”.
Haven’t looked too closely at this, but my initial two thoughts:
Child consent is tricky.
Likely many are foreign children, who may or may not be included in the 75 million statistic.
It is good to think critically, but I think it would be beneficial to present more evidence before making the claim or conclusion.
This is very interesting, and thanks for sharing.
One thing that jumps out at me is that they used an instruction format to prompt the base models, which isn’t typically the way base models are evaluated; it should be reformatted into a completion type of task (a quick sketch of what I mean is below). If this is redone, I wonder if the performance of the base model will also increase, and maybe that could further isolate the effect to just RLHF.
I also wonder if this has anything to do with the number of datasets added on by RLHF (assuming a model goes through supervised/instruction finetuning first, and then RLHF), besides the algorithms themselves.
Another good model to test on is https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3, which it seems only has instruction finetuning as well.
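To make the completion-style reformatting concrete, here is a minimal sketch using the Hugging Face transformers library. The prompts are hypothetical examples of mine, and I am assuming the base sibling (mistralai/Mistral-7B-v0.3) of the instruct model above; this is just an illustration of the idea, not the paper’s actual setup.

```python
# Minimal sketch: instruction-style vs. completion-style prompting of a base model.
# Model name and prompts are hypothetical examples, not the paper's setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.3"  # base sibling of the instruct model linked above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Instruction-style prompt (what the paper used; base models are usually NOT trained on this):
instruction_prompt = "Please answer the following question. What is the capital of France?"

# Completion-style prompt: few-shot examples, so the base model can just continue the pattern.
completion_prompt = (
    "Q: What is the capital of Germany?\n"
    "A: Berlin\n"
    "Q: What is the capital of France?\n"
    "A:"
)

inputs = tokenizer(completion_prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
# Print only the newly generated tokens (the model's continuation).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

If the base model’s scores improve under this kind of reformatting, that would support the concern above that part of the measured gap is a prompt-format artifact rather than an RLHF effect.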
The author seems to say that they figured it out at the end of the article, and I am excited to see their exploration in the next post.
I sometimes find it useful to think about “how to differentiate this term” when defining a term. In this case, in my mind it would be thinking about “reasoning” vs “general reasoning” vs “generalization”.
Reasoning: narrower than general reasoning; probably your first two bullet points combined, in my opinion.
Generalization: even more general than general reasoning (does not need to be focused on reasoning). This could be the last two bullet points you have, particularly the third.
General reasoning (this is not fully thought through): now that we have talked about “reasoning” and “generalization”, I see two types of definition:
1. A bit closer to “reasoning”: your first two bullet points, plus operating in multiple domains/multiple ways, though not necessarily unseen domains. In simpler words, “reasoning in multiple domains and ways”.
2. A bit closer to “general” (my guess is this is closer to what you intended?): generalization ability, but focused on reasoning.
In my observation (trying to avoid “I think”!), “I think” is intended to (or actually should be used to) point out perspective differences, which helps lead to more accurate conclusions as well as collaborative and effective communication, rather than to signal confidence. In the latter case of misuse, it would be good if people clarified, “in my sentence, this term is about confidence, not perspective”.
True. I wonder, for average people, if being self-aware would at least unconsciously be a partial “blocker” on the next malevolent action they might take, and whether that may evolve across time too (even if it may take a bit longer than for a mostly-good person).
I highly agree with almost all of these points, and they are very consistent with my observations. As I am still relatively new to LessWrong, one big observation (based on my experience) I still see today is concepts, definitions, and/or terminology that are disconnected from academic language. Sometimes I see terminology that already exists in academia, and introducing new concepts under the same name may be confusing, especially without using the channels academics are used to. There are some terms I try to search on Google, for example, but the only relevant results are from LessWrong or blog posts (which I then still read personally). I think this is getting better: in one of the recent conference reviews, I saw a significant increase in AI safety submissions working on x-risks.
Another point, as you have mentioned, is the reverse ingestion of papers from academia; there are rich papers in interpretability, for example, and some concrete confusion I have seen from professors and people already in that field is why there feels like a lack of connection with these papers and concepts, even though they seem pretty related.
About actions: many people I see in my usual professional group who are concerned about AI safety risks are people concerned about, or working on, current intentional risks like misuse. Those are also real risks that have already materialized (CSAM, deepfake porn with real people’s faces, privacy, potential bio/chem weapons) and need to be worked on as well. It is hard to stop working on them and transition directly to x-risks.
However, I do think it is beneficial to keep merging the academic and AI safety communities, which I see is already underway, with examples like more papers, some PhD positions on AI safety, industry positions, etc. This will increase awareness of AI safety, and as you have mentioned, the interest in the technical parts is shared, as they could potentially be applied to many kinds of safety, and hopefully not that much to capabilities (though the two are sometimes not separable).
What would be some concrete examples/areas to work on for human flourishing? (I just saw a similar question on the definition; I wonder what some concrete areas or examples could be.)
True; and they would only need to merge up to the point where they reach a “swing state” type of voting distribution.
That would be interesting; on the other hand, why not just merge all the states? I guess that would be a more dramatic change, and may be harder to execute and unnecessary in this case.
Yes, what I meant is exactly “there is no must, but only want”. But it feels like a “must” in some contexts I have seen, though I do not recall exactly where. And yeah, true, there may be some survivorship bias.
I agree it is a tragedy from the human race’s perspective, but what I meant is to view this problem from a non-human perspective. For example, as a thought experiment: to an alien observing Earth, humans are just another species that rose to dominance.
(On humans preferring to be childless: this has actually already slowed down in many countries due to the cost of raising a child, etc., but yeah, this is a digression on my part.)
My two cents:
The system has a fixed goal that it capably works towards across all contexts.
The system is able to capably work towards goals, but which goals it pursues, if any, may depend on the context.
From these two above, it seems it would be good for you to define/clarify what exactly you mean by “goals”. I can see two definitions: 1. goals as in a loss function or objective that the algorithm is optimizing towards; 2. task-specific goals like summarizing an article or planning. There may be other kinds of goals I am unaware of, or this may be obvious elsewhere in some context I am not aware of. (From the shortform in the context shared, it seems to be 1, but I have a vague feeling that readers assuming 2 may not be aligned on this.)
For the example with dQw4w9WgXcQ in your initial operationalization, when you were wondering if it always generates Q: it just depends on the frequency. If you were wondering whether it is always generated (given the same context as in the training data, not a different context/instruction), a good paper on the frequency of such data and its rate of memorization is https://arxiv.org/pdf/2202.07646.
I think that is probably not a good reason to be libertarian, in my opinion? Could you also share how much older you were than your siblings? If you are not that far apart, you and your siblings came from the same starting line, and redistribution is not going to happen in real life, economically or socially, even under a non-libertarian system. (In real life, where we need equity is when the starting line is not the same and cannot be changed by choice. A closer analogy might be: some kids are born with large ears, large ears are favored by society, and the large-eared kids always get more candy.) If you are years apart, with you being a lot older, it may make some limited sense for your parents to redistribute.
I am not quite sure about the writing/examples in computational kindness and responsibility offloading, but I think I get the general idea.
For computational kindness, I think it is really just a difference in how people prefer to communicate or make plans, as in the trip-planning example. I, for example, personally prefer being offered people’s true thoughts: whether they are okay with really anything, or not. Anything is fine as long as it is what they really think or prefer (side note: I generally think communicating real preferences is the most efficient). I do not mind planning the trip myself in the ways I want. There is not really a right or wrong style. If the host offered “anything is okay” but the guest does not like planning, the guest could also simply say, “Any recommendations? I like xxx or xxx generally.” Communication goes both ways. The reason I think we should not say one style is better than another is that if the guest really wants to plan things themselves, and the host has planned a bunch, the guest may feel bad about rejecting the planned activities. Maybe what you really want to see is that the host cares enough to put some effort into planning (just guessing)? And it seems the relationship between the two people in this example is relatively close, or one that calls for showing some effort?
For responsibility offloading, I think some of these examples are not quite similar or parallel situations, but I generally get the proposal: “do not push other people in a pushy manner, and offer a clear option to say no”, as opposed to a fake ask. In my opinion a fake ask is not true kindness: it is fake, so it is not really kind in any way. But at the same time, I have trained myself to take the question literally: okay, if you asked, then you expect my answer could go either way, and I will say no if I am thinking no. In the case the question is genuine: great! In the case it is not: too bad; the smoker should have been consistent with their words.
Most of these seem to be communication-style differences that just require another round of communication to sort out, if the two parties need to communicate frequently.
Ah, thanks. Do you know why these former rationalists became “more accepting” of irrational thinking? And to be extremely clear, does “irrational” here mean not following one’s preferences with one’s actions, and not truth-seeking when forming beliefs?
I don’t understand either. If it means what it says, this is a very biased perception and not very rational (truth-seeking or causality-seeking). There should be better education systems to fix that.
On what evidence do I conclude that what I think I know is correct/factual/true, and how strong is that evidence? To what extent have I verified that view, and just how extensively should I verify the evidence?
For this, aside from traditional paper reading from credible sources, one good approach in my opinion is to actively seek evidence/arguments from, or initiate conversations with, people who have a different perspective from mine (on both sides of the spectrum, if the conclusion space is continuous).
I am interested in learning more about this, but I am not sure what “woo” means; after googling, is it right to interpret it as “unconventional beliefs” of some sort?
I personally agree with your reflection on suffering risks (including factory farming, systemic injustices, and wars) and with the approach of donating to different cause areas. My (maybe unpopular under a “prioritize only one” mindset) thought is: maybe we should avoid prioritizing only a single area (especially collectively), and instead recognize that in reality there are always multiple issues we need to fight against/solve. Personally, we could focus professionally on one issue and volunteer for or donate to another cause area, depending on our knowledge, interests, and abilities; additionally, we could donate to multiple cause areas. Meanwhile, a big step is to be aware of, and open our ears to, the various issues we may be facing as a society, and that will (I hope) translate into multiple types of action. After all, some of these suffering risks involve human actions, and each of us doing something differently could help reduce them in both the short and long term. But there are also many things that I do not know how best to balance.
A side note: I also hope you are not very, very sad from thinking about “missing crucial considerations” (and I appreciate that you are trying to gather more information and learn more quickly; we all should do more of this too)! The key to me might be an open mind and the ability to consider different aspects of things; hopefully we will then be on the path towards something “more complete”. Proactively, one approach I often try is talking to people who are passionate about different areas and who are different from me, and understanding more from there. Also, I sometimes refer to https://www.un.org/en/global-issues for ideas.