LessWrong Team
I have signed no contracts or agreements whose existence I cannot mention.
Curated! It strikes me that asking “how would I update in response to...?” is both a sensible and straightforward thing to be asking, and yet not a form of question I’m seeing. I think we could be asking the same about slow vs. fast takeoff and similar questions.
The value and necessity of this question also isn’t just about not waiting for future evidence to come in, but realizing that “negative results” require interpretation too. I also think there’s a nice degree of “preregistration” here as well that seems neat and maybe virtuous. Kudos and thank you.
I’m curious why the section on “Applying Rationality” in the About page you cited doesn’t feel like an answer.
Applying Rationality
You might value Rationality for its own sake, however, many people want to be better reasoners so they can have more accurate beliefs about topics they care about, and make better decisions.
Using LessWrong-style reasoning, contributors to LessWrong have written essays on an immense variety of topics on LessWrong, each time approaching the topic with a desire to know what’s actually true (not just what’s convenient or pleasant to believe), being deliberate about processing the evidence, and avoiding common pitfalls of human reason.
Beyond that, The Twelve Virtues of Rationality includes “scholarship” as the 11th virtue, and I think that’s a deep part of LessWrong’s culture and aims:
The eleventh virtue is scholarship. Study many sciences and absorb their power as your own. Each field that you consume makes you larger. If you swallow enough sciences the gaps between them will diminish and your knowledge will become a unified whole. If you are gluttonous you will become vaster than mountains. It is especially important to eat math and science which impinge upon rationality: evolutionary psychology, heuristics and biases, social psychology, probability theory, decision theory. But these cannot be the only fields you study. The Art must have a purpose other than itself, or it collapses into infinite recursion.
I would think it strange, though, if one could get better at reasoning and believing true things without actually trying to do that on specific cases. Maybe you could sketch out more of what you expect LW content to look like.
Errors are my own
At first blush, I find this caveat amusing.
1. If there are errors, we can infer that those providing feedback were unable to identify them.
2. If the author was fallible enough to have made errors, perhaps they are fallible enough to miss errors in input sourced from others.
What purpose does it serve? Given it’s often paired with “credit goes to... <list of names>”, it seems like an attempt to ensure that people providing feedback/input on a post are only exposed to upside from doing so, while the author takes all the downside reputational risk if the post is received poorly or exposed as flawed.
Maybe this works? It seems that as a capable reviewer/feedback-haver, I might agree to offer feedback on a poor post written by a poor author, perhaps pointing out flaws, and my having given feedback on it might reflect poorly on my time allocation, but the bad output shouldn’t be assigned to me. Whereas if my name is attached to something quite good, it’s plausible that I contributed to that, I think because it’s easier to help a good post be great than to save a bad post.
But these inferences seem like they’re there to be made and aren’t changed by what an author might caveat at the start. I suppose the author might want to remind the reader of them rather than make them true through an utterance.
Upon reflection, I think (1) doesn’t hold. The reviewers/input makers might be aware of the errors but be unable to save the author from them. (2) That the reviewers made mistakes that have flowed into the piece seems all the more likely the worse the piece is overall, since we can update that the author wasn’t likely to catch them.
On the whole, I think I buy the premise that we can’t update too negatively on reviewers and feedback-givers for having deigned to give feedback on something bad, though their time allocation is suspect. Maybe they’re bad at saying no, maybe they’re bad at telling people their ideas aren’t that good, maybe they have hope for this person. Unclear. Upside I’m more willing to attribute.
Perhaps I would replace the “errors are my own[, credit goes to]” with a reminder or pointer that these are the correct inferences to make. The words themselves don’t change them? Not sure, just musing here.
Edited to add: I do think “errors are my own” is a very weird kind of social move that’s being performed in an epistemic context, and I don’t like it.
This post is comprehensive, but I think “safetywashing” and “AGI is inherently risky” are placed far too close to the end and get too little treatment, as I think they’re the most significant reasons against.
This post also makes no mention of race dynamics and how contributing to them might outweigh the rest, and as RyanCarey says elsethread, doesn’t talk about other temptations and biases that push people towards working at labs and would apply even if it was on net bad.
Curated. Insurance is a routine part of life, whether it be the car and home insurance we necessarily buy, the Amazon-offered protection one reflexively declines, or the insurance we know doctors must have, businesses must have, and so on.
So it’s pretty neat when someone comes along and (compellingly) says “hey guys, you (or at least most people) are wrong about when insurance makes sense to buy, the reasons you have are wrong, here’s the formula”.
While the assumptions can be questioned, e.g. the infinite badness of going bankrupt, and other factors can be raised, this is just a neat technical treatment of a very practical, everyday question. I expect that I’ll be thinking in terms of this myself when making various insurance choices. Kudos!
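(To check that I actually absorbed it, here’s the shape of the decision rule as I understood it, a minimal sketch with made-up numbers and assuming full coverage for simplicity: buy the policy when expected log wealth with it beats expected log wealth without it.)

```python
import math

# Toy numbers, purely illustrative: $50k of wealth, a $500 premium,
# a 5% chance of a $10k loss, and full coverage if insured.
wealth = 50_000
premium = 500
loss_dist = [(0.95, 0), (0.05, 10_000)]  # (probability, loss) pairs

# With full coverage, final wealth is wealth - premium in every outcome.
log_wealth_insured = math.log(wealth - premium)

# Without insurance, take the expected log of final wealth over outcomes.
log_wealth_uninsured = sum(p * math.log(wealth - loss) for p, loss in loss_dist)

print("buy the insurance" if log_wealth_insured > log_wealth_uninsured else "skip it")
```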
Curated. This is a good post and in some ways ambitious as it tries to make two different but related points. One point – that AIs are going to increasingly commit shenanigans – is in the title. The other is a point regarding the recurring patterns of discussion whenever AIs are reported to have committed shenanigans. I reckon those patterns are going to be tough to beat, as strong forces (e.g. strong pre-existing conviction) cause people to take up the stances they do, but if there’s hope for doing better, I think it comes from understanding the patterns.
There’s a good round-up of recent results in here that’s valuable on its own, but the post goes further and sets out to do something pretty hard in advocating for the correct interpretation of the results. This is hard because I think the correct interpretation is legitimately subtle and nuanced, with the correct update depending on your starting position (as Zvi explains). The post sets out to do this and succeeds.
Lastly, I want to express my gratitude for Zvi’s hyperlinks to lighter material, e.g. “Not great, Bob” and “Stop it!” It’s a heavy world with these AI topics, and the lightness makes the pill go down easier. Thanks!
Yes, true, fixed, thanks!
Dog: “Oh ho ho, I’ve played imaginary fetch before, don’t you worry.”
My regular policy is to not frontpage newsletters, however I frontpaged this one as it’s the first in the series and I think it’s neat for more people to know this is a series Zvi intends to write.
Curated! I think it’s generally great when people explain what they’re doing and why in a way legible to those not working on it. Great because it lets others potentially get involved, build on it, expose flaws or omissions, etc. This one seems particularly clear and well written. While I haven’t read all of the research, nor am I particularly qualified to comment on it, I like the idea of a principled/systematic approach behind it, in comparison to a lot of work that isn’t coming from a deeper, bigger framework.
(While I’m here though, I’ll add a link to Dmitry Vaintrob’s comment that Jacob Hilton described as the “best critique of ARC’s research agenda that I have read since we started working on heuristic explanations”. Eliciting such feedback is the kind of good thing that comes out of writing up agendas – it’s possible or likely Dmitry was already tracking the work and already had these critiques, but a post like this seems like a good way to propagate them and have a public back and forth.)
Roughly speaking, if the scalability of an algorithm depends on unknown empirical contingencies (such as how advanced AI systems generalize), then we try to make worst-case assumptions instead of attempting to extrapolate from today’s systems.
I like this attitude. The human default, often in alignment work too, is to argue for why one’s plan will work and find stories supporting that; adopting the opposite methodology, especially given the unknowns, is much needed.
Overall, this is neat. Kudos to Jacob (and the rest of the team) for taking the time to put this all together. It doesn’t seem all that quick to write, and I think it’d be easy to think they ought not take time off from further object-level research to write it. Thanks!
Thanks! Fixed
Curated. I really like that even though LessWrong is 1.5 decades old now and has Bayesianism assumed as a background paradigm while people discuss everything else, nonetheless we can have good exploration of our fundamental epistemological beliefs.
The descriptions of unsolved problems, or at least of the incompleteness of Bayesianism, strike me as technically correct. Like others, I’m not convinced of Richard’s favored approach, but it’s interesting. In practice, I don’t think these problems undermine the use of Bayesianism in typical LessWrong thought. For example, I never thought of credences as being applied rigorously to “propositions”, but rather to “hypotheses” or possibilities for how things are, which could already be framed as models too. Context-dependent terms like “large” or quantities without explicit tolerances like “500ft” are the kind of things that you taboo or reduce if necessary, either for your own reasoning or for a bet.
That said, I think the claims about mistakes and downstream consequences of the way people do Bayesianism are interesting. I’m reading a claim here I don’t recall seeing before. Although we already knew that bounded reasoners aren’t logically omniscient, Richard is adding a claim (if I’m understanding correctly) that this means that no matter how much strong evidence we technically have, we shouldn’t have really high confidence in any domain that requires heavy processing of that evidence, because we’re not that good at processing. I do think that leaves us with a question of judging when there’s enough evidence to be conclusive without complicated processing or not.
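To put a toy number on how I read that claim (figures purely illustrative): suppose a long chain of reasoning is sound with probability 0.95, that conditional on sound reasoning the evidence supports the conclusion with probability 0.9999, and that a broken chain is no better than a coin flip. Then my all-things-considered confidence is at most 0.95 × 0.9999 + 0.05 × 0.5 ≈ 0.975, capped near my trust in my own processing, however strong the evidence is.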
Something I might like factored out a bit more is the distinction between the rigorous, gold-standard epistemological framework and the manner in which we apply our epistemology day to day.
I fear this curation notice would be better if I’d read all the cited sources on critical rationalism, Knightian uncertainty, etc., and I’ve added them to my reading list. All in all, kudos for putting some attention on the fundamentals.
Welcome! Sounds like you’re on the one hand at the start of a significant journey, but also you’ve come a long distance already. I hope you find much helpful stuff on LessWrong.
I hadn’t heard of Daniel Schmachtenberger, but I’m glad to have learned of him and his works. Thanks.
The actual reason why we lied in the second message was “we were in a rush and forgot.”
My recollection is we sent the same message to the majority group because:
Treating it differently would require special-casing it, and that would have taken more effort.
If selectors of different virtues had received different messages, we wouldn’t have been able to properly compare their behavior.
[At least in my mind], this was a game/test and when playing games you lie to people in the context of the game to make things work. Alternatively, it’s like how scientific experimenters mislead subjects for the sake of the study.
Added!
Added!
Money helps. I could probably buy a lot of dignity points for a billion dollars. With a trillion, variance definitely goes up because you could try crazy stuff that could backfire (true for a billion too), but the EV of such a world is better.
I don’t think there’s anything that’s as simple as writing a check though.
US Congress gives money to specific things. I do not have a specific plan for a trillion dollars.
I’d bet against Terence Tao being some kind of amazing breakthrough researcher who changes the playing field.
Your access should be activated within 5-10 minutes. Look for the button in the bottom right of the screen.
Not an original observation but yeah, separate from whether it’s desirable, I think we need to be planning for it.
duplicate with Hyperstitions