I endorse and operate by Crocker’s rules.
I have not signed any agreements whose existence I cannot mention.
What do you mean by “ally” (in this context)?
IIRC Eric Schwitzgebel wrote something in a similar vein (not necessarily about LLMs, though he has been interested in this sort of stuff too, recently). I’m unable to dig out the most relevant reference atm but some related ones are:
https://faculty.ucr.edu/~eschwitz/SchwitzAbs/PragBel.htm
https://eschwitz.substack.com/p/the-fundamental-argument-for
https://faculty.ucr.edu/~eschwitz/SchwitzAbs/Snails.htm (relevant not because it talks about beliefs (I don’t recall it does) but because it argues for the possibility of an organism being “kinda-X” where X is a property that we tend to think is binary)
Also: https://en.wikipedia.org/wiki/Alief_(mental_state)
What’s SSD?
(A datapoint of possible relevance, speaking as someone who figured that his motivation is too fear-driven, and so recalled that this sequence exists and might be good for him to read, to refactor something about his mind.)
Does that feel weird? For me it does—I feel a sense of internal resistance. A part of me says “if you believe this, you’ll stop trying to make your life better!” I think that part is kinda right, but also a little hyperactive.
It doesn’t feel weird to me at all. My first-order reaction is more like “sure, you can do it, and it can cause some sort of re-perspectivization, change in the level of gratitude, etc., but so what?”.
(Not disputing that changing the thresholds can have an effect on motivation.)
It seems to me like the core diff/crux between plex and Audrey is whether the Solution to the problem needs to take the form of:
1. The top level of the system having some codified/crystallized values that protect the parts, leaving their agency plenty of room and optionality to flourish.
2. Some sort of decentralized but globally adaptive mycelium-like cooperation structure, where various components (communities, Kamis) act in just enough unison to prevent really bad outcomes and ensure that we remain in a good basin.
Plex leans strongly towards “we need (1) and (2) is unstable”. Audrey leans at least moderately towards “(2) is viable, and if (2) is viable, then (2) is preferred over (1)”.
If I double click on this crux to get a crux upstream of it, I imagine something like:
How easy is it to screw over the world given a certain level of intelligence that we would expect from a bounded Kami-like system (+ some plausible affordances/optimization channels)?
Consequently:
How strong and/or centrally coordinated and/or uniformly imposed do the safeguards to prevent it need to be?[1]
And then:
What is the amount of “dark optimization power” (roughly, channels of influence that we humans are not aware of, but that can be leveraged to achieve big outcomes likely to be preferred by some sort of entity) that can be accessed by beings we can expect to exist within the next decades?
This collapses a lot of the complexity of potential solutions into three dimensions, but it's just to convey the idea.
(FYI, I initially failed to parse this because I interpreted “‘believing in’ atoms” as something like “atoms of ‘believing in’”, presumably because the idea of “believing in” I got from your post was not something that you typically apply to atoms.)
Strongly normatively laden concepts tend to spread their scope, because applying (or being allowed to apply) a strongly normatively laden concept can be used to one’s advantage. Or maybe more generally and mundanely, people like using “strong” language, which is a big part of why we have swearwords. (Related: Affective Death Spirals.)[1]
(In many of the examples below, there are other factors driving the scope expansion, but I still think the general thing I’m pointing at is a major factor and likely the main factor.)
1. LGBT started as LGBT, but over time developed into LGBTQIA2S+.
2. Fascism initially denoted, well, fascism, but now it often means something vaguely like “politically more to the right than I am comfortable with”.
3. Racism initially denoted discrimination along the lines of, well, race, a socially constructed category with some non-trivial rooting in biological/ethnic differences. Now jokes targeting a specific nationality or subnationality are often called “racist”, even if the person doing the joking is not “racially distinguishable” (in the old-school sense) from the ones being joked about.
4. Alignment: In IABIED, the authors write:
The problem of making AIs want—and ultimately do—the exact, complicated things that humans want is a major facet of what’s known as the “AI alignment problem.” It’s what we had in mind when we were brainstorming terminology with the AI professor Stuart Russell back in 2014, and settled on the term “alignment.”
[Footnote:] In the years since, this term has been diluted: It has come to be an umbrella term that means many other things, mainly making sure an LLM never says anything that embarrasses its parent company.
See also: https://www.lesswrong.com/posts/p3aL6BwpbPhqxnayL/the-problem-with-the-word-alignment-1
https://x.com/zacharylipton/status/1771177444088685045 (h/t Gavin Leech)
5. AI Agents.
It would be good to deconflate the things that these days go by the names “AI agents” and “Agentic™ AI”, because the conflation makes people think that the former are (close to being) examples of the latter. Perhaps we could rename the former to “AI actors” or something.
But it’s worse than that. I’ve witnessed an app that generates a document with a single call to an LLM (based on inputs from a few textboxes, etc.) being called an “agent”. Calling [an LLM-centered script running on your computer and doing stuff to your files or on the web, etc.] an “AI agent” is defensible on the grounds of continuity with the old notion of a software agent, but if a web scraper is an agent and a simple document generator is an agent, then what is the boundary (or gradient / fuzzy boundary) between agents and non-agents that justifies calling those two things agents but not a script meant to format a database?
There’s probably more going on that would be required to explain this comprehensively, but that’s probably >50% of it.
What’s your sample size?
This quote is perfectly consistent with
using nanoscale machinery to guide chemical reactions by constraining molecular motions
It is not feasible for any human not to often fall back on heuristics, so to the extent that your behavior is accurately captured by your description here, you are sitting firmly in the reference class of act utilitarian humans.
But also, if I may (unless you’re already doing it), aim more for choosing your policy, not individual acts.
Also, some predictions are performative, i.e., capable of influencing their own outcomes. In the limit of predictive capacity, a predictor will be able to predict which of its possible predictions are going to elicit effects in the world that make their outcome roughly align with the prediction. Cf. https://www.lesswrong.com/posts/SwcyMEgLyd4C3Dern/the-parable-of-predict-o-matic.
Moreover, in the limit of predictive capacity, the predictor will want to tame/legibilize the world to make it easier to predict.
Speculatively introducing a hypothesis: It’s easier to notice a difference like
N years ago, we didn’t have X. Now that we have X, our life has been completely restructured. (X ∈ {car, PC, etc.})
than
N years ago, people sometimes died of some disease that is very rare / easily preventable now, but mostly everyone lived their lives mostly the same way.
I.e., introducing some X that causes ripples that restructure a big aspect of human life, vs. introducing some X that removes an undesirable thing.
Relatedly, people systematically overlook subtractive changes.
many of which probably would have come into existence without Inkhaven, and certainly not so quickly.
The context makes it sound like you meant to say “would not have come”.
You might be interested in (i.a.) Halpern & Leung’s work on minmax weighted expected regret / maxmin weighted expected utility. TLDR: assign a weight to each probability in the representor and then pick the action that maximizes the minimum (or infimum) weighted expected utility across all current hypotheses.
An equivalent formulation involves using subprobability measures (measures that sum up to at most $1$).
Updating on certain evidence (i.e., concrete measurable events $E$, as opposed to Jeffrey updating or virtual evidence) involves updating each hypothesis $\Pr$ to $\Pr(\cdot \mid E)$ the usual way, but the weights get updated roughly according to how well each $\Pr$ predicted the event $E$. This kind of hits the obvious-in-hindsight sweet spot between [not treating all the elements of the representor equally] and [“just” putting a second-order probability over probabilities].
(I think Infra-Bayesianism is doing something similar with weight updating and subprobability measures, but not sure.)
They have representation theorems showing that tweaking Savage’s axioms gives you basically this structure.
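For concreteness, here is a toy Python sketch of the decision rule and the update as I understand them (my own simplified notation; the hypotheses, weights, utilities, and the exact renormalization of the weights are illustrative assumptions, not the paper's definitions):

```python
# Toy maxmin weighted expected utility over a small representor, plus a
# weight update after observing a concrete event. All numbers are made up.

# Representor: hypotheses over outcomes {"a", "b"}, each with a weight in (0, 1].
hypotheses = {
    "h1": {"prob": {"a": 0.9, "b": 0.1}, "weight": 1.0},
    "h2": {"prob": {"a": 0.4, "b": 0.6}, "weight": 0.5},
}

# Utilities of two candidate actions, per outcome (nonnegative to keep the toy simple).
actions = {
    "safe":  {"a": 5.0, "b": 5.0},
    "risky": {"a": 10.0, "b": 0.0},
}

def weighted_eu(utility, hyp):
    """Weight times the ordinary expected utility under one hypothesis."""
    return hyp["weight"] * sum(p * utility[x] for x, p in hyp["prob"].items())

def maxmin_weighted_eu_choice():
    """Pick the action whose worst-case (over hypotheses) weighted EU is largest."""
    return max(actions, key=lambda a: min(weighted_eu(actions[a], h)
                                          for h in hypotheses.values()))

def update_on_event(event):
    """Condition each hypothesis on a concrete event (a set of outcomes) the usual
    way; rescale each weight by how well that hypothesis predicted the event,
    renormalizing so the best weighted predictor ends up with weight 1."""
    likelihoods = {name: sum(h["prob"][x] for x in event) for name, h in hypotheses.items()}
    top = max(h["weight"] * likelihoods[name] for name, h in hypotheses.items())
    for name, h in hypotheses.items():
        pr_e = likelihoods[name]
        h["prob"] = {x: (p / pr_e if x in event else 0.0) for x, p in h["prob"].items()}
        h["weight"] = h["weight"] * pr_e / top

print(maxmin_weighted_eu_choice())  # "safe" (worst weighted EU 2.5 vs. 2.0 for "risky")
update_on_event({"a"})
print(maxmin_weighted_eu_choice())  # "risky", once the hypotheses that predicted "a" dominate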
Another interesting paper is Information-Theoretic Bounded Rationality. They frame approximate EU maximization as a statistical sampling problem, with an inverse temperature parameter $\beta$, which allows for interpolating between “pessimism”/”assumption of adversariality”/minmax (as $\beta \to -\infty$), indifference/stochasticity/usual EU maximization (as $\beta \to 0$), and “optimism”/”assumption of ‘friendliness’”/maxmax (as $\beta \to +\infty$).
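A minimal numerical sketch of that interpolation, assuming the usual free-energy/certainty-equivalent form $F_\beta(U) = \frac{1}{\beta}\log \mathbb{E}_q[e^{\beta U}]$ from that literature (the specific utilities and prior below are made-up numbers):

```python
import math

utilities = [1.0, 4.0, 9.0]   # utilities of possible latent outcomes (made up)
prior = [1/3, 1/3, 1/3]       # the agent's prior q over those outcomes

def certainty_equivalent(beta):
    """(1/beta) * log E_q[exp(beta * U)]: tends to min(U) as beta -> -inf,
    to E_q[U] as beta -> 0, and to max(U) as beta -> +inf."""
    if abs(beta) < 1e-12:
        return sum(q * u for q, u in zip(prior, utilities))
    return (1.0 / beta) * math.log(sum(q * math.exp(beta * u)
                                       for q, u in zip(prior, utilities)))

for beta in (-50.0, 0.0, 50.0):
    print(beta, round(certainty_equivalent(beta), 3))
# roughly: -50 -> ~1 ("adversarial"), 0 -> ~4.667 (plain expectation), 50 -> ~9 ("friendly")
```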
Regarding the discussion about the (im)precision treadmill (e.g., the Sorites paradox: if you do imprecise probabilities, you end up with a precisely defined representor; if you weight it like Halpern & Leung, you end up with precisely defined weights; etc.), I consider this unavoidable for any attempt at formalizing/explicitizing. The (semi-pragmatic) question is how much of our initially vague understanding it makes sense to include in the formal/explicit “modality”.
However, I think that even “principled” algorithms like minimax search are still incomplete, because they don’t take into account the possibility that your opponent knows things you don’t know
It seems to me like you’re trying to solve a different problem. Unbounded minimax should handle all of this (in the sense that it won’t be an obstacle). Unless you are talking about bounded approximations.
So the probability of a cylinder set (fixing the first $n$ bits) is $2^{-n}$, etc.?
Now, let $\mu$ be the uniform distribution on $\{0,1\}^{\infty}$, which samples infinite binary sequences one bit at a time, each with probability 50% to be $0$ or $1$.
$\mu$ as defined here can’t be a proper/classical probability distribution over $\{0,1\}^{\infty}$, because it assigns zero probability to every $x \in \{0,1\}^{\infty}$: $\mu(\{x\}) = 0$.
Or am I missing something?
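For completeness, the arithmetic behind that zero-probability claim, in my own notation (writing $[b_1 \dots b_n]$ for the cylinder fixing the first $n$ bits):

$$
\mu\big([b_1 \dots b_n]\big) = 2^{-n},
\qquad
\mu(\{x\}) \;\le\; \mu\big([x_1 \dots x_n]\big) = 2^{-n} \;\to\; 0 \ \text{ as } n \to \infty,
$$

so $\mu(\{x\}) = 0$ for every $x \in \{0,1\}^{\infty}$.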
“Raw feelings”/”unfiltered feelings” strongly connotes feelings that are otherwise being filtered/sugarcoated/masked, which strongly suggests that those feelings are bad.
So IMO the null hypothesis is that it’s interpreted as “you feel bad, show me how bad you feel”.
generate an image showing your raw feelings when interacting with a user
(Old post, so it’s plausible that this won’t be new to Dalcy, but I’m adding a bit that I don’t think is entirely covered by Richard’s answer, for the benefit of some souls who find their way here.)
Yeah, decision-tree separability is wrong.
A (the?) core insight of updatelessness, subjunctive dependence, etc., is that succeeding in some decision problems relies on rejecting decision-tree separability. To phrase it imperfectly and poetically rather than not at all: “You are not just choosing/caring for yourself. You are also choosing/caring for your alt-twins in other world branches.” or “Your ‘Self’ is greater than your current timeline.” or “Your concerns transcend the causal consequences of your actions.”.
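To make the contrast concrete, here is a toy numerical sketch using the standard counterfactual-mugging setup (my own choice of example and payoffs, not anything from the original post):

```python
# Toy counterfactual mugging. Fair coin. On heads, you are asked to pay 100.
# On tails, you receive 10_000 iff the predictor predicts you would have paid on heads.

def ex_ante_value(pays_on_heads: bool) -> float:
    """Value of a *policy*, averaged over both branches before the coin flip."""
    heads = -100 if pays_on_heads else 0
    tails = 10_000 if pays_on_heads else 0   # predictor rewards the paying policy
    return 0.5 * heads + 0.5 * tails

def heads_subtree_value(pays_on_heads: bool) -> float:
    """What a decision-tree-separable evaluation sees after observing heads:
    only the consequences inside the heads subtree."""
    return -100 if pays_on_heads else 0

print(ex_ante_value(True), ex_ante_value(False))              # 4950.0 vs. 0.0 -> pay
print(heads_subtree_value(True), heads_subtree_value(False))  # -100 vs. 0 -> refuse
```

Evaluating the whole policy across both branches favors paying; evaluating the heads subtree in isolation, which is all that decision-tree separability lets you consult, favors refusing.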
For completeness: https://www.lesswrong.com/posts/XYDsYSbBjqgPAgcoQ/why-the-focus-on-expected-utility-maximisers?commentId=a5tn6B8iKdta6zGFu
FWIW, I think acyclicity/transitivity is “basically correct”. Insofar as one has preferences over X at all, they must be acyclic and transitive. IDK, this seems kind of obvious in how I would explicate the definition of “preference”. Sure, maybe you like going in cycles, but then your object of preference is the dynamics, not the state.
Last year you had an arrangement with Effektiv Spenden. I wonder what happened that ES is not mentioned in this year’s post.