Theoretical Computer Science MSc student at the University of [Redacted] in the United Kingdom.
I’m an aspiring alignment theorist; my research vibes are descriptive formal theories of intelligent systems (and their safety properties) with a bias towards constructive theories.
I think it’s important that our theories of intelligent systems remain rooted in the characteristics of real-world intelligent systems; we cannot develop adequate theory from the null string as input.
DragonGod
Does anyone know a ChatGPT plugin for browsing documents/webpages that can read LaTeX?
The plugin I currently use (Link Reader) strips out the LaTeX in its payload, and so GPT-4 ends up hallucinating the LaTeX content of the pages I’m feeding it.
How frequent are moderation actions? Is this discussion about saving moderator effort (by banning someone before you have to remove the rate-limited quantity of their bad posts), or something else? I really worry about “quality improvement by prior restraint”, both because low-value posts aren’t that harmful (they get downvoted and ignored pretty easily) and because it can take YEARS of trial and error for someone to become a good participant in LW-style discussions. I don’t want to make it impossible for the true newbies (young people discovering this style for the first time) to try, fail, learn, try, fail, get frustrated, go away, come back, and be slightly above neutral for a bit before really hitting their stride.
I agree with Dagon here.
Six years ago, after discovering HPMOR and reading part (most?) of the Sequences, I was a bad participant on old LW and on rationalist subreddits.
I would probably have been quickly banned on current LW.
It really just takes a while for people new to LW-like norms to adjust.
I find noticing surprise more valuable than noticing confusion.
Hindsight bias and post hoc rationalisations make it easy for us to gloss over events that were a priori unexpected.
I think the model of “a composition of subagents with total orders on their preferences” is a descriptive model of inexploitable incomplete preferences, and not a mechanistic model. At least, that was how I interpreted “Why Subagents?”.
I read @johnswentworth as making the claim that such preferences could be modelled as a vetocracy of VNM rational agents, not as claiming that humans (or other objects of study) are mechanistically composed of discrete parts that are themselves VNM rational.
I’d be more interested/excited by a refutation on the grounds of: “incomplete inexploitable preferences are not necessarily adequately modelled as a vetocracy of parts with complete preferences”. VNM rationality and expected utility maximisation are mostly used as descriptive rather than mechanistic tools anyway.
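For concreteness, here is a minimal Python sketch of the vetocracy reading, assuming the composite strictly prefers x to y only when every subagent does (a unanimity rule, so any one subagent can veto). The outcomes and subagent orders are hypothetical illustration data; the point is only that parts with complete (total) orders can jointly yield an incomplete preference relation.

```python
from itertools import combinations

# Hypothetical subagents: each has a complete, strict total order over outcomes,
# listed from most to least preferred.
SUBAGENTS = [
    ["A", "B", "C"],
    ["B", "A", "C"],
]


def rank(order, x):
    """Position of outcome x in one subagent's order (0 = most preferred)."""
    return order.index(x)


def vetocracy_prefers(subagents, x, y):
    """The composite strictly prefers x to y only if every subagent does."""
    return all(rank(order, x) < rank(order, y) for order in subagents)


def comparable(subagents, x, y):
    """True if the composite ranks x and y one way or the other."""
    return vetocracy_prefers(subagents, x, y) or vetocracy_prefers(subagents, y, x)


if __name__ == "__main__":
    outcomes = SUBAGENTS[0]
    for x, y in combinations(outcomes, 2):
        if vetocracy_prefers(SUBAGENTS, x, y):
            print(f"{x} > {y}")
        elif vetocracy_prefers(SUBAGENTS, y, x):
            print(f"{y} > {x}")
        else:
            print(f"{x} ? {y} (incomparable)")
```

With these orders, A and B come out incomparable while both beat C. Note that the sketch only describes the composite's preference relation; it makes no claim about how such an agent is built internally, which is the descriptive-vs-mechanistic distinction drawn above.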
Oh, do please share.
Suppose it is offered (by a third party) to switch and then
Seems incomplete (pun acknowledged). I feel like there’s something missing after “to switch” (e.g. “to switch from A to B” or similar).
Another example is an agent through time where as in the Steward of Myselves
This links to Scott Garrabrant’s page, not to any particular post. Perhaps you want to review that?
I think you meant to link to: Tyranny of the Epistemic Majority.
Ditto for me.
I’ve been waiting for this!
We aren’t offering these criteria as necessary for “knowledge”—we could imagine a breaker proposing a counterexample where all of these properties are satisfied but where intuitively M didn’t really know that A′ was a better answer. In that case the builder will try to make a convincing argument to that effect.
The bolded word (“necessary”) should be “sufficient”.
In fact, I’m pretty sure that’s how humans work most of the time. We use the general-intelligence machinery to “steer” ourselves at a high level, and most of the time, we operate on autopilot.
Yeah, I agree with this. But I don’t think the human system aggregates into any kind of coherent total optimiser. Humans don’t have an objective function (not even approximately?).
A human is not well modelled as a wrapper mind; do you disagree?
Thus, any greedy optimization algorithm would convergently shape its agent to not only pursue , but to maximize for ’s pursuit — at the expense of everything else.
Conditional on:
1. Such a system being reachable/accessible to our local/greedy optimisation process
2. Such a system being actually performant according to the selection metric of our optimisation process
I’m pretty sceptical of #2: I doubt that systems that perform inference via direct optimisation over their outputs are competitive in rich/complex environments.
Such optimisation is very computationally intensive compared to executing learned heuristics, and it seems likely that the selection process would have access to much more compute than the selected system.
See also: “Consequentialism is in the Stars not Ourselves”.
Do please read the post. Being able to predict human text requires vastly superhuman capabilities, because predicting human text requires predicting the processes that generated said text. And large tracts of text are just reporting on empirical features of the world.
Alternatively, just read the post I linked.
Oh gosh, how did I hallucinate that?
In what sense are they “not trying their hardest”?
It is not clear how they could ever develop strongly superhuman intelligence by being superhuman at predicting human text.
which is indifferent to the simplicify of the architecture the insight lets you find.
The bolded should be “simplicity”.
Sorry, where can I access the curriculum (including the reading material and exercises) if I want to study it independently?
The chapter pages on the website don’t seem to list full curricula.
I find this a bit confusing as worded; is something missing?