This post sought to give an overview of how they do this, which is in my view extremely useful information!
This is what I was trying to question with my comment above: Why do you think this? How am I to use this information? It’s surely true that this is a community that needs to be convinced of the importance of work on safety, as you point out in the next post in the sequence, but how does information about, say, the turnover of ML PhD students help me do that?
Thus to answer the question “what kind of research approaches generally work for shaping machine learning systems?” it is quite useful to engage with how they have worked in capabilities advancements. In machine learning, theoretical (in the “math proofs” sense of the word) approaches to advancing capabilities have largely not worked. This suggests deep learning is not amenable to these kinds of approaches.
There is a conflation here that undermines your argument: theoretical approaches dominated how machine learning systems were shaped for decades, as you note at the start of this post. It turned out that automated learning produced better results in terms of capabilities, and it is that success which makes it the continued default. But the former fact surely says far more about whether theory can “shape machine learning systems” than the latter. Following your argument through, I would instead conclude that implementing theoretical approaches to safety might require us to compromise on capabilities, and that is exactly what I expect: learning systems would have access to much more delicious data if they ignored privacy regulations and similar ethical boundaries, but safety demands that capability not be the sole consideration shaping AI systems.
Knowledge that usable theory has not really been produced in deep learning suggests to me that it’s unlikely to be produced for safety, either.
This is simply not true. Failure modes identified by purely theoretical arguments have been realised in ML systems. Attacks on systems and pathological behaviour (for image classifiers, say) are regularly built in theory before they ever meet real systems. It’s also worth noting that architecture choices, or efforts to, say, make backprop more algorithmically efficient, are driven by theory.
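To make that point concrete, here is a minimal sketch (mine, not anything from the post) of the fast gradient sign method, where the attack direction comes from a first-order argument about the loss rather than from poking at a deployed system. The names `model`, `x`, `y` and `epsilon` are placeholders for a classifier, an input batch, its labels and a perturbation budget.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Fast gradient sign method: the perturbation direction is derived
    analytically from a linear approximation of the loss, i.e. the
    failure mode is 'built in theory' before touching a real system."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step in the direction that maximally increases the loss under an
    # L-infinity budget of size epsilon.
    x_adv = x + epsilon * x.grad.sign()
    # Assumes inputs are scaled to [0, 1]; keep the perturbed image valid.
    return x_adv.clamp(0.0, 1.0).detach()
```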
In the end, my attitude is not that “iterative engineering practices will never ensure safety”, but rather that there are plenty of people already doing iterative engineering, and that while it’s great to convince as many of those as possible to be safety-conscious, there would be further benefits to safety if some of their experience could be applied to the theoretical approaches that you’re actively dismissing.
I can say from first- and second-hand experience that a hard part of supervising a PhD or Master’s student in research (there are many) is taking someone who lies at one end of the bird-frog spectrum and pushing them to acquire the skills they need from the other end. To get to the point of pursuing research in the first place, you’re likely to be either someone technically skilled who can easily work out the fine details of a problem and habitually focuses on examples, or someone with enough appreciation for the overarching ideas to be motivated to build them further; it sounds like you are/were of the latter variety. If you don’t acquire some skills and perspective from the other end, you’ll inevitably drive yourself into a dead end: in the former case, you risk sinking a lot of time into elaborating specific cases while missing a general result that would simplify matters; in the latter, you can work for a long time on a false claim because you lack grounding in verifiable cases.
At the start of a research career, a responsible supervisor must push their student to be independent, but there is a balance to strike between giving space and giving guidance. It seems like your adviser wasn’t paying close enough attention to your work to see that you hadn’t done the basics, which is how you ended up spending so long on this without realising that you didn’t have an ‘empirical’ basis for what you were trying to prove in the first place. The fact that you weren’t getting pushback on your reluctance to read references also seems like a red flag.
All this is to say that one moral of the story could be aimed at PhD supervisors (who, by the way, almost universally get no specific training for that role): just because a student is confident doesn’t mean they have everything it takes to do research, and you need to make sure that they aren’t wasting their time.