JustinShovelain

Karma: 544

I am the co-founder of and researcher at the quantitative long-term strategy organization Convergence (see here for our growing list of publications). Over the last fourteen years I have worked with MIRI, CFAR, EA Global, and Founders Fund, and done work in EA strategy, fundraising, networking, teaching, cognitive enhancement, and AI safety research. I have an MS degree in computer science and BS degrees in computer science, mathematics, and physics.

JustinShovelain 5 Jul 2023 19:30 UTC
5 points
in reply to: Charlie Steiner’s comment on: Some background for reasoning about dual-use alignment research
Gotcha. What determines the “ratios” is some sort of underlying causal structure of which some aspects can be summarized by a tech tree. For thinking about the causal structure you may also like this post: https://forum.effectivealtruism.org/posts/TfRexamDYBqSwg7er/causal-diagrams-of-the-paths-to-existential-catastrophe

JustinShovelain 5 Jul 2023 11:59 UTC
LW: 6 AF: 3
on: Some background for reasoning about dual-use alignment research
Complementary ideas to this article:
- https://www.lesswrong.com/posts/BfKQGYJBwdHfik4Kd/fai-research-constraints-and-agi-side-effects: (the origin for the fuel tank metaphor Raemon refers to in these comments)
- Extending things further to handle higher order derivatives and putting things within a cohesive space: https://forum.effectivealtruism.org/posts/TCxik4KvTgGzMowP9/state-space-of-x-risk-trajectories
- A typology for mapping downside risks: https://www.lesswrong.com/posts/RY9XYoqPeMc8W8zbH/mapping-downside-risks-and-information-hazards
- A set of potential responses for what to do with potentially dangerous developments and a heuristic for triggering that evaluation: https://www.lesswrong.com/posts/6ur8vDX6ApAXrRN3t/information-hazards-why-you-should-care-and-what-you-can-do
- A general heuristic for what technology to develop and how to distribute it: https://forum.effectivealtruism.org/posts/4oGYbvcy2SRHTWgWk/improving-the-future-by-influencing-actors-benevolence
- A coherence-focused framework that is more fundamental than the link just above and from which it can be derived: https://www.lesswrong.com/posts/AtwPwD6PBsqfpCsHE/aligning-ai-by-optimizing-for-wisdom

JustinShovelain 6 Apr 2023 10:52 UTC
9 points
on: Dual-Useness is a Ratio
Relatedly, here is a post that goes beyond the framework of a ratio of progress and looks at the effect on the ratio of research that still needs to be done for various outcomes: https://www.lesswrong.com/posts/BfKQGYJBwdHfik4Kd/fai-research-constraints-and-agi-side-effects
Extending further, one can examine higher order derivatives and curvature in a space of existential risk trajectories: https://forum.effectivealtruism.org/posts/TCxik4KvTgGzMowP9/state-space-of-x-risk-trajectories

JustinShovelain 5 Jan 2023 12:09 UTC
3 points
on: When you plan according to your AI timelines, should you put more weight on the median future, or the median future | eventual AI alignment success? ⚖️
Roughly speaking, in terms of the actions you take, various timelines should be weighted as P(AGI in year t)*DifferenceYouCanProduceInAGIAlignmentAt(t). This produces a new, non-normalized distribution of how much to prioritize each time (you can renormalize it if you wish to make it more like a “probability”).
Note that this is just a first approximation and there are additional subtleties.
- This assumes you are optimizing for each time and possible world orthogonally, but much of the time optimizing for nearby times is very similar to optimizing for a particular time.
- The definition of “you” here depends on the nature of the decision maker which can vary between a group, a person, or even a person at a particular moment.
- Using different definitions of “you” between decision makers can cause a coordination issue where different people are trying to save different potential worlds (because of their different skills and ability to produce change) and their plans may tangle with each other.
- It is difficult to figure out how much of a difference you can produce in different possible worlds and times. You do the best you can, but you might suffer a failure of imagination in finding ways your plans won’t work, ways your plans will have larger positive effects, or ways you may improve your plans in the future. For more on the difference one can produce see this and this.
- Lastly, there is a risk here psychologically and socially of fudging the calculations above to make things more comfortable.
(Meta: I may make a full post on this someday and use this reasoning often)
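As a rough illustration of the weighting above, here is a minimal Python sketch. The function name, the years, and all of the numbers are hypothetical and chosen only to show the arithmetic; they are not estimates from the comment.

```python
def prioritization_weights(p_agi, impact):
    """Weight each year by P(AGI in year t) * difference you can make at t.

    p_agi:  dict mapping year -> probability that AGI arrives in that year
    impact: dict mapping year -> difference you expect to be able to make
            to alignment if AGI arrives in that year (arbitrary units)
    Returns a dict of weights renormalized to sum to 1.
    """
    raw = {year: p_agi[year] * impact[year] for year in p_agi}
    total = sum(raw.values())
    if total == 0:
        raise ValueError("all weights are zero; nothing to prioritize")
    return {year: weight / total for year, weight in raw.items()}


# Hypothetical inputs: a timeline distribution and an impact estimate.
p_agi = {2030: 0.2, 2040: 0.5, 2050: 0.3}
impact = {2030: 0.5, 2040: 1.0, 2050: 2.0}

weights = prioritization_weights(p_agi, impact)
for year, weight in sorted(weights.items()):
    print(year, round(weight, 3))
```

With these made-up inputs the most probable year (2040) is not the one that gets the most weight, because the assumed ability to make a difference is larger in 2050. The sketch treats each year independently, so it inherits the first caveat in the list above.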

JustinShovelain 12 Apr 2022 6:36 UTC
8 points
in reply to: Davidmanheim’s comment on: Goodhart’s Law Causal Diagrams
I think causal diagrams naturally emerge when thinking about Goodhart’s law and its implications.
I came up with the concept of Goodhart’s law causal graphs above because of a presentation someone gave at the EA Hotel in late 2019 of Scott’s Goodhart Taxonomy. I thought causal diagrams were a clearer way to describe some parts of the taxonomy but their relationship to the taxonomy is complex. I also just encountered the paper you and Scott wrote a couple weeks ago when getting ready to write this Good Heart Week prompted post, and I was planning in the next post to reference it when we address “causal stomping” and “function generalization error” and can more comprehensively describe the relationship with the paper.
In terms of the relationship to the paper, I think that the Goodhart’s law causal graphs I describe above are more fundamental and atomically describe the relationship types between the target and proxies in a unified way. I read your paper as using causal diagrams to describe various ways causal graph relationships may be broken by taking action, rather than simply describing relationships between proxies and targets and ways they may be confused with each other (which is the function of the Goodhart’s law causal graphs above).
Mostly, the purpose of this post and the next is to present an alternative, and I think cleaner, ontological structure for thinking about Goodhart’s law, though there will still be some messiness in carving up reality.
As to your suggested mitigations, both randomization and secret metric are good to add though I’m not as sure about post hoc. Thanks for the suggestions and the surrounding paper.

JustinShovelain 18 May 2020 14:51 UTC
4 points
on: Subspace optima
I like the distinction that you’re making and that you gave it a clear name.
Relatedly, there is the method of Lagrange multipliers for solving things in the subspace.
On a side note: there is a way to partially unify subspace optimum and local optimum by saying that the subspace optimum is a local optimum with respect to the local set of parameters you’re using to define the subspace. You’re at a local optimum with respect to defining the underlying space to optimize over (aka the subspace) and a local optimum within that space (the subspace). (Relatedly, moduli spaces.)
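For readers unfamiliar with the method of Lagrange multipliers, here is a small worked example. The objective and constraint are chosen purely for illustration and do not come from the post: the constraint line plays the role of the subspace, and the constrained minimum differs from the unconstrained one at the origin.

```latex
\begin{aligned}
&\text{minimize } f(x,y) = x^2 + y^2 \quad \text{subject to } g(x,y) = x + y - 1 = 0 \\
&\mathcal{L}(x,y,\lambda) = x^2 + y^2 - \lambda\,(x + y - 1) \\
&\partial_x \mathcal{L} = 2x - \lambda = 0, \quad
 \partial_y \mathcal{L} = 2y - \lambda = 0, \quad
 \partial_\lambda \mathcal{L} = -(x + y - 1) = 0 \\
&\Rightarrow\; x = y = \tfrac{1}{2}, \quad \lambda = 1, \quad f\!\left(\tfrac{1}{2},\tfrac{1}{2}\right) = \tfrac{1}{2}
\end{aligned}
```

The unconstrained minimum is f(0,0) = 0, so the point (1/2, 1/2) is optimal only within the subspace x + y = 1. The nonzero multiplier λ = 1 indicates that relaxing the constraint would improve the objective.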

JustinShovelain 18 Apr 2020 12:02 UTC
7 points
on: COVID-19: An opportunity to help by modelling testing and tracing to inform the UK government
I’ve decided to try modelling testing and contact tracing over the weekend. If you wish to join and want to ping me my contact details are in the doc.

JustinShovelain 9 Apr 2020 19:59 UTC
2 points
on: Why don’t we have active human trials with inactivated SARS-COV-2?
I think virus inactivation is a normal vaccination approach and is probably being pursued here? The hardest part is probably growing the virus in vitro at scale and perhaps ensuring that all of the virus particles are inactivated.

JustinShovelain 7 Apr 2020 14:17 UTC
10 points
on: Conflict vs. mistake in non-zero-sum games
Nice deduction about the relationship between this and conflict vs mistake theory! Similar and complementary to this post is the one I wrote on Moloch and the Pareto optimal frontier.

JustinShovelain 31 Jan 2020 18:25 UTC
1 point
in reply to: romeostevensit’s comment on: Using vector fields to visualise preferences and make them consistent
How so? I don’t follow your comment’s meaning.

JustinShovelain 21 Jan 2020 23:03 UTC
1 point
in reply to: Pattern’s comment on: Safety regulators: A tool for mitigating technological risk
Edited to add “I” immediately in front of “wish”.

JustinShovelain 28 Jul 2010 19:57 UTC
0 points
in reply to: Vladimir_Nesov’s comment on: Metaphilosophical Mysteries
By new “term” I meant to make clear that this statement points to an operation that cannot be done with the original machine. Instead it calls a new module (say a halting oracle) that didn’t exist originally.

JustinShovelain 28 Jul 2010 9:06 UTC
0 points
in reply to: Vladimir_Nesov’s comment on: Metaphilosophical Mysteries
Are you trying to express the idea of adding new fundamental “terms” to your language to describe things like halting oracles, and then discounting their weight by the shortest statement of each term’s properties expressed in the language that existed prior to including the additional “term”? If so, I agree that this is the natural way to extend priors out to handle arbitrary describable objects such as halting oracles.

Stated another way: you start with a language L. Let the definition of an esoteric mathematical object (say a halting oracle) E be D in the original language L. Then the prior probability of a program using that object is discounted by the description length of D. This gives us a prior over all “programs” containing arbitrary (describable) esoteric mathematical objects in their description.
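One way to write this down, assuming the usual 2^(-length) weighting of a Solomonoff-style prior over binary descriptions (the comment itself only says “discounted by the description length of D”, so the exact form below is an assumption):

```latex
P(p) \;\propto\; 2^{-\left(\ell_L(p) \,+\, \ell_L(D)\right)}
```

Here p is a program that may invoke E as a primitive, \ell_L(p) is the length in bits of p written in L extended with a symbol for E, and \ell_L(D) is the length in bits of the definition D in the original language L. Both length functions are notation introduced for this sketch.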

I’m not yet sure how universal this approach is at allowing arbitrary esoteric mathematical objects (appealing to the Church-Turing thesis here would be assuming the conclusion) and am uncertain whether we can ignore the ones it cannot incorporate.

JustinShovelain 19 Mar 2010 23:39 UTC
8 points
on: Think Before You Speak (And Signal It)
Interesting idea.

I agree that trusting newly formed ideas is risky, but there are several reasons to convey them anyway (non-comprehensive listing):
- To recruit assistance in developing and verifying them
- To convey an idea that is obvious in retrospect, an idea you can be confident in immediately
- To signal cleverness and ability to think on one’s feet
- To socially play with the ideas
What we are really after, though, is to assess how much weight to assign to an idea off the bat so we can calculate the opportunity costs of thinking about the idea in greater detail and asking for the idea to be fleshed out and conveyed fully. This overlaps somewhat with the confidence with which the speaker is conveying the idea (determining that involves context-sensitive rules). Also, how do you gauge how old an idea really is? Especially if it condenses gradually or is a simple combination of very old parts? Still… some metric is better than no metric.

JustinShovelain 14 Mar 2010 8:24 UTC
−47 points
in reply to: JustinShovelain’s comment on: Open Thread: March 2010, part 2
Vote this down for karma balance.
What links here?
- JustinShovelain's comment on Open Thread: March 2010, part 2 by RobinZ (14 Mar 2010 8:23 UTC; 9 points)

JustinShovelain 14 Mar 2010 8:24 UTC
48 points
in reply to: JustinShovelain’s comment on: Open Thread: March 2010, part 2
Vote this up if you are the oldest child with siblings.

JustinShovelain 14 Mar 2010 8:23 UTC
15 points
in reply to: JustinShovelain’s comment on: Open Thread: March 2010, part 2
Vote this up if you are an only child.

JustinShovelain 14 Mar 2010 8:23 UTC
18 points
in reply to: JustinShovelain’s comment on: Open Thread: March 2010, part 2
Vote this up if you have older siblings.

JustinShovelain 14 Mar 2010 8:23 UTC
9 points
on: Open Thread: March 2010, part 2
Poll: Do you have older siblings, or are you an only child?

karma balance
What links here?
- steven0461's comment on 2011 Survey Results by Scott Alexander (4 Dec 2011 20:59 UTC; 11 points)

JustinShovelain 10 Mar 2010 0:48 UTC
8 points
on: Open Thread: March 2010
I’m thinking of writing up a post clearly explaining updateless decision theory. I have a somewhat different way of looking at things than Wei Dai and will give my interpretation of his idea if there is demand. I might also need to do this anyway in preparation for some additional decision theory I plan to post to LessWrong. Is there demand?