DanielFilan

Karma: 8,626

DanielFilan Dec 3, 2024, 1:51 AM
3 points
0
in reply to: DanielFilan’s comment on: 2024 Unofficial LessWrong Census/Survey

Dojo Organizations What organizations are you aware of that are providing some kind of rationality dojo format (courses focused on improving the skill of rationality)?

Seems like the stuff after “Dojo Organizations” should be on a new line.

DanielFilan Dec 3, 2024, 1:49 AM
9 points
0
on: 2024 Unofficial LessWrong Census/Survey

About how often do you use LLMs like ChatGPT while active?

What does “while active” mean in this question?

AXRP Episode 39 - Evan Hubinger on Model Organisms of Misalignment

DanielFilan Dec 1, 2024, 6:00 AM

41 points

0 comments, 67 min read

DanielFilan Nov 29, 2024, 6:04 PM
8 points
0
on: The Big Nonprofits Post

If one wants to investigate [the Alignment of Complex Systems research group] further, he has an AXRP podcast episode, which I haven’t listened to.

Note that if you want to investigate further but would rather read a transcript than watch a video, AXRP has you covered.

AXRP Episode 38.2 - Jesse Hoogland on Singular Learning Theory

DanielFilan Nov 27, 2024, 6:30 AM

34 points

0 comments, 10 min read

DanielFilan Nov 17, 2024, 10:19 PM
4 points
0
in reply to: khafra’s comment on: Seven lessons I didn’t learn from election day
Yeah, but a bunch of people might actually answer how their neighbours will vote, given that that’s what the pollster asked; and if the question is phrased the way the post assumes, that’s going to be a massive issue.

AXRP Episode 38.1 - Alan Chan on Agent Infrastructure

DanielFilan Nov 16, 2024, 11:30 PM

12 points

0 comments, 14 min read

DanielFilan Nov 14, 2024, 7:11 PM
9 points
5
on: Seven lessons I didn’t learn from election day
So I guess 1.5% of Americans have worse judgment than I expected (by my lights, as someone who thinks that Trump is really bad). Those 1.5% were incredibly important for the outcome of the election and for the future of the country, but they are only 1.5% of the population.
Nitpick: they are 1.5% of the voting population, making them around 0.7% of the US population.
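A quick check of the nitpick’s arithmetic, writing V for the number of voters and N for the US population (symbols introduced here for illustration): a group that is 1.5% of voters is 0.015 · V/N of the population, so the ~0.7% figure corresponds to a bit under half of all residents (children and non-citizens included) having voted, which is roughly in line with turnout in a US presidential election.

```latex
\frac{0.015\,V}{N} \approx 0.007 \iff \frac{V}{N} \approx \frac{0.007}{0.015} \approx 0.47
```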

DanielFilan Nov 14, 2024, 7:09 PM
16 points
0
on: Seven lessons I didn’t learn from election day
If you ask people who they’re voting for, 50% will say they’re voting for Harris. But if you ask them who most of their neighbors are voting for, only 25% will say Harris and 75% will say Trump!
Note that this issue could be fixed by instead asking people who the neighbour immediately to the right of their house/apartment will vote for, which I think is compatible with what we know about this poll. That said, the critique of “do people actually know” stands.
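A toy Python sketch of why the right-hand-neighbour question avoids the distortion. The neighbourhoods, votes, and helper names are hypothetical; the sketch assumes everyone knows their neighbours’ votes exactly (so it sidesteps the “do people actually know” critique) and that every household is the right-hand neighbour of exactly one other household (the wrap-around at the end of each street is an idealization). Under those assumptions, “your right-hand neighbour” is a one-to-one relabelling of the electorate, so the tallied answers reproduce the true split, while “most of your neighbours” need not.

```python
from fractions import Fraction

# Hypothetical neighbourhoods of four households ("H" = Harris, "T" = Trump),
# chosen so that the overall split is exactly 50/50.
neighbourhoods = [
    ["H", "H", "H", "H"],
    ["H", "T", "T", "T"],
    ["H", "T", "T", "T"],
]


def share(answers, candidate="H"):
    """Fraction of answers naming `candidate`."""
    return Fraction(sum(a == candidate for a in answers), len(answers))


def most_neighbours_answers(hoods):
    """Each person reports the majority vote among the *other* people in their
    neighbourhood (three others each, so there are no ties here)."""
    answers = []
    for hood in hoods:
        for i in range(len(hood)):
            others = hood[:i] + hood[i + 1:]
            answers.append(max(set(others), key=others.count))
    return answers


def right_neighbour_answers(hoods):
    """Each person reports the vote of the next household along the street,
    wrapping around at the end (an idealization)."""
    answers = []
    for hood in hoods:
        for i in range(len(hood)):
            answers.append(hood[(i + 1) % len(hood)])
    return answers


everyone = [vote for hood in neighbourhoods for vote in hood]
print(share(everyone))                                 # 1/2 (true Harris share)
print(share(most_neighbours_answers(neighbourhoods)))  # 1/3 ("most of your neighbours")
print(share(right_neighbour_answers(neighbourhoods)))  # 1/2 ("your right-hand neighbour")
```

In this made-up example, everyone in the all-Harris street reports Harris, but everyone in the two mixed streets reports Trump, so the “most of your neighbours” tally understates Harris; the right-hand-neighbour tally counts each voter exactly once and recovers the true share.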

DanielFilan Nov 14, 2024, 7:02 PM
7 points
1
on: Seven lessons I didn’t learn from election day
she should have picked Josh Shapiro as her running mate
Note that this news story makes allegations that, if true, make it sound like the decision was partly Shapiro’s:
Following Harris’s interview with Pennsylvania Governor Josh Shapiro, there was a sense among Shapiro’s team that the meeting did not go as well as it could have, sources familiar with the matter tell ABC News.
Later Sunday, after the interview, Shapiro placed a phone call to Harris’ team, indicating he had reservations about leaving his job as governor, sources said.

DanielFilan Nov 14, 2024, 6:04 PM
4 points
0
in reply to: yams’s comment on: DanielFilan’s Shortform Feed
Oh except: I did not necessarily mean to claim that any of the things I mentioned were missing from the alignment research scene, or that they were present.

DanielFilan Nov 14, 2024, 5:40 PM
LW: 4 AF: 3
0
AF
in reply to: Chris_Leong’s comment on: DanielFilan’s Shortform Feed
When I wrote that, I wasn’t thinking so much about evals / model organisms as stuff like:
- putting a bunch of agents in a simulated world and seeing how they interact
- weak-to-strong / easy-to-hard generalization
basically stuff along the lines of “when you put agents in X situation, they tend to do Y thing”, rather than trying to understand latent causes / capabilities

AXRP Episode 38.0 - Zhijing Jin on LLMs, Causality, and Multi-Agent Systems

DanielFilan Nov 14, 2024, 7:00 AM

14 points

0 comments, 12 min read

DanielFilan Nov 14, 2024, 6:56 AM
6 points
0
in reply to: yams’s comment on: DanielFilan’s Shortform Feed
Yeah, that seems right to me.

DanielFilan Nov 14, 2024, 4:07 AM
LW: 32 AF: 17
2
AF
on: DanielFilan’s Shortform Feed
A theory of how alignment research should work

(cross-posted from danielfilan.com)

Epistemic status:
- I listened to the Dwarkesh episode with Gwern and started attempting to think about life, the universe, and everything
- less than an hour of thought has gone into this post
- that said, it comes from a background of me thinking for a while about how the field of AI alignment should relate to agent foundations research
Maybe obvious to everyone but me, or totally wrong (this doesn’t really grapple with the challenges of working in a domain where an intelligent being might be working against you), but:
- we currently don’t know how to make super-smart computers that do our will
  - this is not just a problem of having a design that is not feasible to implement: we do not even have a sense of what the design would be
  - I’m trying to somewhat abstract over intent alignment vs control approaches, but am mostly thinking about intent alignment
  - I have not thought very much about societal/systemic risks, and this post doesn’t really address them.
- ideally we would figure out how to do this
- the closest traction that we have: deep learning seems to work well in practice, altho our theoretical knowledge of why it works so well or how capabilities are implemented is lagging
- how should we proceed? Well:
  - thinking about theory alone has not been practical
  - probably we need to look at things that exhibit alignment-related phenomena and understand them, and that will help us develop the requisite theory
    - said things are probably neural networks
  - there are two ways we can look at neural networks: their behaviour, and their implementation.
  - looking at behaviour is conceptually straightforward, and valuable, and being done
  - looking at their implementation is less obvious
  - what we need is tooling that lets us see relevant things about how neural networks are working
  - such tools (e.g. SAEs) are not impossible to create, but it is not obvious that their outputs tell us quantities that are actually of interest
  - in order to discipline the creation of such tools, we should demand that they help us understand models in ways that matter
    - see Stephen Casper’s Engineer’s Interpretability Sequence, and Jason Gross on compact proofs
  - once we get such tools, we should be trying to use them to understand alignment-relevant phenomena, to build up our theory of what we want out of alignment and how it might be implemented
    - studies of the external behaviour of models in alignment-relevant contexts should also be doing this
- so should we be just doing totally empirical things? No.
  - firstly, we need to be disciplined along the way by making sure that we are looking at settings that are in fact relevant to the alignment problem, when we do our behavioural analysis and benchmark our interpretability tools. This involves having a model of what situations are in fact alignment-relevant, what problems we will face as models get smarter, etc.
  - secondly, once we have the building blocks for theory, ideally we will put them together and make some actual theorems like “in such-and-such situations models will never become deceptive” (where ‘deceptive’ has been satisfactorily operationalized in a way that suffices to derive good outcomes from no deception and relatively benign humans)
- I’m imagining the above as being analogous to an imagined history of statistical mechanics (people who know this history or who have read “Inventing Temperature” should let me know if I’m totally wrong about it):
  - first we have steam engines etc
  - then we figure out that ‘temperature’ and ‘entropy’ are relevant things to track for making the engines run
  - then we relate temperature, entropy, and pressure
  - then we get a good theory of thermodynamics
  - then we develop statistical mechanics
- exceptions to “theory without empiricism doesn’t work”:
  - thinking about deceptive mesa-optimization
  - RLHF failures
  - CIRL analysis
- lesson of above: theory does seem to help us analyze some issues and raise possibilities

DanielFilan Nov 8, 2024, 12:12 AM
8 points
4
in reply to: Lorxus’s comment on: Some Rules for an Algebra of Bayes Nets
A way I’d phrase John’s sibling comment, at least for the exact case: adding arrows to a DAG increases the set of probability distributions it can represent. This is because the fundamental rule of a Bayes net is that d-separation has to imply conditional independence—but you can have conditional independences in a distribution that aren’t represented by a network. When you add arrows, you can remove instances of d-separation, but you can’t add any (because nodes are d-separated when all paths between them satisfy some property, and (a) adding arrows can only increase the number of paths you have to worry about and (b) if you look at the definition of d-separation the relevant properties for paths get harder to satisfy when you have more arrows). Therefore, the more arrows a graph G has, the fewer constraints distribution P has to satisfy for P to be represented by G.
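To make the “adding arrows only removes d-separations” point concrete, here is a small Python sketch that lists every d-separation statement of a DAG before and after adding an arrow. It illustrates one example rather than proving the general claim. Instead of the path-based definition, it tests d-separation with the moralized ancestral graph criterion (keep the queried nodes and their ancestors, connect co-parents, drop arrow directions, delete the conditioning set, check connectivity), which is a standard equivalent. The graphs and helper names are hypothetical.

```python
from itertools import combinations


def ancestral_closure(dag, nodes):
    """Return `nodes` plus all their ancestors. `dag` maps node -> set of parents."""
    closure = set(nodes)
    stack = list(nodes)
    while stack:
        for parent in dag[stack.pop()]:
            if parent not in closure:
                closure.add(parent)
                stack.append(parent)
    return closure


def d_separated(dag, x, y, z):
    """Test whether x and y are d-separated given z (x, y not in z),
    using the moralized ancestral graph criterion."""
    keep = ancestral_closure(dag, {x, y} | set(z))
    neighbours = {n: set() for n in keep}
    for child in keep:
        parents = dag[child]
        for p in parents:  # parent-child edges, made undirected
            neighbours[child].add(p)
            neighbours[p].add(child)
        for p, q in combinations(parents, 2):  # "marry" co-parents
            neighbours[p].add(q)
            neighbours[q].add(p)
    # Delete the conditioning set, then search for an undirected x-y path.
    blocked = set(z)
    seen, stack = {x}, [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False
        for nxt in neighbours[node] - blocked:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True


def all_d_separations(dag):
    """Every statement (x, y, z) such that x and y are d-separated given z."""
    nodes = sorted(dag)
    result = set()
    for x, y in combinations(nodes, 2):
        others = [n for n in nodes if n not in (x, y)]
        for k in range(len(others) + 1):
            for z in combinations(others, k):
                if d_separated(dag, x, y, z):
                    result.add((x, y, frozenset(z)))
    return result


# Collider A -> C <- B, then the same graph with an extra arrow A -> B.
collider = {"A": set(), "B": set(), "C": {"A", "B"}}
with_extra_arrow = {"A": set(), "B": {"A"}, "C": {"A", "B"}}

before = all_d_separations(collider)
after = all_d_separations(with_extra_arrow)
print(sorted((x, y, sorted(z)) for x, y, z in before))  # [('A', 'B', [])]
print(after <= before)                                   # True
```

The collider has exactly one d-separation (A and B given nothing), so it can only represent distributions where A and B are marginally independent; the graph with the extra arrow has none, so it imposes no such constraint and can represent strictly more distributions.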

DanielFilan Nov 2, 2024, 8:01 PM
8 points
7
in reply to: Adam Scholl’s comment on: JargonBot Beta Test
I enjoyed reading Nicholas Carlini and Jeff Kaufman write about how they use them, if you’re looking for inspiration.

DanielFilan Nov 2, 2024, 3:16 AM
2 points
0
in reply to: DanielFilan’s comment on: DanielFilan’s Shortform Feed
Another way of maintaining Sola Scriptura and Perspicuity in the face of Protestant disagreement about essential doctrines is the possibility that all of this is cleared up in the deuterocanonical books that Catholics believe are scripture but Protestants do not. That said, this will still rule out Protestantism, and it’s not clear that the deuterocanon in fact clears everything up.

DanielFilan Nov 2, 2024, 3:12 AM
−2 points
0
on: DanielFilan’s Shortform Feed
A failure of an argument against sola scriptura (cross-posted from Superstimulus)

Recently, Catholic apologist Joe Heschmeyer has produced a couple of videos arguing against the Protestant view of the Bible, specifically the claims of Sola Scriptura and Perspicuity (capitalized because I’ll want to refer to them as premises later). “Sola Scriptura” has been operationalized a few different ways, but one way that most Protestants would agree on is (taken from the Westminster Confession):

The whole counsel of God, concerning all things necessary for [...] man’s salvation [...] is either expressly set down in Scripture, or by good and necessary consequence may be deduced from Scripture

“Perspicuity” means clarity, and is propounded in the Westminster Confession like this:

[T]hose things which are necessary to be known, believed, and observed, for salvation, are so clearly propounded and opened in some place of Scripture or other, that not only the learned, but the unlearned, in a due use of the ordinary means, may attain unto a sufficient understanding of them.

So, in other words, Protestants think that everything you need to know to be saved is in the Bible, and is expressed so obviously that anyone who reads it and thinks about it in a reasonable way will understand it.

I take Heschmeyer’s argument to be that if Sola Scriptura and Perspicuity were true, then all reasonable people who have read the Bible and believe it would agree on which doctrines were necessary for salvation; in other words, you wouldn’t have a situation where one person thinks P and P is necessary for salvation, while another thinks not-P, or a third thinks that P is not necessary for salvation. But in fact this situation happens a lot, even among seemingly sincere followers of the Bible who believe in Sola Scriptura and Perspicuity. Therefore Sola Scriptura and Perspicuity are false. (For the rest of this post, I’ll write Nec(P) for the claim “P is necessary for salvation” to save space.)

I think this argument doesn’t quite work. Here’s why:

It can be the case that the Bible clearly explains everything that you need to believe, but it doesn’t clearly explain which things you need to believe. In other words, Sola Scriptura and Perspicuity say that for all P such that Nec(P), the Bible teaches P clearly—but they don’t say that for such P, the Bible teaches P clearly, and also clearly teaches Nec(P). Nor do they say that the only things that are taught clearly in the Bible are things you need to believe (otherwise you could figure out which doctrines you had to believe by just noticing what things the Bible clearly teaches).
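The gap can be written out in quantifier form. Writing Clear(P) for “the Bible clearly teaches P” (notation introduced here for illustration, not the author’s), the premises give only the first line below; Heschmeyer’s argument would need something like one of the last two, and neither follows from the first:

```latex
\begin{align*}
\text{Sola Scriptura + Perspicuity:}\quad & \forall P\,\bigl(\mathrm{Nec}(P) \rightarrow \mathrm{Clear}(P)\bigr) \\
\text{not implied:}\quad & \forall P\,\bigl(\mathrm{Nec}(P) \rightarrow \mathrm{Clear}(\mathrm{Nec}(P))\bigr) \\
\text{also not implied:}\quad & \forall P\,\bigl(\mathrm{Clear}(P) \rightarrow \mathrm{Nec}(P)\bigr)
\end{align*}
```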

For example, suppose that the Bible clearly teaches that Jesus died for at least some people, and that followers of Jesus should get baptized, and in fact, the only thing you need to believe to be saved is that Jesus died for at least some people. In that world, people of good faith could disagree about whether you need to believe that Jesus died for at least some people, and this would be totally consistent with Sola Scriptura and Perspicuity.

Furthermore, suppose that it’s not clear to people of good faith whether or not something is clear to people of good faith. Perhaps something could seem clear to you but not be clear to others of good faith, or something could be clear but others could fail to understand it because they’re not actually of good faith (you need this second possibility, since otherwise you could tell whether something is clear by noticing whether anyone disagrees with you). Then you can have one person who believes P and Nec(P), and another who believes not-P and Nec(not-P), and that would be consistent with Sola Scriptura and Perspicuity.

For example, take the example above, and suppose that some people read the Bible as clearly saying that Jesus died for everyone (aka Unlimited Atonement), and others read the Bible as clearly saying that Jesus only died for his followers (aka Limited Atonement). You could have that disagreement, and if the two groups think the others are being disingenuous, they could both think that you have to agree with them to be saved, while still having Sola Scriptura and Perspicuity being true.

That said, Heschmeyer’s argument is still going to limit the kinds of Protestantism you can adopt. In the above example, if we suppose that you can tell that neither group is in fact being disingenuous, then his argument rules out the combination of Sola Scriptura, Perspicuity, and Nec(Limited Atonement) (as well as Sola Scriptura, Perspicuity, and Nec(Unlimited Atonement)). In this way, applied to the real world, it’s going to rule out versions of Protestantism that claim that you have to believe a bunch of things that sincere Christians who are knowledgeable about the Bible don’t agree on. It won’t, however, rule out Protestantisms that are liberal about what you can believe while being saved.

DanielFilan Nov 2, 2024, 12:45 AM
2 points
0
in reply to: Screwtape’s comment on: 2024 Unofficial LW Community Census, Request for Comments
Oh I misread it as “eighty percent of the effort” oops.
