lukemarks

Karma: 466

Early Experiments in Reward Model Interpretation Using Sparse Autoencoders

lukemarks, Amirali Abdullah, Rauno Arike, Fazl and nothoughtsheadempty

Oct 3, 2023, 7:45 AM

17 points

0 comments, 5 min read

lukemarks Sep 11, 2023, 8:38 AM
12 points
0
on: High school advice
I’m in high school myself and am quite invested in AI safety. I’m not sure whether you’re requesting advice for high school as someone interested in LW, or for LW and associated topics as someone attending high school. I will try to assemble a response to accommodate both possibilities.
Absorbing yourself in topics like x-risk can make school feel like a waste of time. This seems to me to be because school is mostly a waste of time (a position I held before becoming interested in AI safety), but disengaging from it entirely also feels incorrect. I use school mostly as a place to relax. Those eight hours are time I usually have to write off as wasted in terms of producing a technical product, but value immensely as a source of enjoyment, socializing and relaxation. It’s hard for me to overstate just how pleasurable attending school can be when you optimize for enjoyment. If your school’s environment permits, it can also be a suitable place for autodidactic intellectual progress, presuming you aren’t being provided that in the classroom. If you do feel that the classroom is an optimal learning environment for you, I don’t see why you shouldn’t just maximize knowledge extraction.
For many of my peers, school is practically their life. I think that this is a shame, but social pressures don’t let them see otherwise, even when their actions are clearly value negative. Making school just one part of your life instead of having it consume you is probably the most critical thing to extract from this response. The next is to use its resources to your advantage. If you can network with driven friends or find staff willing to push you/find you interesting opportunities, you absolutely should. I would be shocked if there wasn’t at least one staff member at your school passionate about something you were too. Just asking can get you a long way, and shutting yourself off from that is another mistake I made in my first few years of high school, falsely assuming that school simply had nothing to offer me.
In terms of getting involved with LW/AI safety, the biggest mistake I made was being insular, assuming my age would get in the way of networking. There are hundreds of people available at any given time who probably share your interests but possess an entirely different perspective. Most people do not care about my age, and I find that phenomenon especially prevalent in the rationality community. Just talk to people. Discord and Slack are the two biggest clusters for online spaces, and if you’re interested I can message you invites.
Another important point, particularly as a high school student, is not falling victim to groupthink. It’s easy to be vulnerable to this failing in your formative years, and it can massively skew your perspective even when your thinking seems unaffected. Don’t let LessWrong memetics propagate throughout your brain too strongly without good reason.
What links here?
- High Schooler getting involved in EA by GamerChess (EA Forum; Apr 20, 2024, 11:19 AM; 26 points)

lukemarks Sep 8, 2023, 2:23 AM
3 points
0
in reply to: Charlie Steiner’s comment on: The Löbian Obstacle, And Why You Should Care
I expect agentic simulacra to occur without intentionally simulating them: agents are generally useful for solving prediction problems, so over millions of predictions (as would be expected of a product on the order of ChatGPT, or its future successors), agentic simulacra will probably occur. Even if these agents are just approximations, in predicting the behaviors of approximated agents their preferences could still be satisfied in the real world (as described in the Hubinger post).

The problem I’m interested in is how you ensure that all subsequent agentic simulacra (whether they arise intentionally or otherwise) are safe, which seems difficult to verify formally due to the Löbian Obstacle.

lukemarks Sep 8, 2023, 1:31 AM
6 points
1
in reply to: Charlie Steiner’s comment on: The Löbian Obstacle, And Why You Should Care
Which part specifically are you referring to as being overly complicated? What I take to be the primary assertions of the post are:
- Simulacra may themselves conduct simulation, and advanced simulators could produce vast webs of simulacra organized as a hierarchy.
- Simulating an agent is not fundamentally different to creating one in the real world.
- Due to instrumental convergence, agentic simulacra might be expected to engage in resource acquisition. This could take the shape of ‘complexity theft’ as described in the post.^[1]
- The Löbian Obstacle accurately describes why an agent cannot obtain a formal guarantee via design-inspection of its subsequent agent.
- For a simulator to be safe, all simulacra need to be aligned, unless we figure out some upper bound of the form “programs below this complexity are too simple to be dangerous,” at which point we would consider only simulacra above that complexity.
I’ll try to justify my approach with respect to one or more of these claims, and if I can’t, I suppose that would give me strong reason to believe the method is overly complicated.
1. ^ This doesn’t have to be resource acquisition, just any negative action that we could reasonably expect a rational agent to pursue.

The Löbian Obstacle, And Why You Should Care

lukemarks Sep 7, 2023, 11:59 PM

18 points

6 comments, 2 min read

lukemarks Aug 24, 2023, 10:05 PM
4 points
2
on: AI Regulation May Be More Important Than AI Alignment For Existential Safety
The issue I have with pivotal act models is that they presume an aligned superintelligence would be capable of bootstrapping its capabilities in such a way that it could perform that act before the creation of the next superintelligence. Soft takeoff seems a very popular opinion now, and isn’t conducive to this kind of scheme.

Also, if a large org were planning a pivotal act I highly doubt they would do so publicly. I imagine subtly modifying every GPU on the planet, melting them or doing anything pivotal on a planetary scale such that the resulting world has only one or a select few superintelligences (at least until a better solution exists) would be very unpopular with the public and with any government.

I don’t think the post explicitly argues against either of these points, and I agree with what you have written. I think these are useful things to bring up in such a discussion, however.

[Question] What Does LessWrong/EA Think of Human Intelligence Augmentation as of mid-2023?

lukemarks Jul 8, 2023, 11:42 AM

84 points

28 comments, 2 min read

Direct Preference Optimization in One Minute

lukemarks Jun 26, 2023, 11:52 AM

22 points

3 comments, 2 min read

lukemarks Jun 22, 2023, 12:25 PM
2 points
1
on: why I’m here now
I have enjoyed your writings both on LessWrong and on your personal blog. I share your lack of engagement with EA and with Hanson (although I find Yudkowsky’s writing very elegant and felt drawn to LW as a result). If not the above, which intellectuals do you find compelling, and what makes them so by comparison to Hanson/Yudkowsky?

lukemarks Jun 19, 2023, 8:04 AM
2 points
0
in reply to: NeuralSystem_e5e1’s comment on: Why I am not an AI extinction cautionista
In (P2) you talk about a roadblock for RSI, but in (C) you talk about RSI as a roadblock, is that intentional?
This was a typo.
By “difficult”, do you mean something like, many hours of human work or many dollars spent? If so, then I don’t see why the current investment level in AI is relevant. The investment level partially determines how quickly it will arrive, but not how difficult it is to produce.
The primary implication of a capabilities problem’s difficulty, in the context of safety, is usually when said capability will arrive. I didn’t mean to imply that the investment amount determines the difficulty of the problem, but that if you invest additional resources into a problem it is likely to be solved faster than if you didn’t invest those resources. As a result, the desired effect of RSI being a difficult hurdle to overcome (increasing the window to AGI) wouldn’t be realized.

lukemarks Jun 19, 2023, 5:23 AM
1 point
0
in reply to: NeuralSystem_e5e1’s comment on: Why I am not an AI extinction cautionista
More like: (P1) Currently there is a lot of investment in AI. (P2) I cannot currently imagine a good roadblock for RSI. (C) Therefore, I have more reasons to believe RSI will not entail atypically difficult roadblocks than I do to believe it will.

This is obviously a high-level overview, and a more in-depth response might cite claims such as RSI likely being an effective strategy for achieving most goals, or mention counterarguments like Robin Hanson’s, which asserts that RSI is unlikely given the observed behaviors of existing greater-than-human systems (e.g. corporations).

lukemarks Jun 18, 2023, 10:10 PM
15 points
6
on: Why I am not an AI extinction cautionista
“But what if [it’s hard]/[it doesn’t]”-style arguments are very unpersuasive to me. What if it’s easy? What if it does? We ought to prefer evidence to clinging to an unknown and saying “it could go our way.” For a risk analysis post to cause me to update I would need to see “RSI might be really hard because...” and find the supporting reasoning robust.

Given current investment in AI and the fact that I can’t conjure a good roadblock for RSI, I am erring on the side of it being easier rather than harder, but I’m open to updating in light of strong counter-reasoning.

Partial Simulation Extrapolation: A Proposal for Building Safer Simulators

lukemarks Jun 17, 2023, 1:55 PM

16 points

0 comments, 10 min read

lukemarks Jun 11, 2023, 10:30 AM
2 points
0
in reply to: dr_s’s comment on: The Dictatorship Problem
See:
Defining fascism in this way makes me worry that future fascist figures can hide behind the veil of “But we aren’t doing x specific thing (e.g. minority persecution) and therefore are not fascist!”
And:
Is a country that exhibits all symptoms of fascism except for minority group hostility still fascist?

lukemarks Jun 11, 2023, 6:39 AM
1 point
0
in reply to: the gears to ascension’s comment on: The Dictatorship Problem
Agreed. I have edited that excerpt to be:
It’s not obvious to me that selection for loyalty over competence is necessarily more likely under fascism, or necessarily bad. A competent figure who is opposed to democracy would be a considerably more concerning electoral candidate than a less competent one who is loyal to democracy, assuming that democracy is your optimization target.

lukemarks Jun 11, 2023, 6:27 AM
4 points
0
in reply to: the gears to ascension’s comment on: The Dictatorship Problem
As in decreases the ‘amount of democracy’ given that democracy is what you were trying to optimize for.

lukemarks Jun 11, 2023, 3:44 AM
24 points
12
on: The Dictatorship Problem
Sam Altman, the quintessential short-timeline accelerationist, is currently on an international tour meeting with heads of state, and is worried about the 2024 election. He wouldn’t do that if he thought it would all be irrelevant next year.
Whilst I do believe Sam Altman is probably worried about the rise of fascism and its augmentation by artificial intelligence, I don’t see the tour as evidence of that worry. Even if he believed a rise in fascism had no likelihood of occurring, it would still be beneficial for him to pursue the international tour as a means of minimizing x-risks, even assuming that we would see AGI within the next 6 months.
[Fascism is] a system of government where there are no meaningful elections; the state does not respect civil liberties or property rights; dissidents, political opposition, minorities, and intellectuals are persecuted; and where government has a strong ideology that is nationalist, populist, socially conservative, and hostile to minority groups.
I doubt that including some of the conditions toward the end makes for a more useful dialogue. Irrespective of social conservatism and hostility directed at minority groups, the existential risk posed by fascism is probably quite similar. I can picture both progressive and conservative dictatorships reaching essentially all AI x-risk outcomes. Furthermore, is a country that exhibits all symptoms of fascism except for minority group hostility still fascist? Defining fascism in this way makes me worry that future fascist figures can hide behind the veil of “But we aren’t doing x specific thing (e.g. minority persecution) and therefore are not fascist!”
My favored definition, particularly for discussing x-risk, would be more along the lines of the Wikipedia definition:
Fascism is a far-right, authoritarian, ultranationalist political ideology and movement, characterized by a dictatorial leader, centralized autocracy, militarism, forcible suppression of opposition, belief in a natural social hierarchy, subordination of individual interests for the perceived good of the nation and race, and strong regimentation of society and the economy.
But I would like to suggest a re-framing of this issue, and claim that the problem of focus should be authoritarianism. Authoritarianism is considerably more clearly defined than fascism, and focusing on it more directly targets the problematic governing qualities future governments could possess. It isn’t obvious to me that a non-fascist authoritarian government would be better at handling x-risks than a fascist one, since progressive political attitudes don’t seem better at addressing AI x-risks than conservative ones (or vice versa). Succinctly, political views look to me to be orthogonal to capacity in handling AI x-risk (bar perspectives like anarcho-primitivism or accelerationism that explicitly mention this topic in their doctrine).
AI policy, strategy, and governance involves working with government officials within the political system. This will be very different if the relevant officials are fascists, who are selected for loyalty rather than competence.
It’s not obvious to me that selection for loyalty over competence is necessarily more likely under fascism, or necessarily bad. A competent figure who is opposed to democracy would be a considerably more concerning electoral candidate than a less competent one who is loyal to democracy, assuming that democracy is your optimization target.
A fascist government will likely interfere with AI development itself, in the same way that the COVID pandemic was a non-AI issue that nonetheless affected AI engineers.
Is interference with AI development necessarily bad? We can’t predict the unknown unknown of what views on AI development a fascist dictatorship (that mightn’t yet exist) might hold, or how it would act on them. I agree that, in principle, a fascist body interfering with industry obviously does not result in good outcomes in most cases, but I do not see how or why this applies to AI x-risk specifically.

Higher Dimension Cartesian Objects and Aligning ‘Tiling Simulators’

lukemarks Jun 11, 2023, 12:13 AM

22 points

0 comments, 5 min read

lukemarks Jun 3, 2023, 5:50 AM
1 point
0
in reply to: Logan Zoellner’s comment on: The AGI Race Between the US and China Doesn’t Exist.
While it’s true that Chinese semiconductor fabs are a decade behind TSMC (and will probably remain so for some time), that doesn’t seem to have stopped them from building 162 of the top 500 largest supercomputers in the world.
They did this (mostly) before the export regulations were imposed. I’m not sure what the exact numbers are, but both of their supercomputers in the top 10 were constructed before October 2022 (when the regulations were imposed). Also, I imagine that they might still have had a steady supply of cutting-edge chips soon after the export regulations. It would make sense that the regulations did not take effect immediately and that exports already underway were not halted, but I have not verified that.

lukemarks May 31, 2023, 10:00 PM
9 points
2
on: The Divine Move Paradox & Thinking as a Species
Sure, this is an argument ‘for AGI’, but rarely do people (on this forum at least) reject the deployment of AGI because they feel discomfort in not fully comprehending the trajectory of their decisions. I’m sure that this is something most of us ponder and would acknowledge is not optimal, but if you asked the average LW user to list the reasons they were not for the deployment of AGI, I think that this would be quite low on the list.

Reasons higher on the list for me, for example, would be “literally everyone might die.” In light of that, dismissing control loss as a worry seems like a minuscule consideration. People generally fear control loss because it means losing control of something more intelligent than you, with instrumental subgoals that, if pursued, would probably result in a bad outcome for you. This doesn’t change the fact that “we shouldn’t fear not being in control for the above reasons” does not constitute sufficient reason to deploy AGI.

Also, although some of the analogies drawn here do have merit, I can’t help but gesture toward the giant mass of tentacles and eyes you are applying them to. To make this more visceral, picture a literal Shoggoth descending from a plane of Eldritch horror and claiming decision-making supremacy and human-aligned goals. Do you accept its rule because of its superior decision-making and claimed human-aligned goals, or do you seek an alternative arrangement?
