#e/acc
ArthurB
We have a winner! laserfiche's entry is the best (and only, but that doesn't mean it's not good quality) submission, and they win $5K.
Code and demo will be posted soon.
Exactly. As for the cost issue, the code can be deployed as:
- Twitter bots (registered as such) so the deployer controls the cost
- A webpage that charges you a small payment (via crypto or credit card) to run 100 queries. Such websites can actually be generated by ChatGPT4 so it's an easy lift. Useful for people who truly want to learn or who want to get good arguments for online argumentation
- A webpage with captchas and reasonable rate limits to keep costs small (a minimal sketch of this option follows below)
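For concreteness, here is a minimal sketch of the rate-limited option, assuming a small Flask app; `answer_query` is a hypothetical stand-in for whatever language-model call the deployer wires up, and a captcha check would sit in front of this endpoint:

```python
import time
from collections import defaultdict, deque

from flask import Flask, jsonify, request

app = Flask(__name__)

WINDOW_SECONDS = 3600          # look at the last hour
MAX_QUERIES_PER_WINDOW = 20    # per IP, tuned to keep API costs small
recent = defaultdict(deque)    # ip -> timestamps of recent queries


def answer_query(question: str) -> str:
    # Hypothetical: call the underlying language model here.
    raise NotImplementedError


@app.route("/ask", methods=["POST"])
def ask():
    ip = request.remote_addr
    now = time.time()
    stamps = recent[ip]
    while stamps and now - stamps[0] > WINDOW_SECONDS:
        stamps.popleft()       # forget requests outside the window
    if len(stamps) >= MAX_QUERIES_PER_WINDOW:
        return jsonify(error="rate limit exceeded, try again later"), 429
    stamps.append(now)
    return jsonify(answer=answer_query(request.json["question"]))
```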
In general yes, here no. My impression from reading LW is that many people suffer from a great deal of analysis paralysis and are taking too few chances, especially given that the default isn't looking great.
There is such a thing as doing a dumb thing because it feels like doing something (e.g. let's make AI Open!) but this ain't it. The consequences of this project are not going to be huge (talking to people) but you might get a nice little gradient read as to how helpful it is and iterate from there.
It should be possible to ask content owners for permission and get pretty far with that.
AFAIK what character.ai does is fine tuning, with their own language models, which aren't at parity with ChatGPT. Using a better language model will yield better answers but, MUCH MORE IMPORTANTLY, what I'm suggesting is NOT fine tuning.
What I'm suggesting gives you an answer that's closer to a summary of relevant bits of LW, Arbital, etc. The failure mode is much more likely to be that the answer is irrelevant or off the mark than that it's at odds with prevalent viewpoints on this platform.
Think more interpolating over an FAQ, and less reproducing someone's cognition.
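One way to read "summary of relevant bits" is simple retrieval over a corpus of LW / Arbital excerpts followed by summarization; that mechanism is my assumption, not spelled out in the comment, and `embed` / `generate` below are hypothetical wrappers around whatever embedding and language models get used:

```python
from math import sqrt

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def answer(question, corpus, embed, generate, k=5):
    # Rank passages by similarity to the question, then ask the model to
    # answer using only the top-k excerpts.
    q_vec = embed(question)
    ranked = sorted(corpus, key=lambda p: cosine(embed(p), q_vec), reverse=True)
    context = "\n\n".join(ranked[:k])
    prompt = (f"Answer the question using only the excerpts below.\n\n"
              f"{context}\n\nQuestion: {question}\nAnswer:")
    return generate(prompt)
```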
Speed running everyone through the bad alignment bingo. $5k bounty for a LW conversational agent
The US has around one traffic fatality per 100 million miles driven; if a human driver makes 100 decisions per mile
A human driver does not make 100 "life or death decisions" per mile. They make many more decisions, most of which can easily be corrected, if wrong, by another decision.
The statistic is misleading though in that it includes people who text, drunk drivers, and tired drivers. The performance of a well-rested human driver who is paying attention to the road is much, much higher than that. And that's really the bar that matters for a self-driving car: you don't want a car that merely does better than the average driver, who (hey, you never know) could be drunk.
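For reference, the arithmetic implied by the quoted framing (taking "100 decisions per mile" at face value, which the reply above disputes) works out to roughly one fatal error per ten billion decisions:

$$
\frac{1\ \text{fatality}}{10^{8}\ \text{miles} \times 100\ \text{decisions/mile}} = 10^{-10}\ \text{fatalities per decision}
$$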
Fixing hardware failures in software is literally how quantum computing is supposed to work, and it's clearly not a silly idea.
Generally speaking, there's a lot of appeal to intuition here, but I don't find it convincing. This isn't good for Tokyo property prices? Well maybe, but how good of a heuristic is that when Mechagodzilla is on its way regardless?
In addition:
- There aren't that many actors in the lead.
- Simple but key insights in AI (e.g. doing backprop, using sensible weight initialisation) have been missed for decades.
- If the right tail for the time to AGI by a single group can be long and there aren't that many groups, convincing one group to slow down / pay more attention to safety can have big effects.
How big of an effect? Years don't seem off the table. Eliezer suggests 6 months dismissively. But add a couple years here and a couple years there, and pretty soon you're talking about the possibility of real progress. It's obviously of little use if no research towards alignment is attempted in that period, of course, but it's not nothing.
There are IMO in-distribution ways of successfully destroying much of the computing overhang. It's not easy by any means, but on a scale where "the Mossad pulling off Stuxnet" is 0 and "build self-replicating nanobots" is 10, I think it's closer to a 1.5.
Indeed, there is nothing irrational (in an epistemic sense) about having hyperbolic time preference. However, this means that a classical decision algorithm is not conducive to achieving long-term goals.
One way around this problem is to use TDT; another is to modify your preferences to be geometric.
A geometric time preference is a bit like a moral preference… it's a para-preference. Not something you want in the first place, but something you benefit from wanting when interacting with other agents (including your future self).
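To make the time-inconsistency point concrete, here is a small illustrative calculation (discount curves and payoffs are my own example, not from the comment): a hyperbolic discounter's ranking of a smaller-sooner versus larger-later reward flips as the rewards get close, while a geometric (exponential) discounter ranks them the same way at every distance:

```python
# Illustrative numbers only. Hyperbolic discounting v = value / (1 + k*t)
# produces preference reversals; geometric discounting v = value * d**t
# does not, which is the sense in which it serves long-term goals better.

def hyperbolic(value, delay, k=1.0):
    return value / (1 + k * delay)

def geometric(value, delay, d=0.9):
    return value * d ** delay

small_soon = (10, 0.5)   # smaller reward, half a day away
large_late = (30, 5.0)   # larger reward, five days away

for extra_wait in (10.0, 0.0):   # evaluate ten days in advance, then up close
    h = [hyperbolic(v, t + extra_wait) for v, t in (small_soon, large_late)]
    g = [geometric(v, t + extra_wait) for v, t in (small_soon, large_late)]
    print(f"extra wait {extra_wait:>4}: "
          f"hyperbolic prefers {'small' if h[0] > h[1] else 'large'}, "
          f"geometric prefers {'small' if g[0] > g[1] else 'large'}")

# Output: the hyperbolic agent prefers 'large' when planning ahead but
# switches to 'small' up close; the geometric agent prefers 'large' both times.
```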
The second dot point is part of the problem description. You're saying it's irrelevant, but you can't just parachute in a payoff matrix where causality goes backward in time.
Find any example you like; as long as it's physically possible, you'll either have the payoff tied to your decision algorithm (Newcomb's) or to your preference set (Solomon's).
I'm making a simple, logical argument. If it's wrong, it should be trivial to debunk. You're relying on an outside view to judge, which is pretty weak.
As I've clearly said, I'm entirely aware that I'm making a rather controversial claim. I never bother to post on LessWrong, so I'm clearly not whoring for attention or anything like that. Look at it this way: in order to present my point despite it being so unorthodox, I have to be pretty damn sure it's solid.
That's certainly possible; it's also possible that you do not understand the argument.
To make things absolutely clear, I'm relying on the following definition of EDT:
The policy picks the action a* = argmax_i Σ_j P(W_j | W, a_i) · U(W_j), where {a_i} are the possible actions, W is the current state of the world, P(W' | W, a) is the probability of moving to state of the world W' after doing a, and U is the utility function.
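A minimal sketch of that selection rule (the `transition` and `utility` callables stand in for P and U; the names are illustrative, not from the comment):

```python
def pick_action(actions, world, transition, utility):
    # `transition(world, action)` returns a dict {next_state: probability},
    # playing the role of P(W' | W, a); `utility(state)` is U(state).
    def expected_utility(action):
        return sum(p * utility(w_next)
                   for w_next, p in transition(world, action).items())
    # argmax over the available actions {a_i}
    return max(actions, key=expected_utility)
```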
I believe the argument I made in the case of Solomon's problem is the clearest and strongest; would you care to rebut it?
I've challenged you to clarify through which mechanism someone with a cancer gene would decide to chew gum, and you haven't answered this properly.
If your decision algorithm is EDT, the only free variables that determine your decisions are your preferences and your sensory input.
The only way the gene can cause you to chew gum in any meaningful sense is to make you prefer to chew gum.
An EDT agent has knowledge of its own preferences. Therefore, an EDT agent already knows whether it falls in the "likely to get cancer" population.
Yes, the causality is from the decision process to the reward. The decision process may or may not be known to the agent, but its preferences are (data can be read, but the code can only be read if introspection is available).
You can and should self-modify to prefer acting in such a way that you would benefit from others predicting you would act a certain way. You get one-boxing behavior in Newcomb's and this is still CDT/EDT (which are really equivalent, as shown).
Yes, you could implement this behavior in the decision algorithm itself, and yes, this is very much isomorphic. Evolution's way to implement better cooperation has been to implement moral preferences though; it feels like a more natural design.
Typo; I do mean that EDT two-boxes.
According to Wikipedia, the definition of EDT is:
Evidential decision theory is a school of thought within decision theory according to which the best action is the one which, conditional on your having chosen it, gives you the best expectations for the outcome.
This is not the same as "being a randomly chosen member of a group of people..." and I've explained why. The information about group membership is contained in the filtration.
You're saying EDT causes you not to chew gum because cancer gives you EDT? Where does the gum appear in the equation?
Popular with Silicon Valley VCs 16 years later: just maximize the rate of entropy creation 🤦🏻‍♂️