harfe

Karma: 628

harfe Aug 8, 2024, 11:23 AM
4 points
0
in reply to: rsaarelm’s comment on: It’s time for a self-reproducing machine
There is also Project Quine, which is a newer attempt to build a self-replicating 3D printer

harfe Jul 29, 2024, 6:00 PM
3 points
4
on: New Blog Post Against AI Doom
This was already referenced here: https://www.lesswrong.com/posts/MW6tivBkwSe9amdCw/ai-existential-risk-probabilities-are-too-unreliable-to

I think it would be better to comment there instead of here.

harfe Jun 20, 2024, 4:06 PM
8 points
0
on: Ilya Sutskever created a new AGI startup
One thing I find positive about SSI is their intent to not have products before superintelligence (note that I am not arguing here that the whole endeavor is net-positive). Not building intermediate products lessens the impact on race dynamics. I think it would be preferable if all the other AGI labs had a similar policy (funnily, while typing this comment, I got a notification about Claude 3.5 Sonnet… ). The policy not to have any product can also give them cover to focus on safety research that is relevant for superintelligence, instead of doing some shallow control of the output of LLMs.

To reduce bad impacts from SSI, it would be desirable that SSI also
- have a clearly stated policy to not publish their capabilities insights,
- take security sufficiently seriously to be able to defend against nation-state actors that try to steal their insights.

harfe Jun 19, 2024, 6:37 PM
4 points
1
in reply to: Chris_Leong’s comment on: Ilya Sutskever created a new AGI startup
It does not appear paywalled to me. The link that @mesaoptimizer posted is an archive, not the original bloomberg.com article.

Ilya Sutskever created a new AGI startup

harfeJun 19, 2024, 5:17 PM

95 points

35 comments1 min readLW link

(ssi.inc)

harfe May 12, 2024, 10:57 AM
3 points
0
on: New intro textbook on AIXI
I haven’t watched it yet, but there is also a recent technical discussion/podcast episode about AIXI and relatedd topics with Marcus Hutter: https://www.youtube.com/watch?v=7TgOwMW_rnk

harfe May 3, 2024, 9:55 PM
3 points
1
on: Maximal Lottery-Lotteries Exist
It suffices to show that the Smith lotteries that the above result establishes are the only lotteries that can be part of maximal lottery-lotteries are also subject to the partition-of-unity condition.

I fail to understand this sentence. Here are some questions about this sentence:
- what are Smith lotteries? Ctrl+f only finds lottery-Smith lottery-lotteries, do you mean these? Or do you mean lotteries that are smith?
- which result do you mean by “above result”?
- What does it mean for a lottery to be part of maximal lottery-lotteries?
- does “also subject to the partition-of-unity” refer to the smith lotteries or to the lotteries that are part of maximal lottery-lotteries? (it also feels like there is a word missing somewhere)
- Why would this suffice?
- Is this part also supposed to imply the existence of maximal lottery-lotteries? If so, why?

harfe Apr 19, 2024, 8:19 PM
1 point
0
on: What is the best way to talk about probabilities you expect to change with evidence/experiments?
A lot of the probabilities we talk about are probabilities we expect to change with evidence. If we flip a coin, our p(heads) changes after we observe the result of the flipped coin. My p(rain today) changes after I look into the sky and see clouds. In my view, there is nothing special in that regard for your p(doom). Uncertainty is in the mind, not in reality.

However, how you expect your p(doom) to change depending on facts or observation is useful information and it can be useful to convey that information. Some options that come to mind:
1. describe a model: If your p(doom) estimate is the result of a model consisting of other variables, just describing this model is useful information about your state of knowledge, even if that model is only approximate. This seems to come closest to your actual situation.
2. describe your probability distribution of your p(doom) in 1 year (or another time frame): You could say that you think there is a 25% chance that your p(doom) in 1 year is between 10% and 30%. Or give other information about that distribution. Note: your current p(doom) should be the mean of your p(doom) in 1 year.
3. describe your probability distribution of your p(doom) after a hypothetical month of working on a better p(doom) estimate: You could say that if you were to work hard for a month on investigating p(doom), you think there is a 25% chance that your p(doom) after that month is between 10% and 30%. This is similar to 2., but imo a bit more informative. Again, your p(doom) should be the mean of your p(doom) after a hypothetical month of investigation, even if you don’t actually do that investigation.

harfe Feb 10, 2024, 4:47 PM
1 point
in reply to: Yitz’s comment on: Yitz’s Shortform
This sounds like https://www.super-linear.org/trumanprize. It seems like it is run by Nonlinear and not FTX.

harfe Jan 10, 2024, 7:25 PM
1 point
on: Basic Inframeasure Theory
I think Proposition 1 is false as stated because the resulting functional $f^{+} : M^{\pm} (X) \oplus R \to R$ is not always continuous (wrt the KR-metric). The function $f : [0, 1] \to [0, 1]$ , $x \mapsto x^{1 / 3}$ with $X = [0, 1]$ should be a counterexample. However, the non-continuous functional $f^{+}$ should still be continuous on the set of sa-measures.

Another thing: the space of measures $M^{\pm} (X)$ is claimed to be a Banach space with the KR-norm (in the notation section). Afaik this is not true, while the space is a Banach space with the TV-norm, with the KR-metric/norm it should not be complete and is merely a normed vector space. Also the claim (in “Basic concepts”) that $M^{\pm} (X)$ is the dual space of $C (X)$ is only true if equipped with TV-norm, not with KR-metric.

Another nitpick: in Theorem 5, the type of $h$ in the assumption is probably meant to be $C (X) \to R \cup {- \infty}$ , instead of $C (X) \to R$ .

harfe Jan 9, 2024, 4:21 PM
LW: 3 AF: 3
0
AF
on: The Learning-Theoretic Agenda: Status 2023
Regarding direction 17: There might be some potential drawbacks to ADAM. I think its possible that some very agentic programs have relatively low $g$ score. This is due to explicit optimization algorithms being low complexity.

(Disclaimer: the following argument is not a proof, and appeals to some heuristics/etc. We fix $M = M_{0}$ for these considerations too.) Consider an utility function $^U$ . Further, consider a computable approximation of the optimal policy (AIXI that explicitly optimizes for $^U$ ) and has an approximation parameter n (this could be AIXI-tl, plus some approximation of $^U$ ; higher $n$ is better approximation). We will call this approximation of the optimal policy $π_{n}^{^U}$ . This approximation algorithm has complexity $K (π_{n}^{^U}) = C + K (^U) + K (n)$ , where $C > 0$ is a constant needed to describe the general algorithm (this should not be too large).

We can get better approximation by using a quickly growing function, such as the Ackermann function with $n = A (k, k)$ . Then we have $K (π_{A (k, k)}^{^U}) = C + K (^U) + K (A (k, k)) \leq C + K (^U) + log (k)$ .

What is the $g$ score of this policy? We have $g (π_{A (k, k)}^{^U}) = {max}_{U} ({min}_{π^{'} : \dots} K (π^{'}) - K (U))$ . Let $¯ U$ be maximal in this expression. If $K (¯ U) \geq K (^U) - C$ , then $g (π_{A (k, k)}^{^U}) = min π^{'} : E_{ζ_{M_{0}} π^{'}} (¯ U) \geq E_{ζ_{M_{0}} π_{A (k, k)}^{^U}} (¯ U) K (π^{'}) - K (¯ U) \leq K (π_{A (k, k)}^{^U}) - K (^U) + C \leq 2 C log (k)$ .

For the other case, let us assume that if $K (¯ U) < K (^U) - C$ , the policy $π_{A (k, k)}^{¯ U}$ is at least as good at maximizing $¯ U$ than $π_{A (k, k)}^{^U)}$ . Then, we have $g (π_{A (k, k)}^{^U}) = min π^{'} : E_{ζ_{M_{0}} π^{'}} (¯ U) \geq E_{ζ_{M_{0}} π_{A (k, k)}^{^U}} (¯ U) K (π^{'}) - K (¯ U) \leq K (π_{A (k, k)}^{¯ U}) - K (¯ U)) \leq C + log (k)$ .

I don’t think that the assumption ( $(π_{A (k, k)}^{¯ U}$ maximizes $b a r U$ better than $(π_{A (k, k)}^{^U}$ ) is true for all $^U$ and $k$ , but plausibly we can select $^U$ such that this is the case (exceptions, if they exist, would be a bit weird, and if ADAM working well due to these weird exceptions feels a bit disappointing to me). A thing that is not captured by approximations such as AIXI-tl are programs that halt but have insane runtime (longer than $A (k, k)$ ). Again, it would feel weird to me if ADAM sort of works because of low-complexity extremely-long-running halting programs.

To summarize, maybe there exist policies $π_{A (k, k)}^{^U}$ which strongly optimize a non-trivial utility function $^U$ with approximation parameter $n = A (k, k)$ , but where $g (π_{A (k, k)}^{^U}) \leq 2 C + log (k)$ is relatively small.

harfe Dec 21, 2023, 11:59 AM
3 points
0
on: How Would an Utopia-Maximizer Look Like?
I think the “deontological preferences are isomorphic to utility functions” is wrong as presented.

Firts, the formula has issues with dividing by zero and not summing probabilities to one (and re-using variable $x$ as a local variable in the sum). So you probably meant something like $P (x) = \frac{e^{u (x)}}{\sum_{y \in X} e^{u (y)}} .$ Even then, I dont think this describes any isomorphism of deontological preferences to utility functions.
- Utility functions are invariant when multiplied with a positive constant. This is not reflected in the formula.
- utility maximizers usually take the action with the best utility with probability $1$ , rather than using different probabilities for different utilities.
- modelling deontological constraints as probability distributions doesnt seem right to me. Let’s say I decide between drinking green tea and black tea, and neither of those violate any deontological constraints, then assigning some values (which ones?) to P(“I drink green tea”) or P(“I drink black tea”) doesnt describe these deontological constraints well.
- any behavior can be encoded as utility functions, so finding any isomorphisms to utility functions is usually possible, but not always meaningful.

harfe Nov 13, 2023, 1:12 PM
2 points
0
in reply to: [deactivated]’s comment on: TESCREAL and social justice: a synthesis
Some of the downvotes were probably because of the unironic use of the term TESCREAL. This term mixes a bunch of different things together, which makes your writing less clear.

harfe Oct 16, 2023, 3:46 PM
1 point
in reply to: Buck’s comment on: Buck’s Shortform
Sure, I’d be happy to read a draft

harfe Oct 15, 2023, 10:31 PM
LW: 3 AF: 2
AF
in reply to: Buck’s comment on: Buck’s Shortform
I am going to assume that in the code, when calculating p_alice_win_given_not_caught, we do not divide the $λ_{2}$ term by two (since this is not that consistent with the description. I am also assuming that $0.25 - t_{2}$ is a typo and $0.5 - t_{2}$ is meant, which would also be more consistent with other stuff). So I am going to assume assume a symmetrical version.

Here, P(Alice wins) is $e^{- λ_{1} t_{1} - λ_{2} t_{2}} - e^{- λ_{1} / 2 - λ_{2} / 2} =: f (λ_{1}, λ_{2}, t_{1}, t_{2})$ . Wlog we can assume $0 < t < 1$ (otherwise Bob will run everything or nothing in shielded mode).

We claim that $((λ_{1}^{*}, λ_{2}^{*}), (t_{1}^{*}, t_{2}^{*})) := ((μ^{*}, μ^{*}), (t / 2, t / 2)$ is a (pure) Nash equilibrium, where $μ^{*} = log t / (t - 1)$ .

To verify, lets first show that Alice cannot make a better choice if Bob plays $(t_{1}^{*}, t_{2}^{*}) = (t / 2, t / 2)$ . We have $f (λ_{1}, λ_{2}, t_{1}^{*}, t_{2}^{*}) = e^{- (λ_{1} + λ_{2}) t / 2} - e^{- (λ_{1} + λ_{2}) / 2}$ . Since this only depends on the sum, we can make the substitution $μ = (λ_{1} + λ_{2}) / 2$ . Thus, we want to maximize $~ f (μ, t) = e^{- μ t} - e^{- μ}$ . We have ${~ f}^{'} (μ, t) = - t e^{- μ t} + e^{- μ} = 0$ . Rearranging, we get $t e^{- μ t} = e^{- μ}$ . Taking logs, we get $log t - μ t = - μ$ . Rearranging, we get $μ = log t / (t - 1)$ . Thus, $μ = μ^{*}$ is the optimal choice. This means, that if Bob sticks to his strategy, Alice cannot do better than $f (μ^{*}, μ^{*}, t / 2, t / 2)$ .

Now, lets show that Bob cannot do better. We have $f (λ_{1}^{*}, λ_{2}^{*}, t_{1}, t_{2}) = e^{- μ^{*} t_{1} - μ^{*} t_{2}} - e^{- μ^{*} / 2 - μ^{*} / 2} = e^{- μ^{*} t} - e^{- μ^{*}}$ . This does not depend on $t_{1}$ and $t_{2}$ anymore, so any choice of $t_{1}$ and $t_{2}$ is optimal if Alice plays $(μ^{*}, μ^{*})$ .

(If I picked the wrong version of the question, and you actually want some symmetry: I suspect that the solution will have similarities, or that in some cases the solution can be obtained by rescaling the problem back into a more symmetric form.)

harfe Oct 13, 2023, 5:57 PM
1 point
0
on: To open-source or to not open-source, that is (an oversimplification of) the question.
This article talks a lot about risks from AI. I wish the author would be more specific what kinds of risks they are thinking about. For example, it is unclear which parts are motivated by extinction risks or not. The same goes for the benefits of open-sourcing these models. (note: I haven’t read the reports this article is based on, these might have been more specific)

harfe Oct 5, 2023, 11:06 PM
5 points
4
on: Provably Safe AI
Thank you for writing this review.

The strategy assumes we’ll develop a good set of safety properties that we’re demanding proof of.

I think this is very important. From skimming the paper it seems that unfortunately the authors do not discuss it much. I imagine that actually formally specifying safety properties is actually a rather difficult step.

To go with the example of not helping terrorists spread harmful virus: How would you even go about formulating this mathematically? This seems highly non-trivial to me. Do you need to mathematically formulate what exactly are harmful viruses?

The same holds for Asimov’s three laws of robotics, turning these into actual math or code seems to be quite challenging.

There’s likely some room for automated systems to figure out what safety humans want, and turn it into rigorous specifications.

Probably obvious to many, but I’d like to point out that these automated systems themselves need to be sufficiently aligned to humans, while also accomplishing tasks that are difficult for humans to do and probably involve a lot of moral considerations.

harfe Sep 24, 2023, 3:10 AM
3 points
0
on: Five neglected work areas that could reduce AI risk

A common response is that “evaluation may be easier than generation”. However, this doesn’t mean evaluation will be easy in absolute terms, or relative to one’s resources for doing it, or that it will depend on the same resources as generation.

I wonder to what degree this is true for the human-generated alignment ideas that are being submitted LessWrong/Alignment Forum?

For mathematical proofs, evaluation is (imo) usually easier than generation: Often, a well-written proof can be evaluated by reading it once, but often the person who wrote up the proof had to consider different approaches and discard a lot of them.

To what degree does this also hold for alignment research?

harfe Sep 24, 2023, 1:22 AM
9 points
4
on: The Dick Kick’em Paradox
The setup violates a fairness condition that has been talked about previously.

From https://arxiv.org/pdf/1710.05060.pdf, section 9:

We grant that it is possible to punish agents for using a speciﬁc decision proce- dure, or to design one decision problem that punishes an agent for rational behavior in a diﬀerent decision problem. In those cases, no decision theory is safe. CDT per- forms worse that FDT in the decision problem where agents are punished for using CDT, but that hardly tells us which theory is better for making decisions. [...]

Yet FDT does appear to be superior to CDT and EDT in all dilemmas where the agent’s beliefs are accurate and the outcome depends only on the agent’s behavior in the dilemma at hand. Informally, we call these sorts of problems “fair problems.” By this standard, Newcomb’s problem is fair; Newcomb’s predictor punishes and rewards agents only based on their actions. [...]

There is no perfect decision theory for all possible scenarios, but there may be a general-purpose decision theory that matches or outperforms all rivals in fair dilem- mas, if a satisfactory notion of “fairness” can be formalized

harfe Aug 17, 2023, 12:25 PM
1 point
4
on: If we had known the atmosphere would ignite
Is the organization who offers the prize supposed to define “alignment” and “AGI” or the person who claims the prize? this is unclear to me from reading your post.

Defining alignment (sufficiently rigorous so that a formal proof of (im)possibility of alignment is conceivable) is a hard thing! Such formal definitions would be very valuable by themselves (without any proofs). Especially if people widely agree that the definitions capture the important aspects of the problem.

harfe

Ilya Sutskever cre­ated a new AGI startup

Ilya Sutskever created a new AGI startup